This work presents a two-tier optimization framework for data center resource management in large-scale heterogeneous environments.
The framework combines deep reinforcement learning (DRL) with a gradient-based heuristic for optimal rack positioning.
The high-level DRL agent determines optimal rack type ordering, while the low-level heuristic minimizes movement counts and ensures fault-tolerant resource distribution.
The proposed approach outperforms both the gradient-based heuristic used alone and a mixed-integer programming (MIP) solver in terms of objective value and computational efficiency.
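
The two-tier split described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several stated assumptions: the high-level DRL agent is replaced by an exhaustive search over rack type orderings, the low-level gradient-based heuristic is replaced by a simple contiguous-block layout, movement cost is approximated as the number of positions whose rack type changes, and fault-tolerant resource distribution is not modeled. The names `place_racks` and `best_ordering` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def place_racks(
    order: Sequence[str], counts: Dict[str, int], current: Sequence[str]
) -> Tuple[List[str], int]:
    """Low-level stand-in (assumption: not the paper's gradient-based heuristic).

    Lays rack types out in contiguous blocks following `order` and counts
    the positions whose rack type differs from the current layout.
    """
    target = [rack_type for rack_type in order for _ in range(counts[rack_type])]
    moves = sum(1 for old, new in zip(current, target) if old != new)
    return target, moves


def best_ordering(current: Sequence[str]) -> Tuple[Tuple[str, ...], List[str], int]:
    """High-level stand-in (assumption: exhaustive search instead of a DRL agent).

    Tries every ordering of rack types, scores each one with the low-level
    routine, and keeps the first ordering with the fewest changed positions.
    """
    counts = Counter(current)
    best = None
    for order in permutations(sorted(counts)):
        target, moves = place_racks(order, counts, current)
        if best is None or moves < best[2]:
            best = (order, target, moves)
    return best


if __name__ == "__main__":
    layout = ["gpu", "cpu", "cpu", "gpu", "storage"]  # illustrative data only
    order, target, moves = best_ordering(layout)
    print(order, target, moves)
```

Exhaustive search is feasible only for a handful of rack types, since the number of orderings grows factorially. That growth is one reason a learned policy could be attractive for the high-level decision in large environments.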