The paper addresses the difficulty of optimizing Electric Bus (EB) charging schedules when travel time and energy consumption are uncertain and electricity prices fluctuate.
The paper proposes Hierarchical Deep Reinforcement Learning (HDRL), which reformulates the original Markov Decision Process (MDP) as two augmented MDPs so that decisions can be made efficiently across multiple time scales.
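To make the idea of deciding on two time scales concrete, here is a toy sketch. It is not the paper's formulation: the split into a slow "charging target" decision and a fast "charging step" decision, the function names, the price threshold, and all numbers are assumptions chosen only for illustration.

```python
def high_level_policy(soc, price):
    """Hypothetical slow-time-scale rule: choose a target state of charge (SoC)."""
    # Assumption: aim higher when electricity is cheap.
    return 0.9 if price < 0.15 else 0.6


def low_level_policy(soc, target, max_step=0.1):
    """Hypothetical fast-time-scale rule: charge toward the target, bounded per step."""
    return max(0.0, min(max_step, target - soc))


def run_episode(prices, steps_per_period=4, soc=0.3):
    """Run one high-level decision per price period and several low-level steps inside it."""
    trace = []
    for price in prices:                      # slow time scale
        target = high_level_policy(soc, price)
        for _ in range(steps_per_period):     # fast time scale
            soc = min(1.0, soc + low_level_policy(soc, target))
            trace.append((target, round(soc, 3)))
    return trace


if __name__ == "__main__":
    for target, soc in run_episode([0.10, 0.20]):
        print(f"target={target:.1f} soc={soc:.3f}")
```

In an HDRL setting, both hand-written rules would be replaced by learned policies, each trained on its own augmented MDP; the nested loop only shows how the two decision frequencies interleave.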
The proposed HDRL algorithm, Double Actor-Critic Multi-Agent Proximal Policy Optimization Enhancement (DAC-MAPPO-E), addresses the scalability challenges of large EB fleets by redesigning the decentralized actor network and incorporating the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm.
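For readers unfamiliar with MAPPO, the sketch below shows the standard PPO clipped surrogate loss that each agent's actor is typically trained with. It is the generic PPO objective, not the DAC-MAPPO-E update itself; the function name, argument names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                       # probability ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)                # pessimistic bound
    return -total / len(advantages)                             # negate: minimize loss


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage is clipped at 1 + clip_eps.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

In MAPPO, each agent usually applies this loss to its own decentralized actor, while advantages come from a critic that can see global information during training; how DAC-MAPPO-E modifies the actor network is not detailed in this summary.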
Extensive experiments on real-world data demonstrate the superior performance and scalability of DAC-MAPPO-E in optimizing EB fleet charging schedules.