Proximal Policy Optimization (PPO) faces challenges in Federated Reinforcement Learning (FRL) that stem from the order in which its actor and critic are updated.
Under the conventional order (critic first, then actor), clients may produce heterogeneous gradient directions, which hinders convergence to a globally optimal policy in FRL.
FedRAC proposes reversing the update order in PPO, updating the actor first and then the critic, in order to eliminate the divergence among the critics of different clients.
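The PyTorch sketch below is one way to picture the two orders inside a client's local update; the network sizes, the one-step advantage estimate used in place of GAE, the `actor_first` flag, and the `fedavg` helper are illustrative assumptions rather than the paper's implementation.

```python
from typing import Dict, List

import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActorCritic(nn.Module):
    """Minimal discrete-action actor-critic; layer sizes are illustrative."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                   nn.Linear(64, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                    nn.Linear(64, 1))

    def log_prob(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return Categorical(logits=self.actor(obs)).log_prob(act)


def local_ppo_update(model: ActorCritic, batch: Dict[str, torch.Tensor],
                     actor_first: bool = True, clip_eps: float = 0.2,
                     lr: float = 3e-4) -> None:
    """One local PPO step on a client.

    actor_first=True is the reversed (FedRAC-style) order; False is the
    conventional critic-then-actor order. The order matters here because the
    advantage estimate is taken from the critic's *current* values, so
    updating the critic first changes the advantages the actor step sees.
    """
    actor_opt = torch.optim.Adam(model.actor.parameters(), lr=lr)
    critic_opt = torch.optim.Adam(model.critic.parameters(), lr=lr)
    obs, act = batch["obs"], batch["act"]
    old_logp, ret = batch["old_logp"], batch["ret"]

    def update_actor() -> None:
        # One-step advantage from the critic as it stands right now
        # (a simplification of GAE, kept short for the sketch).
        adv = (ret - model.critic(obs).squeeze(-1)).detach()
        ratio = torch.exp(model.log_prob(obs, act) - old_logp)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        loss = -torch.min(ratio * adv, clipped * adv).mean()
        actor_opt.zero_grad()
        loss.backward()
        actor_opt.step()

    def update_critic() -> None:
        loss = ((model.critic(obs).squeeze(-1) - ret) ** 2).mean()
        critic_opt.zero_grad()
        loss.backward()
        critic_opt.step()

    if actor_first:    # reversed order: policy step, then value step
        update_actor()
        update_critic()
    else:              # conventional order: value step, then policy step
        update_critic()
        update_actor()


def fedavg(models: List[ActorCritic]) -> Dict[str, torch.Tensor]:
    """Plain FedAvg of client parameters (equal weights), returned as a state dict."""
    avg = {k: v.clone() for k, v in models[0].state_dict().items()}
    for m in models[1:]:
        for k, v in m.state_dict().items():
            avg[k] += v
    return {k: v / len(models) for k, v in avg.items()}
```

In a federated round, the server would broadcast the averaged parameters, each client would call `local_ppo_update(..., actor_first=True)` on its own trajectories, and the server would aggregate with `fedavg`; whether the reversal removes the critic divergence is the claim addressed by the paper's analysis and experiments.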
Theoretical analysis and empirical results show that FedRAC achieves higher cumulative rewards and converges faster than PPO with the conventional update order in FRL scenarios.