Proximal Policy Optimization (PPO) faces challenges in Federated Reinforcement Learning (FRL) that stem from the order in which its actor and critic are updated.
Under the conventional order (critic first, then actor), clients may produce heterogeneous gradient directions, which hinders convergence to a globally optimal policy in FRL.
FedRAC proposes reversing the update order in PPO, updating the actor first and then the critic, in order to eliminate the divergence among the critics of different clients.
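The PyTorch sketch below is one way to picture the two orders inside a client's local update; the network sizes, the one-step advantage estimate used in place of GAE, the `actor_first` flag, and the `fedavg` helper are illustrative assumptions rather than the paper's implementation.

```python
from typing import Dict, List

import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActorCritic(nn.Module):
    """Minimal discrete-action actor-critic; layer sizes are illustrative."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                   nn.Linear(64, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                    nn.Linear(64, 1))

    def log_prob(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return Categorical(logits=self.actor(obs)).log_prob(act)


def local_ppo_update(model: ActorCritic, batch: Dict[str, torch.Tensor],
                     actor_first: bool = True, clip_eps: float = 0.2,
                     lr: float = 3e-4) -> None:
    """One local PPO step on a client.

    actor_first=True is the reversed (FedRAC-style) order; False is the
    conventional critic-then-actor order. The order matters here because the
    advantage estimate is taken from the critic's *current* values, so
    updating the critic first changes the advantages the actor step sees.
    """
    actor_opt = torch.optim.Adam(model.actor.parameters(), lr=lr)
    critic_opt = torch.optim.Adam(model.critic.parameters(), lr=lr)
    obs, act = batch["obs"], batch["act"]
    old_logp, ret = batch["old_logp"], batch["ret"]

    def update_actor() -> None:
        # One-step advantage from the critic as it stands right now
        # (a simplification of GAE, kept short for the sketch).
        adv = (ret - model.critic(obs).squeeze(-1)).detach()
        ratio = torch.exp(model.log_prob(obs, act) - old_logp)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        loss = -torch.min(ratio * adv, clipped * adv).mean()
        actor_opt.zero_grad()
        loss.backward()
        actor_opt.step()

    def update_critic() -> None:
        loss = ((model.critic(obs).squeeze(-1) - ret) ** 2).mean()
        critic_opt.zero_grad()
        loss.backward()
        critic_opt.step()

    if actor_first:    # reversed order: policy step, then value step
        update_actor()
        update_critic()
    else:              # conventional order: value step, then policy step
        update_critic()
        update_actor()


def fedavg(models: List[ActorCritic]) -> Dict[str, torch.Tensor]:
    """Plain FedAvg of client parameters (equal weights), returned as a state dict."""
    avg = {k: v.clone() for k, v in models[0].state_dict().items()}
    for m in models[1:]:
        for k, v in m.state_dict().items():
            avg[k] += v
    return {k: v / len(models) for k, v in avg.items()}
```

In a federated round, the server would broadcast the averaged parameters, each client would call `local_ppo_update(..., actor_first=True)` on its own trajectories, and the server would aggregate with `fedavg`; whether the reversal removes the critic divergence is the claim addressed by the paper's analysis and experiments.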
Theoretical analysis and empirical results show that FedRAC achieves higher cumulative rewards and converges faster than PPO with the conventional update order in FRL scenarios.