Federated reinforcement learning (FRL) enables multiple agents, each interacting with its own (possibly different) environment, to collaboratively learn a shared policy.
The actor-critic (AC) algorithm is known for its low variance and high sample efficiency in RL.
However, the theoretical understanding of AC in the federated setting with heterogeneous environments remains limited.
This work proposes the Single-loop Federated Actor-Critic (SFAC) algorithm and shows that it converges to a near-stationary point while enjoying a linear speed-up in sample complexity with respect to the number of agents.
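To make the setup concrete, below is a minimal sketch of how a single-loop federated actor-critic scheme of this kind might be organized: each agent performs one critic (TD) step and one actor (policy-gradient) step per iteration in its own environment, and a server periodically averages the parameters across agents. This is an illustrative assumption about the general pattern, not the paper's algorithm; all names and hyperparameters (make_agent_env, SYNC_EVERY, learning rates, etc.) are hypothetical.

```python
# Minimal sketch of a single-loop federated actor-critic scheme (illustrative only).
# Assumptions: tabular MDPs, softmax policy, linear (tabular) critic, periodic averaging.
import numpy as np

N_STATES, N_ACTIONS = 5, 3
N_AGENTS, SYNC_EVERY, T = 4, 10, 500
GAMMA, LR_ACTOR, LR_CRITIC = 0.95, 0.05, 0.1
rng = np.random.default_rng(0)

def make_agent_env(seed):
    """Toy tabular MDP; each agent gets a slightly different (heterogeneous) one."""
    g = np.random.default_rng(seed)
    P = g.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # transition kernels
    R = g.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))             # reward table
    return P, R

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

envs = [make_agent_env(s) for s in range(N_AGENTS)]
theta = np.zeros((N_STATES, N_ACTIONS))   # shared actor (policy) parameters
w = np.zeros(N_STATES)                    # shared critic (value) parameters
actors = [theta.copy() for _ in range(N_AGENTS)]
critics = [w.copy() for _ in range(N_AGENTS)]
states = [rng.integers(N_STATES) for _ in range(N_AGENTS)]

for t in range(T):
    for i in range(N_AGENTS):
        P, R = envs[i]
        s = states[i]
        pi = softmax(actors[i][s])
        a = rng.choice(N_ACTIONS, p=pi)
        s_next = rng.choice(N_STATES, p=P[s, a])
        r = R[s, a]
        # Single-loop updates: one TD(0) critic step and one policy-gradient actor step
        td_error = r + GAMMA * critics[i][s_next] - critics[i][s]
        critics[i][s] += LR_CRITIC * td_error
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        actors[i][s] += LR_ACTOR * td_error * grad_log_pi
        states[i] = s_next
    # Periodic federated averaging of actor and critic parameters
    if (t + 1) % SYNC_EVERY == 0:
        theta = np.mean(actors, axis=0)
        w = np.mean(critics, axis=0)
        actors = [theta.copy() for _ in range(N_AGENTS)]
        critics = [w.copy() for _ in range(N_AGENTS)]

print("average critic value estimate:", w.mean())
```

The intuition behind the linear speed-up is visible in the averaging step: each round aggregates samples collected in parallel by all agents, so the per-agent sample burden shrinks roughly in proportion to the number of agents, up to error terms induced by environment heterogeneity.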