Researchers study a long-run mean-variance team stochastic game (MV-TSG) and propose a Mean-Variance Multi-Agent Policy Iteration (MV-MAPI) algorithm.
The MV-TSG is challenging because the variance metric is neither additive nor Markovian, and because simultaneous policy updates by the agents make the environment non-stationary.
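The non-additivity of variance can be illustrated with a minimal sketch. It assumes the common textbook form of a mean-variance objective, mean minus a weight times variance, evaluated on a finite reward sequence. The function name `mean_variance` and the risk weight `beta` are hypothetical and not taken from the source, which may define its metric differently.

```python
from statistics import fmean


def mean_variance(rewards, beta=1.0):
    """Return (mean, variance, mean - beta * variance) of a reward sequence.

    Assumption: `beta` is an illustrative risk-aversion weight; the combined
    objective mean - beta * variance is a common form, not the source's own.
    """
    eta = fmean(rewards)
    # Each squared deviation depends on eta, which depends on the whole
    # sequence, so no per-step term can be computed in isolation.
    var = fmean([(r - eta) ** 2 for r in rewards])
    return eta, var, eta - beta * var


# Two segments, each with zero variance on its own.
seg_a = [0, 0]
seg_b = [2, 2]

_, var_a, _ = mean_variance(seg_a)
_, var_b, _ = mean_variance(seg_b)
eta_ab, var_ab, obj_ab = mean_variance(seg_a + seg_b)

print(var_a, var_b)            # variance of each segment
print(eta_ab, var_ab, obj_ab)  # statistics of the concatenated sequence
```

Each segment has zero variance, yet the concatenated sequence does not, because the deviations are measured against a mean that changes once the segments are combined. A standard additive dynamic-programming recursion over per-step rewards therefore does not apply directly to this metric.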
The MV-MAPI algorithm converges to a first-order stationary point, and specific conditions are derived under which such a point is a local Nash equilibrium or a local optimum.
To solve large-scale MV-TSGs with unknown environmental parameters, a multi-agent reinforcement learning algorithm named MV-MATRPO is proposed.