Source: arXiv

Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games

  • Researchers study a long-run mean-variance team stochastic game (MV-TSG) and propose a Mean-Variance Multi-Agent Policy Iteration (MV-MAPI) algorithm.
  • The MV-TSG poses two challenges: the variance metric is neither additive nor Markovian, and simultaneous policy updates by all agents make the environment non-stationary for each individual agent.
  • The MV-MAPI algorithm converges to a first-order stationary point, and the researchers give specific conditions under which such a point is a local Nash equilibrium or a local optimum.
  • To solve large-scale MV-TSGs with unknown environmental parameters, a multi-agent reinforcement learning algorithm named MV-MATRPO is proposed.
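
To make the "non-additive" point in the summary concrete, here is a minimal Python sketch of a long-run mean-variance objective for a small Markov chain. It is an illustration only, not the paper's algorithm. It assumes the chain `P` and reward vector `r` are what a fixed joint policy induces, that the long-run variance is the steady-state variance of the reward, and that the objective has the form mean minus `beta` times variance. The function names, the numbers, and the weight `beta` are all hypothetical.

```python
def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of transition matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of pi <- pi * P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def mean_variance(P, r, beta):
    """Return (mean, variance, mean - beta * variance) under the stationary distribution.

    Assumption: the objective form 'mean - beta * variance' is chosen for
    illustration; the article does not state the exact formula.
    """
    pi = stationary_distribution(P)
    eta = sum(p * x for p, x in zip(pi, r))
    # Each state's contribution (x - eta)**2 depends on eta, which depends on
    # the whole policy. This coupling is why the variance is not an additive,
    # per-step Markovian reward.
    var = sum(p * (x - eta) ** 2 for p, x in zip(pi, r))
    return eta, var, eta - beta * var


# Hypothetical two-state chain induced by some fixed joint policy.
P = [[0.9, 0.1],
     [0.3, 0.7]]
r = [1.0, 5.0]

eta, var, objective = mean_variance(P, r, beta=0.5)
print(eta, var, objective)
```

Because the squared deviation in each state is measured against the overall mean, changing one agent's policy shifts the "cost" of every state at once, which is one way to see why standard dynamic programming does not apply directly.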
