techminis

A naukri.com initiative

Arxiv

Policy Gradient with Second Order Momentum

  • Policy Gradient with Second-Order Momentum (PG-SOM) is a lightweight second-order optimization scheme for reinforcement-learning policies.
  • PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a running average of the gradient and a diagonal approximation of the Hessian.
  • The method preconditions the gradient with this curvature estimate, adaptively rescaling the update for each parameter, which yields faster and more stable ascent of the expected return.
  • Numerical experiments on standard control benchmarks show higher sample efficiency and lower variance than first-order and Fisher-matrix baselines, suggesting that even coarse second-order information brings practical gains.
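The summary does not say how PG-SOM estimates the Hessian diagonal, what its exact step rule is, or which hyperparameters it uses. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a Gaussian policy with fixed standard deviation, a toy quadratic reward, a batch-mean baseline, a likelihood-ratio estimator for the diagonal Hessian, Adam-style bias correction, and a damping constant added to the curvature. All names and values (`pg_som_sketch`, `sample_reward`, `damping`, the learning rate and decay rates) are hypothetical.

```python
import math
import random


def sample_reward(action, target):
    """Toy reward (assumption): negative squared distance to a fixed target."""
    return -sum((a - t) ** 2 for a, t in zip(action, target))


def pg_som_sketch(target, steps=300, batch=16, sigma=0.5, lr=0.1,
                  beta1=0.9, beta2=0.99, damping=1.0, seed=0):
    """Hypothetical PG-SOM-style loop for a Gaussian policy a ~ N(theta, sigma^2 I).

    Keeps an EMA of the REINFORCE gradient (m) and an EMA of a diagonal
    Hessian estimate (v), then preconditions each ascent step by |v|.
    """
    rng = random.Random(seed)
    d = len(target)
    theta = [0.0] * d
    m = [0.0] * d  # first statistic: running average of the gradient
    v = [0.0] * d  # second statistic: running diagonal Hessian estimate
    for t in range(1, steps + 1):
        actions = [[th + sigma * rng.gauss(0.0, 1.0) for th in theta]
                   for _ in range(batch)]
        rewards = [sample_reward(a, target) for a in actions]
        baseline = sum(rewards) / batch  # batch-mean baseline (assumption)
        g = [0.0] * d
        h = [0.0] * d
        for a, r in zip(actions, rewards):
            adv = r - baseline
            for i in range(d):
                diff = a[i] - theta[i]
                # Score function: d/dtheta_i log N(a | theta, sigma^2) = diff / sigma^2
                g[i] += adv * diff / sigma ** 2
                # Likelihood-ratio Hessian diagonal: score^2 + d^2 log pi
                # = diff^2 / sigma^4 - 1 / sigma^2
                h[i] += adv * (diff ** 2 / sigma ** 4 - 1.0 / sigma ** 2)
        for i in range(d):
            g[i] /= batch
            h[i] /= batch
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * h[i]
            m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected gradient average
            v_hat = v[i] / (1 - beta2 ** t)  # bias-corrected curvature
            # Precondition by curvature magnitude; damping bounds the step
            # when the noisy curvature estimate is close to zero.
            theta[i] += lr * m_hat / (abs(v_hat) + damping)
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0]
    theta = pg_som_sketch(target)
    print("theta:", [round(x, 3) for x in theta])
    print("distance to target:", round(math.dist(theta, target), 4))
```

For this quadratic toy reward the exact Hessian of the expected return is -2I, so each entry of `v` should settle near -2 and the update behaves like a damped Newton step. Dividing by `abs(v_hat) + damping` rather than `v_hat` itself is a design choice of this sketch: it keeps the step pointing uphill even when noise flips the sign of the curvature estimate.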
