<ul><li>Policy Gradient with Second-Order Momentum (PG-SOM) is a lightweight second-order optimization scheme for reinforcement-learning policies.</li><li>PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a moving average of the gradient and a diagonal approximation of the Hessian.</li><li>The method adaptively rescales the step for each parameter by preconditioning the gradient with the curvature estimate, which leads to faster and more stable ascent of the expected return.</li><li>Numerical experiments on standard control benchmarks show higher sample efficiency and lower variance than first-order and Fisher-matrix baselines, indicating that even coarse second-order information yields practical gains.</li></ul>
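
The update summarized in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`init_state`, `pg_som_step`, `demo`), the hyperparameter values, the Adam-style bias correction, and the use of the absolute value of the curvature estimate are all assumptions made for illustration. The sketch also assumes the caller already has a policy-gradient estimate and a diagonal Hessian estimate, and it stands in for them with the exact gradient and curvature of a toy concave objective instead of REINFORCE samples.

```python
import numpy as np


def init_state(dim):
    """Zero-initialised running statistics (hypothetical helper)."""
    return {"m": np.zeros(dim), "h": np.zeros(dim), "t": 0}


def pg_som_step(theta, grad, hess_diag, state,
                lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8):
    """One ascent step preconditioned by a diagonal curvature estimate.

    grad      : estimate of the gradient of the expected return
    hess_diag : estimate of the diagonal of its Hessian
    All hyperparameter defaults are illustrative assumptions.
    """
    state["t"] += 1
    t = state["t"]

    # Exponentially weighted averages of the gradient and the curvature.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["h"] = beta2 * state["h"] + (1.0 - beta2) * hess_diag

    # Bias correction for the zero initialisation (an assumption here).
    m_hat = state["m"] / (1.0 - beta1 ** t)
    h_hat = state["h"] / (1.0 - beta2 ** t)

    # Per-parameter rescaling: divide by the curvature magnitude so that
    # sharply curved directions take smaller steps. Ascent, hence "+".
    return theta + lr * m_hat / (np.abs(h_hat) + eps)


def demo(steps=500):
    """Maximise J(theta) = -0.5 * sum(a * (theta - target)**2)."""
    a = np.array([1.0, 100.0])        # curvatures differ by 100x
    target = np.array([1.0, -2.0])    # maximiser of J
    theta = np.zeros(2)
    state = init_state(2)
    for _ in range(steps):
        grad = -a * (theta - target)  # exact gradient of J
        hess_diag = -a                # exact Hessian diagonal of J
        theta = pg_som_step(theta, grad, hess_diag, state)
    return theta, target


if __name__ == "__main__":
    theta, target = demo()
    print("final theta:", theta, "target:", target)
```

In this toy problem the two coordinates have curvatures that differ by a factor of 100, yet a single learning rate works for both because each step is divided by the curvature magnitude. In a real policy-gradient setting both estimates would be noisy, which is what the exponential averaging is meant to smooth.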

Policy Gradient with Second Order Momentum
