Policy Gradient with Second-Order Momentum (PG-SOM) is a lightweight second-order optimization scheme for reinforcement-learning policies.
PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a running average of the gradient and a diagonal approximation of the Hessian.
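In assumed notation (not taken verbatim from the paper), writing the stochastic policy-gradient estimate as $g_t = \nabla_\theta \hat{J}(\theta_t)$ and a diagonal curvature estimate as $h_t \approx \operatorname{diag}\!\big(\nabla_\theta^2 \hat{J}(\theta_t)\big)$, the two statistics would take the familiar exponential-moving-average form

\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t,
\qquad
d_t = \beta_2 d_{t-1} + (1-\beta_2)\,h_t,
\]

with decay rates $\beta_1, \beta_2 \in [0,1)$, and the preconditioned ascent step of the next sentence would then read, elementwise, $\theta_{t+1} = \theta_t + \alpha\, m_t / (|d_t| + \epsilon)$.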
By preconditioning the gradient with this curvature estimate, the method adaptively rescales each parameter update, yielding faster and more stable ascent of the expected return.
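A minimal PyTorch sketch of one such update is given below. It assumes a diagonal Hessian estimated by a Hutchinson probe and the hyperparameter names `beta1`, `beta2`, and `eps`; none of this is the authors' reference implementation, only an illustration of the update described above.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters


def pg_som_step(policy, return_estimate, state, lr=1e-2,
                beta1=0.9, beta2=0.99, eps=1e-8):
    """One PG-SOM-style ascent step (illustrative sketch, not the paper's code).

    `return_estimate` is a scalar REINFORCE surrogate for the expected return
    (e.g. the sum of log-probabilities weighted by discounted returns), which
    must be differentiable w.r.t. the policy parameters. `state` holds the two
    exponentially weighted statistics, initialized once as
    `{"m": torch.zeros(n), "d": torch.zeros(n)}` with n total parameters.
    """
    params = list(policy.parameters())

    # First-order gradient, kept in the graph so we can differentiate again.
    grads = torch.autograd.grad(return_estimate, params, create_graph=True)
    g = torch.cat([x.reshape(-1) for x in grads])

    # Hutchinson probe for the Hessian diagonal: diag(H) ~ E[z * (H z)],
    # with z a Rademacher (+/-1) vector. This estimator is an assumption;
    # the paper may construct its diagonal approximation differently.
    z = torch.randint(0, 2, g.shape).to(g) * 2 - 1
    hvp = torch.autograd.grad(g @ z, params)  # Hessian-vector product H z
    h = z * torch.cat([x.reshape(-1) for x in hvp])

    # The two exponentially weighted statistics ("second-order momentum").
    state["m"] = beta1 * state["m"] + (1 - beta1) * g.detach()
    state["d"] = beta2 * state["d"] + (1 - beta2) * h.detach()

    # Curvature-preconditioned ascent: rescale each coordinate by |d| + eps.
    with torch.no_grad():
        theta = parameters_to_vector(params)
        theta += lr * state["m"] / (state["d"].abs() + eps)
        vector_to_parameters(theta, params)
```

Dividing by the absolute curvature keeps the step an ascent direction even where the estimated curvature is negative; the `eps` floor guards against division by near-zero entries.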
Numerical experiments on standard control benchmarks show improved sample efficiency and reduced gradient variance relative to first-order and Fisher-matrix baselines, indicating that even coarse second-order information yields practical gains.