Reinforcement learning (RL) algorithms often struggle with iteration efficiency and robustness.
Risk-sensitive policy gradient methods aim to yield more robust policies, but their iteration complexity is not well understood.
A rigorous analysis of the risk-sensitive policy gradient method establishes an iteration complexity of O(ε^-2) for reaching an ε-approximate first-order stationary point.
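For concreteness, a common convention for this statement (an assumption here, since the abstract does not spell it out) is sketched below in LaTeX: a point is ε-approximately first-order stationary when the gradient of the risk-sensitive objective is small, and the complexity bound counts how many policy gradient iterations suffice to reach one.

```latex
% Assumed convention: \theta is an \epsilon-approximate first-order
% stationary point of the risk-sensitive objective J when
\[
  \big\| \nabla_\theta J(\theta) \big\| \le \epsilon ,
\]
% and the stated iteration complexity means that
\[
  T = O\!\left(\epsilon^{-2}\right)
\]
% policy gradient iterations suffice to produce such a point
% (in expectation for stochastic-gradient variants).
```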
Empirical evaluation shows that risk-averse policies can converge and stabilize faster than their risk-neutral counterparts.
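The abstract does not specify the exact risk-sensitive objective, but a widely used choice is the entropic (exponential-utility) criterion J(θ) = (1/β) log E[exp(β R)]. The sketch below is a minimal, hedged illustration of a REINFORCE-style gradient estimator for that criterion on a toy two-armed bandit; the environment, the risk parameter `beta`, and all helper names are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of a risk-sensitive policy gradient step under the
# entropic (exponential-utility) criterion -- an illustrative assumption,
# not the paper's exact algorithm or environment.
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-armed bandit: arm 0 is safe; arm 1 has a higher mean but high variance.
ARM_MEANS = np.array([1.0, 1.5])
ARM_STDS = np.array([0.1, 2.0])

def sample_reward(arm):
    return rng.normal(ARM_MEANS[arm], ARM_STDS[arm])

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def risk_sensitive_gradient(theta, beta, batch_size=64):
    """Monte Carlo estimate of the gradient of J(theta) = (1/beta) log E[exp(beta*R)].
    beta < 0 is risk-averse; as beta -> 0 the estimator recovers the
    risk-neutral REINFORCE gradient with a mean-reward baseline."""
    probs = softmax(theta)
    arms = rng.choice(len(theta), size=batch_size, p=probs)
    rewards = np.array([sample_reward(a) for a in arms])

    # Self-normalized exponential-utility weights (shifted by the max for stability).
    w = np.exp(beta * (rewards - rewards.max()))
    w /= w.mean()
    # (w - 1) / beta -> (R - mean R) as beta -> 0, i.e. the usual advantage term.
    coeffs = (w - 1.0) / beta

    grad = np.zeros_like(theta)
    for arm, c in zip(arms, coeffs):
        # Score function of a softmax policy: grad log pi(arm) = e_arm - probs.
        score = -probs.copy()
        score[arm] += 1.0
        grad += c * score
    return grad / batch_size

def train(beta, steps=2000, lr=0.05):
    theta = np.zeros(2)
    for _ in range(steps):
        theta += lr * risk_sensitive_gradient(theta, beta)
    return softmax(theta)

print("risk-averse  (beta=-1.0) :", train(beta=-1.0))    # favors the safe arm
print("near-neutral (beta=-0.01):", train(beta=-0.01))   # chases the higher mean
```

In this toy setting the risk-averse run concentrates on the low-variance arm while the near-neutral run prefers the higher-mean arm, which is the qualitative contrast the abstract's empirical claim refers to; the specific hyperparameters and convergence speeds here are illustrative only.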