Magnitude and direction (MAD) policies are introduced as a new policy parameterization for reinforcement learning (RL).
MAD policies preserve Lp closed-loop stability for nonlinear dynamical systems while introducing explicit feedback on state-dependent features.
The magnitude of the control input is parameterized by a disturbance-feedback Lp-stable operator, while its direction is selected from state-dependent features by a universal function approximator.
MAD policies maintain closed-loop stability guarantees when trained in model-free RL pipelines, requiring no model information beyond knowledge that the open-loop system is stable.
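The decomposition can be summarized as u_t = m_t * d_t, where the scalar magnitude m_t is the output of an Lp-stable operator driven by disturbances and the unit-norm direction d_t is an unconstrained function of state-dependent features. The sketch below illustrates this structure under assumed details: `StableFilter`, `MADPolicy`, and the small MLP direction network are hypothetical placeholders (here the stable operator is a contractive linear filter), not the authors' implementation.

```python
import numpy as np

class StableFilter:
    """Illustrative disturbance-feedback Lp-stable operator: a linear
    system xi' = A xi + B w with spectral radius of A below 1, so the
    map from disturbances w to the scalar output is Lp-stable."""
    def __init__(self, n_states=4, n_inputs=3, seed=0):
        rng = np.random.default_rng(seed)
        A = rng.standard_normal((n_states, n_states))
        self.A = 0.9 * A / np.max(np.abs(np.linalg.eigvals(A)))  # contractive
        self.B = rng.standard_normal((n_states, n_inputs))
        self.C = rng.standard_normal((1, n_states))
        self.xi = np.zeros(n_states)

    def step(self, w):
        self.xi = self.A @ self.xi + self.B @ w
        return float(self.C @ self.xi)


class MADPolicy:
    """Hypothetical MAD policy sketch: u_t = m_t * d_t, with the magnitude
    m_t produced by an Lp-stable operator and the direction d_t by an
    unconstrained network. Since ||u_t|| = |m_t|, the size of the control
    input is governed by the stable operator regardless of the direction."""
    def __init__(self, state_dim=3, act_dim=2, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.magnitude = StableFilter(n_inputs=state_dim, seed=seed)
        self.W1 = rng.standard_normal((hidden, state_dim)) / np.sqrt(state_dim)
        self.W2 = rng.standard_normal((act_dim, hidden)) / np.sqrt(hidden)

    def direction(self, x):
        h = np.tanh(self.W1 @ x)               # state-dependent features
        d = self.W2 @ h
        return d / (np.linalg.norm(d) + 1e-8)  # project to the unit sphere

    def act(self, x, w):
        m = self.magnitude.step(w)             # Lp-stable magnitude signal
        return m * self.direction(x)           # control input u_t


# Usage: one step with a dummy state and a reconstructed disturbance.
policy = MADPolicy()
x_t = np.array([0.1, -0.2, 0.05])
w_t = np.array([0.01, 0.0, -0.02])
u_t = policy.act(x_t, w_t)
print(u_t)  # 2-dimensional control input
```

In a model-free training loop, the direction network (and any free parameters of the magnitude operator) would be updated by a standard RL algorithm, while the stability constraint on the magnitude operator would keep every intermediate policy Lp-stabilizing.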