Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems in which an agent must achieve its goal while satisfying constraints, typically expressed as bounds on expected costs (formalized below).
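A common formalization, stated here as a sketch under standard constrained Markov decision process (CMDP) assumptions (the symbols $J_r$, $J_{c_i}$, thresholds $b_i$, and horizon $T$ are notation introduced for this summary, not taken from the source), maximizes the expected return subject to expected-cost constraints:

```latex
\max_{\theta \in \Theta} \; J_r(\theta) := \mathbb{E}_{\tau \sim p_\theta}\!\left[ \sum_{t=0}^{T-1} \gamma^t\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad
J_{c_i}(\theta) := \mathbb{E}_{\tau \sim p_\theta}\!\left[ \sum_{t=0}^{T-1} \gamma^t\, c_i(s_t, a_t) \right] \le b_i, \quad i = 1, \dots, U.
```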
Policy-based methods, which are especially suited to continuous-control problems, explore either in action space (a stochastic policy perturbs each action around a deterministic mean) or in parameter space (a hyperpolicy perturbs the policy parameters, and the sampled policy then acts deterministically), as sketched below.
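A minimal NumPy sketch contrasting the two exploration styles; every name here (`policy_mean`, `act_action_based`, `sample_params_parameter_based`, `sigma_a`, `sigma_p`) is a hypothetical placeholder for illustration, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(theta, state):
    """Hypothetical deterministic linear policy: action = theta @ state."""
    return theta @ state

def act_action_based(theta, state, sigma_a=0.1):
    """Action-based exploration: Gaussian noise is added to each action
    around the deterministic mean, at every time step."""
    return policy_mean(theta, state) + sigma_a * rng.standard_normal(theta.shape[0])

def sample_params_parameter_based(theta, sigma_p=0.1):
    """Parameter-based exploration: Gaussian noise perturbs the policy
    parameters (typically once per episode); the sampled policy then
    acts deterministically for the whole episode."""
    return theta + sigma_p * rng.standard_normal(theta.shape)

# Usage: theta is an (action_dim x state_dim) matrix.
theta = np.zeros((1, 3))
state = np.ones(3)
noisy_action = act_action_based(theta, state)         # per-step noise
theta_episode = sample_params_parameter_based(theta)  # per-episode noise
```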
The paper introduces C-PG, an exploration-agnostic policy-gradient algorithm that covers both exploration paradigms in a unified framework and enjoys global convergence guarantees under gradient domination assumptions.
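For intuition, a (weak) gradient domination condition of the kind typically assumed in such analyses is sketched below; the constants $\alpha \ge 0$ and $\beta \ge 0$ and the optimal value $J^*$ are notation assumed for this summary. Such a condition bounds global suboptimality by the gradient norm, which is what lifts first-order stationarity guarantees to global convergence:

```latex
J^* - J(\theta) \;\le\; \alpha \,\|\nabla_\theta J(\theta)\| + \beta, \qquad \forall\, \theta \in \Theta,
```

with $\beta = 0$ recovering the strong form of the condition.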
Empirical validation on constrained control tasks, including comparisons with baseline methods, shows that C-PG is effective at learning deterministic policies.