Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation and policy optimization.
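As a brief illustration in generic DICE notation (the symbols below are standard but not necessarily the paper's), the correction ratio reweights samples drawn from the offline data distribution so that expectations under the target policy's stationary distribution can be estimated:

```latex
w_{\pi/D}(s,a) = \frac{d^{\pi}(s,a)}{d^{D}(s,a)},
\qquad
\hat{J}(\pi) = \mathbb{E}_{(s,a)\sim d^{D}}\!\left[\, w_{\pi/D}(s,a)\, r(s,a) \,\right],
```

where d^pi is the stationary state-action distribution induced by the target policy, d^D is the distribution of the offline dataset, and r is the reward; replacing r with a cost function gives the cost estimates needed in constrained reinforcement learning.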
However, recent approaches that enhance offline reinforcement learning performance inadvertently undermine DICE's ability to perform off-policy evaluation, which is especially problematic in constrained reinforcement learning scenarios.
This limitation is attributed to their reliance on semi-gradient optimization, which leads to failures in cost estimation within the DICE framework.
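To make the distinction concrete, the sketch below (assumed variable names and a toy value network, not the paper's code) shows how a semi-gradient update differs from a full-gradient one on a DICE-style Bellman residual: the bootstrapped term is detached, so the parameters are no longer updated along the true gradient of the evaluation objective.

```python
import torch

# Minimal sketch (illustrative, not the paper's implementation) contrasting
# full-gradient and semi-gradient optimization of a Bellman-like residual
#   e(s, s') = r + gamma * nu(s') - nu(s).
# Under semi-gradient, nu(s') is detached, so the update does not follow the
# true gradient of the squared-residual objective.

def residual(nu, s, s_next, r, gamma, semi_gradient: bool):
    next_val = nu(s_next)
    if semi_gradient:
        next_val = next_val.detach()  # stop-gradient through the bootstrapped target
    return r + gamma * next_val - nu(s)

# Hypothetical value network over a toy 4-dimensional state space.
nu = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(nu.parameters(), lr=3e-4)

s, s_next = torch.randn(32, 4), torch.randn(32, 4)
r, gamma = torch.randn(32, 1), 0.99

loss = residual(nu, s, s_next, r, gamma, semi_gradient=True).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```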
A novel method, semi-gradient DICE, is proposed to overcome these limitations; it enables accurate off-policy evaluation and improves performance in offline constrained reinforcement learning, achieving state-of-the-art results on the DSRL benchmark.