Image Credit: Arxiv

Semi-gradient DICE for Offline Constrained Reinforcement Learning

  • Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation and policy optimization.
  • Recent approaches that enhance offline reinforcement learning performance inadvertently hinder DICE's ability to perform off-policy evaluation, especially in constrained reinforcement learning settings.
  • This limitation is attributed to those approaches' reliance on semi-gradient optimization, which causes cost-estimation failures within the DICE framework.
  • A novel method called semi-gradient DICE is proposed to overcome these limitations, improving both off-policy evaluation and policy performance in offline constrained reinforcement learning and achieving state-of-the-art results on the DSRL benchmark.
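To make the cost-estimation idea concrete, here is a minimal sketch (not the paper's algorithm) of DICE-style off-policy evaluation: given a learned correction ratio w(s, a) ≈ d^π(s, a) / d^D(s, a) between the target policy's stationary distribution and the dataset distribution, the policy's expected cost can be estimated as a weighted average over dataset samples. The random stand-ins for costs and log-ratios below are illustrative assumptions, not learned quantities.

```python
import numpy as np

# Hedged sketch: DICE-style off-policy cost estimation.
# With a correction ratio w(s, a) ≈ d^pi(s, a) / d^D(s, a),
#   E_{d^pi}[c(s, a)] ≈ E_{d^D}[w(s, a) * c(s, a)],
# so the policy's expected cost can be estimated from offline data alone.

rng = np.random.default_rng(0)

n = 1000
costs = rng.uniform(0.0, 1.0, size=n)   # stand-in for per-sample costs c(s_i, a_i)
log_w = rng.normal(0.0, 0.5, size=n)    # stand-in for learned log correction ratios

weights = np.exp(log_w)
weights /= weights.mean()               # self-normalize so E_{d^D}[w] = 1

policy_cost = np.mean(weights * costs)  # DICE-style estimate of E_{d^pi}[c]
print(f"estimated policy cost: {policy_cost:.3f}")
```

In a constrained RL setting this estimate would be compared against a cost budget, which is why a systematic failure in cost estimation (the issue the paper attributes to semi-gradient optimization) directly undermines constraint satisfaction.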
