Arxiv

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

  • Offline RL struggles with distributional shifts, leading to $Q$-value overestimation for out-of-distribution actions.
  • Existing methods impose constraints that can be too conservative when evaluating out-of-distribution regions, which hinders $Q$-function generalization and policy improvement.
  • A novel approach called Smooth Q-function OOD Generalization (SQOG) improves $Q$-value estimation by smoothing out-of-distribution $Q$-values with neighboring in-sample $Q$-values within the Convex Hull and its Neighborhood (CHN).
  • The proposed Smooth Bellman Operator (SBO) theoretically approximates the true $Q$-values for both in-sample and out-of-distribution actions within the CHN. The practical SQOG algorithm outperforms existing state-of-the-art methods in both performance and computational efficiency on the D4RL benchmarks.
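
The idea of smoothing an out-of-distribution $Q$-value with neighboring in-sample $Q$-values can be illustrated with a toy sketch. This is not the SQOG algorithm or the Smooth Bellman Operator from the paper: it assumes a single fixed state, a 1-D action space (so the convex hull of the dataset actions is just the interval between the smallest and largest action), and it swaps in a plain Gaussian-kernel weighted average as the smoother. The names `in_chn`, `smoothed_q`, `radius`, and `bandwidth` are hypothetical and chosen only for this example.

```python
import math


def in_chn(action, dataset_actions, radius):
    """Check whether a 1-D action lies in the convex hull of the dataset
    actions (the interval [min, max]) or within `radius` of that hull.
    Assumption: `radius` stands in for the size of the neighborhood."""
    lo, hi = min(dataset_actions), max(dataset_actions)
    return lo - radius <= action <= hi + radius


def smoothed_q(action, dataset_actions, dataset_q, bandwidth):
    """Estimate Q at `action` as a Gaussian-kernel weighted average of the
    in-sample Q-values. Closer dataset actions receive larger weights.
    Assumption: this generic smoother only illustrates the idea; it is not
    the operator defined in the paper."""
    weights = [
        math.exp(-0.5 * ((action - a) / bandwidth) ** 2)
        for a in dataset_actions
    ]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, dataset_q)) / total


# Toy dataset for one state: three in-sample actions and their Q-values.
dataset_actions = [-0.5, 0.0, 0.5]
dataset_q = [1.0, 2.0, 1.5]

# An action just outside the hull but inside its neighborhood.
ood_action = 0.6
if in_chn(ood_action, dataset_actions, radius=0.2):
    estimate = smoothed_q(ood_action, dataset_actions, dataset_q, bandwidth=0.2)
    print(f"smoothed Q estimate at a={ood_action}: {estimate:.3f}")
```

Because the estimate is a weighted average with non-negative weights, it can never exceed the largest in-sample $Q$-value, which is one simple way to see how anchoring to in-sample neighbors limits overestimation. Actions far outside the neighborhood are deliberately left unevaluated in this sketch.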
