Offline reinforcement learning (RL) struggles with distributional shift, which leads to $Q$-value overestimation for out-of-distribution (OOD) actions.
Existing methods impose constraints that are often overly conservative when evaluating OOD regions, which hinders $Q$-function generalization and policy improvement.
A novel approach, Smooth Q-function OOD Generalization (SQOG), improves $Q$-value estimation by smoothing OOD $Q$-values with neighboring in-sample $Q$-values within the Convex Hull and its Neighborhood (CHN).
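For intuition only, one plausible instantiation of such a smoothed estimate (an illustrative sketch, not the paper's exact operator; the weights $w_i$ and temperature $\tau$ are assumptions introduced here) is
$$
\widetilde{Q}(s, a) \;=\; \sum_{i} w_i(a)\, Q(s, a_i),
\qquad
w_i(a) \;\propto\; \exp\!\big(-\lVert a - a_i \rVert / \tau\big),
\qquad
\sum_i w_i(a) = 1,
$$
where $a$ is an OOD action inside the CHN and $\{a_i\}$ are its neighboring in-sample actions, so the OOD estimate is a convex combination of in-sample $Q$-values.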
The proposed Smooth Bellman Operator (SBO) is shown theoretically to approximate the true $Q$-values for both in-sample and OOD actions within the CHN, and the practical SQOG algorithm surpasses existing state-of-the-art methods in both performance and computational efficiency on the D4RL benchmarks.