Offline reinforcement learning (RL) struggles with distributional shift, which leads to $Q$-value overestimation for out-of-distribution (OOD) actions.
Existing methods impose constraints that are often overly conservative when evaluating OOD regions, which hinders $Q$-function generalization and policy improvement.
A novel approach, Smooth Q-function OOD Generalization (SQOG), improves $Q$-value estimation by smoothing OOD $Q$-values with neighboring in-sample $Q$-values within the Convex Hull and its Neighborhood (CHN).
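For intuition only, one plausible instantiation of such a smoothed estimate (an illustrative sketch, not the paper's exact operator; the weights $w_i$ and temperature $\tau$ are assumptions introduced here) is
$$
\widetilde{Q}(s, a) \;=\; \sum_{i} w_i(a)\, Q(s, a_i),
\qquad
w_i(a) \;\propto\; \exp\!\big(-\lVert a - a_i \rVert / \tau\big),
\qquad
\sum_i w_i(a) = 1,
$$
where $a$ is an OOD action inside the CHN and $\{a_i\}$ are its neighboring in-sample actions, so the OOD estimate is a convex combination of in-sample $Q$-values.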
The proposed Smooth Bellman Operator (SBO) is shown theoretically to approximate the true $Q$-values for both in-sample and OOD actions within the CHN, and the practical SQOG algorithm surpasses existing state-of-the-art methods in both performance and computational efficiency on the D4RL benchmarks.