Diffusion and flow models are powerful generative approaches for modeling diverse behavior, but their iterative noise-sampling processes make them challenging to train and deploy for offline RL.
This paper introduces SORL, a new offline RL algorithm that leverages shortcut models to scale both training and inference efficiently.
SORL's policy can capture complex data distributions and is trained in a single-stage procedure, achieving strong performance across a range of offline RL tasks.
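As a rough sketch (not code from the paper), the snippet below illustrates why a shortcut-model policy can be queried cheaply: because the model is conditioned on a step size, noise can be mapped to an action in a small, fixed number of forward passes. The `shortcut_model(state, noisy_action, t, step_size)` signature and the Euler-style update are assumptions for illustration.

```python
import torch

@torch.no_grad()
def sample_action(shortcut_model, state, action_dim, num_steps=4):
    """Map Gaussian noise to an action in `num_steps` shortcut steps."""
    x = torch.randn(state.shape[0], action_dim)   # start from pure noise
    d = 1.0 / num_steps                           # step size in [0, 1]
    for k in range(num_steps):
        t = torch.full((state.shape[0],), k * d)
        # Assumed interface: velocity = shortcut_model(state, noisy_action, t, step_size)
        v = shortcut_model(state, x, t, torch.full_like(t, d))
        x = x + d * v                             # Euler-style shortcut update
    return x
```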
At test time, SORL scales inference by using the learned Q-function as a verifier over sampled actions, and it exhibits positive scaling behavior as test-time compute increases.
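A minimal sketch of this verification scheme, assuming a best-of-N selection rule and reusing the `sample_action` helper above; the `q_function(states, actions)` interface is hypothetical and not taken from the paper.

```python
import torch

@torch.no_grad()
def act_best_of_n(shortcut_model, q_function, state, action_dim,
                  num_candidates=16, num_steps=4):
    """Sample N candidate actions, then keep the one the Q-function scores highest."""
    # Tile the single state so all candidates are generated in one batched call.
    states = state.unsqueeze(0).expand(num_candidates, -1)
    candidates = sample_action(shortcut_model, states, action_dim, num_steps)
    # Verification step: the learned Q-function ranks the candidates.
    scores = q_function(states, candidates).squeeze(-1)
    return candidates[scores.argmax()]
```

Increasing `num_candidates` (or `num_steps`) is the knob that spends additional test-time compute in this sketch.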