Offline-to-online deployment of reinforcement-learning (RL) agents needs to address the sim-to-real and interaction gaps.
This work introduces DT-CORL (Delay-Transformer belief policy Constrained Offline RL), a framework for handling delayed dynamics during deployment.
DT-CORL produces delay-robust actions with a transformer-based belief predictor and is more sample-efficient than history-augmentation baselines.
Experiments on D4RL benchmarks demonstrate that DT-CORL outperforms these baselines, bridging the sim-to-real latency gap while maintaining data efficiency.
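
To make the belief-predictor idea concrete, the following is a minimal sketch of how a transformer could map the last delayed observation and the actions issued during the delay window to a belief about the current state, which a policy then consumes instead of the stale observation. The class name `BeliefPredictor`, the dimensions, the stand-in policy, and the rollout interface are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a transformer-based belief predictor for delayed
# observations; names and dimensions are assumptions, not DT-CORL's code.
import torch
import torch.nn as nn


class BeliefPredictor(nn.Module):
    """Predicts the current (unobserved) state from the last delayed
    observation and the actions executed during the delay window."""

    def __init__(self, obs_dim: int, act_dim: int, d_model: int = 128,
                 n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, obs_dim)  # belief mean over current state

    def forward(self, delayed_obs: torch.Tensor,
                pending_actions: torch.Tensor) -> torch.Tensor:
        # delayed_obs: (batch, obs_dim), last observation actually received
        # pending_actions: (batch, delay, act_dim), actions issued since then
        tokens = torch.cat(
            [self.obs_embed(delayed_obs).unsqueeze(1),
             self.act_embed(pending_actions)],
            dim=1)                          # (batch, 1 + delay, d_model)
        encoded = self.encoder(tokens)      # mix information across the delay window
        return self.head(encoded[:, -1])    # predicted present state


# Usage: condition a (stand-in) policy on the belief rather than the stale
# observation, so the selected action matches the estimated current state.
predictor = BeliefPredictor(obs_dim=17, act_dim=6)
policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
obs = torch.randn(1, 17)
acts = torch.randn(1, 3, 6)   # a 3-step observation delay
belief = predictor(obs, acts)
action = policy(belief)
```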