Offline-to-online deployment of reinforcement-learning (RL) agents needs to address the sim-to-real and interaction gaps.
This work introduces DT-CORL (Delay-Transformer belief policy Constrained Offline RL), a framework for handling delayed dynamics during deployment.
DT-CORL produces delay-robust actions with a transformer-based belief predictor and is more sample-efficient than history-augmentation baselines.
Experiments on D4RL benchmarks demonstrate that DT-CORL outperforms these baselines, bridging the sim-to-real latency gap while maintaining data efficiency.
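
To make the belief-predictor idea concrete, the following is a minimal sketch of how a transformer could map the last delayed observation and the actions issued during the delay window to a belief about the current state, which a policy then consumes instead of the stale observation. The class name `BeliefPredictor`, the dimensions, the stand-in policy, and the rollout interface are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a transformer-based belief predictor for delayed
# observations; names and dimensions are assumptions, not DT-CORL's code.
import torch
import torch.nn as nn


class BeliefPredictor(nn.Module):
    """Predicts the current (unobserved) state from the last delayed
    observation and the actions executed during the delay window."""

    def __init__(self, obs_dim: int, act_dim: int, d_model: int = 128,
                 n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, obs_dim)  # belief mean over current state

    def forward(self, delayed_obs: torch.Tensor,
                pending_actions: torch.Tensor) -> torch.Tensor:
        # delayed_obs: (batch, obs_dim), last observation actually received
        # pending_actions: (batch, delay, act_dim), actions issued since then
        tokens = torch.cat(
            [self.obs_embed(delayed_obs).unsqueeze(1),
             self.act_embed(pending_actions)],
            dim=1)                          # (batch, 1 + delay, d_model)
        encoded = self.encoder(tokens)      # mix information across the delay window
        return self.head(encoded[:, -1])    # predicted present state


# Usage: condition a (stand-in) policy on the belief rather than the stale
# observation, so the selected action matches the estimated current state.
predictor = BeliefPredictor(obs_dim=17, act_dim=6)
policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
obs = torch.randn(1, 17)
acts = torch.randn(1, 3, 6)   # a 3-step observation delay
belief = predictor(obs, acts)
action = policy(belief)
```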