Return-Conditioned Supervised Learning (RCSL) simplifies policy learning in sequential decision-making by framing it as a supervised learning task in which the policy is conditioned on the current state and a target return, yielding greater training stability than traditional offline RL algorithms.
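For concreteness, the following is a minimal sketch of an RCSL training step, assuming discrete actions and an MLP policy conditioned on the state and a scalar return-to-go; the architecture, names, and shapes are illustrative and not taken from the paper.

```python
# Hedged sketch: supervised training of a return-conditioned policy.
# All module names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        # The policy consumes the state concatenated with a scalar return-to-go.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, rtg):
        return self.net(torch.cat([state, rtg], dim=-1))

def rcsl_update(policy, optimizer, states, actions, rtgs):
    """One supervised step: predict the logged action given (state, return-to-go)."""
    logits = policy(states, rtgs)                    # states: (B, state_dim), rtgs: (B, 1)
    loss = nn.functional.cross_entropy(logits, actions)  # actions: (B,) long tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```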
Reinforced RCSL is introduced to address RCSL's key limitation, namely that its performance is bounded by the quality of the behavior policy that generated the dataset. It does so via the concept of the in-distribution optimal return-to-go, which identifies the best future return achievable from the current state within the data distribution.
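The conditioning target can be illustrated in a simple tabular setting, where the in-distribution optimal return-to-go of a state is taken as the largest return-to-go observed from that state anywhere in the dataset; the paper's actual estimator may be learned or defined recursively, so this is only a sketch of the idea, not the method itself.

```python
# Hedged, tabular illustration of the in-distribution optimal return-to-go:
# for each state, keep the maximum return-to-go observed from that state.
from collections import defaultdict

def in_distribution_optimal_rtg(trajectories, gamma=1.0):
    best_rtg = defaultdict(lambda: float("-inf"))
    for traj in trajectories:                 # traj: list of (state, action, reward)
        rtg = 0.0
        for state, _, reward in reversed(traj):
            rtg = reward + gamma * rtg        # return-to-go from this timestep onward
            best_rtg[state] = max(best_rtg[state], rtg)
    return dict(best_rtg)

# At evaluation time, rather than conditioning on a user-chosen target return,
# the policy would be conditioned on best_rtg[state] for the current state.
```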
Theoretical analysis shows that Reinforced RCSL consistently outperforms standard RCSL while relying on simpler return augmentation techniques.
Empirical results corroborate this analysis, with Reinforced RCSL outperforming RCSL across a range of benchmarks for modern decision-making tasks.