Return-Conditioned Supervised Learning (RCSL) simplifies policy learning in sequential decision-making by framing it as a supervised learning task in which the policy is conditioned on the current state and a target return, yielding greater training stability than traditional offline RL algorithms.
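For concreteness, the following is a minimal sketch of an RCSL training step, assuming discrete actions and an MLP policy conditioned on the state and a scalar return-to-go; the architecture, names, and shapes are illustrative and not taken from the paper.

```python
# Hedged sketch: supervised training of a return-conditioned policy.
# All module names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        # The policy consumes the state concatenated with a scalar return-to-go.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, rtg):
        return self.net(torch.cat([state, rtg], dim=-1))

def rcsl_update(policy, optimizer, states, actions, rtgs):
    """One supervised step: predict the logged action given (state, return-to-go)."""
    logits = policy(states, rtgs)                    # states: (B, state_dim), rtgs: (B, 1)
    loss = nn.functional.cross_entropy(logits, actions)  # actions: (B,) long tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```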
Reinforced RCSL is introduced to address RCSL's key limitation, namely that its performance is bounded by the quality of the behavior policy that generated the dataset. It does so via the concept of the in-distribution optimal return-to-go, which identifies the best future return achievable from the current state within the data distribution.
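The conditioning target can be illustrated in a simple tabular setting, where the in-distribution optimal return-to-go of a state is taken as the largest return-to-go observed from that state anywhere in the dataset; the paper's actual estimator may be learned or defined recursively, so this is only a sketch of the idea, not the method itself.

```python
# Hedged, tabular illustration of the in-distribution optimal return-to-go:
# for each state, keep the maximum return-to-go observed from that state.
from collections import defaultdict

def in_distribution_optimal_rtg(trajectories, gamma=1.0):
    best_rtg = defaultdict(lambda: float("-inf"))
    for traj in trajectories:                 # traj: list of (state, action, reward)
        rtg = 0.0
        for state, _, reward in reversed(traj):
            rtg = reward + gamma * rtg        # return-to-go from this timestep onward
            best_rtg[state] = max(best_rtg[state], rtg)
    return dict(best_rtg)

# At evaluation time, rather than conditioning on a user-chosen target return,
# the policy would be conditioned on best_rtg[state] for the current state.
```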
Theoretical analysis shows that Reinforced RCSL consistently outperforms standard RCSL while relying on simpler return augmentation techniques.
Empirical results corroborate this analysis, with Reinforced RCSL outperforming RCSL across a range of benchmarks for modern decision-making tasks.