Supervised learning (SL) and temporal-difference (TD) learning methods are increasingly combined to strengthen reinforcement learning (RL): SL methods are simple and stable to train, but on their own lack the trajectory-stitching ability that TD learning provides.
A new approach, Goal-Conditioned Reinforced Supervised Learning (GCReinSL), is introduced to equip SL methods with trajectory-stitching capability.
The approach combines a $Q$-conditioned policy with $Q$-conditioned maximization to bridge the performance gap between SL and TD learning in offline goal-conditioned RL.
Experimental results show that GCReinSL outperforms both prior SL methods with trajectory-stitching capabilities and goal data augmentation techniques.