Supervised learning (SL) and temporal-difference (TD) learning methods are increasingly combined to strengthen reinforcement learning (RL): SL methods are simple and stable to train, but on their own lack the trajectory-stitching ability that TD learning provides.
A new approach, Goal-Conditioned Reinforced Supervised Learning (GCReinSL), is introduced to equip SL methods with trajectory-stitching capability.
The approach combines a $Q$-conditioned policy with $Q$-conditioned maximization to bridge the performance gap between SL and TD learning in offline goal-conditioned RL.
Experimental results show that GCReinSL outperforms both prior SL methods with trajectory-stitching capabilities and goal data augmentation techniques.