Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories.
Hierarchical RL methods achieve strong results on long-horizon goal-reaching tasks, but their reliance on multiple modular policies and explicit subgoal generation makes them difficult to scale.
This work introduces an algorithm that trains a flat goal-conditioned policy by bootstrapping on subgoal-conditioned policies, eliminating the need for a generative model over the goal space.
This approach outperforms existing offline GCRL algorithms on various locomotion and manipulation benchmarks, scaling well to complex, long-horizon tasks.
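To make the bootstrapping idea concrete, the sketch below shows one plausible instantiation: a flat policy conditioned on a distant goal is distilled from a subgoal-conditioned policy, with subgoals taken as intermediate states from the same dataset trajectory rather than produced by a learned generator. The names (`GoalConditionedPolicy`, `bootstrap_loss`), the network architecture, and the mean-squared-error distillation target are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact method):
# a flat goal-conditioned policy pi(a | s, g) is trained to match a
# subgoal-conditioned policy pi(a | s, w), where the subgoal w is an
# intermediate state sampled from the same offline trajectory between s and
# the final goal g. Because subgoals come directly from logged data, no
# generative model over the goal space is needed.
import torch
import torch.nn as nn


class GoalConditionedPolicy(nn.Module):
    """Deterministic policy head conditioned on (state, goal)."""

    def __init__(self, state_dim: int, goal_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))


def bootstrap_loss(flat_policy: GoalConditionedPolicy,
                   subgoal_policy: GoalConditionedPolicy,
                   state: torch.Tensor,
                   subgoal: torch.Tensor,
                   goal: torch.Tensor) -> torch.Tensor:
    """Distill the short-horizon, subgoal-conditioned action into the flat
    long-horizon policy. The subgoal-conditioned policy acts as a fixed
    target; MSE between action outputs is a hypothetical choice here."""
    with torch.no_grad():
        target_action = subgoal_policy(state, subgoal)
    predicted_action = flat_policy(state, goal)
    return ((predicted_action - target_action) ** 2).mean()


# Hypothetical batch construction: (state, subgoal, goal) triples are sampled
# from a single trajectory as (s_t, s_{t+k}, s_{t+K}) with k < K, i.e.
# hindsight relabeling of intermediate states, instead of generating subgoals.
```

In this sketch the subgoal-conditioned policy handles the easier short-horizon problem, and the flat policy inherits its behavior for distant goals by regressing onto its actions; the specific loss and sampling scheme would need to follow the paper's actual objective.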