Behavioral cloning (BC), which casts policy learning as supervised learning, is widely used to learn policies from human demonstrations in domains such as robotics.
Conditioning BC policies on goals (goal-conditioned BC, GCBC) makes it possible to capture the diverse behaviors contained in an offline dataset with a single policy.
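To make the setup concrete, a minimal sketch of a goal-conditioned BC update is given below; the network architecture, the mean-squared-error action loss, and all names are illustrative assumptions rather than the specific design used in this work.

```python
import torch
import torch.nn as nn

class GCBCPolicy(nn.Module):
    """Toy goal-conditioned policy: maps (state, goal) to an action."""

    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def gcbc_loss(policy, state, action, goal):
    # Supervised regression of the demonstrated action, conditioned on (state, goal).
    pred = policy(state, goal)
    return ((pred - action) ** 2).mean()
```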
While GCBC methods perform well on in-distribution tasks, they may not generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization.
Temporal consistency in the state representations learned by BC plays an important role in enabling combinatorial generalization: encouraging temporally consistent representations reduces the out-of-distribution gap for novel state-goal pairs.
Successor representations, which encode the discounted distribution of future states visited from the current state, offer a natural way to achieve such temporal consistency.
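For a finite MDP with policy π and discount factor γ, the successor representation has the standard form below (the general definition, not a result specific to this work):

```latex
M^{\pi}(s, s') \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}\{s_t = s'\} \;\middle|\; s_0 = s\right],
\qquad
M^{\pi} \;=\; (I - \gamma P^{\pi})^{-1},
```

where P^π is the state-transition matrix induced by π.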
Prior methods for learning successor representations rely on contrastive samples, temporal-difference (TD) learning, or both.
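As an illustration of the TD route, the standard tabular TD(0) recursion for the successor representation is sketched below; this is the textbook update, not necessarily the exact procedure used by the prior methods referred to here.

```python
import numpy as np

def td_update_sr(M, s, s_next, gamma=0.99, lr=0.1):
    """One tabular TD(0) update of the successor representation matrix M.

    M has shape (num_states, num_states); row M[s] estimates the discounted
    expected future occupancy of every state when starting from state s.
    """
    one_hot = np.zeros(M.shape[1])
    one_hot[s] = 1.0
    td_target = one_hot + gamma * M[s_next]   # bootstrap from the next state's row
    M[s] += lr * (td_target - M[s])
    return M
```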
This work proposes a new approach, BYOL-γ augmented GCBC, which learns such representations without contrastive samples or TD learning.
In the finite MDP case, the BYOL-γ objective can theoretically approximate the successor representation.
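The precise objective is defined in the paper; the sketch below only illustrates the general BYOL-style recipe it builds on, under the assumption that the positive pair is formed by the current state and a future state drawn at a geometrically distributed offset with parameter 1-γ (so the target follows, in expectation, the discounted future-state distribution). All module names and the cosine-regression loss are placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def byol_gamma_loss(online_enc, predictor, target_enc, states, t, gamma=0.99):
    """BYOL-style latent self-prediction toward a geometrically sampled future state.

    states: (T, state_dim) trajectory; t: index of the anchor state.
    target_enc is an EMA copy of online_enc and receives no gradients
    (stop-gradient), so the loss uses neither negatives nor TD bootstrapping.
    """
    T = states.shape[0]
    # Sample the future offset k >= 1 from a geometric distribution with parameter 1 - gamma.
    k = int(torch.distributions.Geometric(probs=1.0 - gamma).sample().item()) + 1
    t_future = min(t + k, T - 1)

    z_online = online_enc(states[t])                 # online embedding of the anchor state
    z_pred = predictor(z_online)                     # predict the future embedding
    with torch.no_grad():
        z_target = target_enc(states[t_future])      # stop-gradient EMA target

    # Cosine-similarity regression, as in BYOL.
    return 2.0 - 2.0 * F.cosine_similarity(z_pred, z_target, dim=-1).mean()
```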
Empirically, BYOL-γ augmented GCBC achieves competitive performance across a diverse set of challenging tasks that require combinatorial generalization.