Unsupervised zero-shot reinforcement learning (RL) is a powerful paradigm for pretraining behavioral foundation models (BFMs). These BFMs solve downstream tasks specified via reward functions in a zero-shot fashion, i.e., without any additional test-time learning or planning.
This paper focuses on devising fast adaptation strategies that improve the zero-shot performance of BFMs within a few steps of online interaction with the environment. The proposed strategies achieve a 10-40% improvement over zero-shot performance within a few tens of episodes, outperforming existing baselines.