The paper introduces the concept of learning to search (L2S) from expert demonstrations to address the limitations of behavioral cloning (BC) in imitation learning, most notably compounding errors.
L2S involves learning a world model and a reward model so that, at test time, the agent can plan to match expert outcomes, recovering even after it makes mistakes.
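To make the planning step concrete, here is a minimal sketch of test-time search over a learned world model and reward model. It is a toy random-shooting planner, not SAILOR's actual implementation; the class and function names (WorldModel, RewardModel, plan), the linear dynamics, and the distance-based reward are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned models; in L2S these would be trained from expert
# demonstrations and rollouts (toy, hypothetical versions shown here).
class WorldModel:
    def __init__(self, state_dim, action_dim):
        self.A = rng.normal(size=(state_dim, state_dim)) * 0.1
        self.B = rng.normal(size=(state_dim, action_dim)) * 0.1

    def predict(self, state, action):
        # Toy linear dynamics standing in for a learned dynamics model.
        return state + self.A @ state + self.B @ action

class RewardModel:
    def __init__(self, goal):
        self.goal = goal

    def score(self, state):
        # Higher reward for states closer to expert-like outcomes.
        return -np.linalg.norm(state - self.goal)

def plan(world_model, reward_model, state, action_dim, horizon=10, n_samples=256):
    """Random-shooting search: sample action sequences, roll them out in the
    world model, score them with the reward model, keep the best first action."""
    best_return, best_action = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state.copy(), 0.0
        for a in actions:
            s = world_model.predict(s, a)
            total += reward_model.score(s)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action  # executed, then replanned at the next step (MPC-style)

state_dim, action_dim = 4, 2
wm = WorldModel(state_dim, action_dim)
rm = RewardModel(goal=np.zeros(state_dim))
action = plan(wm, rm, state=rng.normal(size=state_dim), action_dim=action_dim)
```

Because the planner re-scores imagined rollouts at every step, it can steer back toward expert outcomes after a mistake, which is the property BC alone lacks.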
The resulting approach, SAILOR, consistently outperforms state-of-the-art Diffusion Policies trained via BC on visual manipulation tasks drawn from several benchmarks.
SAILOR can identify nuanced failures, is robust to reward hacking, and retains its advantage even when the BC baseline is trained with 5-10x more demonstrations.