Imitation Learning from Observation (IfO) enables large-scale behavior learning by using action-free demonstrations.
However, current IfO research typically focuses on idealized scenarios with limited data distributions. This paper introduces a method for learning from more nuanced, realistic data distributions, aiming at iterative self-improvement in imitation learning.
The study adapts RL-based imitation learning to action-free demonstrations by learning a value function over observations, and highlights the importance of more practical IfO techniques for scalable behavior learning.
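As a rough illustration of the value-function idea (not the paper's actual algorithm), the sketch below fits a state value function V(s) on action-free demonstration trajectories with TD(0) bootstrapping. The network, the per-step demonstration reward of 1.0, and names such as `fit_state_value` are assumptions made for illustration only.

```python
# Minimal sketch, assuming: torch is available, demos are lists of
# observation tensors of shape (T, obs_dim) with no actions, and each
# demo transition carries a stand-in reward of 1.0. All names here are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn


class StateValue(nn.Module):
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)


def fit_state_value(trajectories, obs_dim, gamma=0.99, epochs=100, lr=3e-4):
    """Fit V(s) on action-free demonstrations via TD(0) targets."""
    value = StateValue(obs_dim)
    target = StateValue(obs_dim)
    target.load_state_dict(value.state_dict())
    opt = torch.optim.Adam(value.parameters(), lr=lr)

    # Flatten demos into (s, s', done) transitions; no actions are needed.
    s = torch.cat([t[:-1] for t in trajectories])
    s_next = torch.cat([t[1:] for t in trajectories])
    done = torch.cat([
        torch.tensor([0.0] * (len(t) - 2) + [1.0]) for t in trajectories
    ])

    for _ in range(epochs):
        with torch.no_grad():
            td_target = 1.0 + gamma * (1.0 - done) * target(s_next)
        loss = nn.functional.mse_loss(value(s), td_target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Slowly track the online network for stable bootstrap targets.
        with torch.no_grad():
            for p, tp in zip(value.parameters(), target.parameters()):
                tp.mul_(0.995).add_(0.005 * p)
    return value
```

The fitted V(s) could then shape rewards for a downstream RL policy, e.g. via r(s, s') = gamma * V(s') - V(s), so that the policy learns actions without the demonstrations ever containing them; again, this is one plausible instantiation rather than the paper's method.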