Developing agents for complex, underspecified tasks where no clear objective exists remains challenging but offers many opportunities.
The article introduces Random Network Distillation DAgger (RND-DAgger), an active imitation learning method that uses a learned state-based out-of-distribution measure to trigger expert interventions.
RND-DAgger reduces the need for constant expert input during training and outperforms traditional imitation learning and other active imitation learning methods in 3D video games and a robotic locomotion task.
By querying the expert only when the agent's state looks out of distribution, the method limits costly interventions and improves the efficiency of active imitation learning.
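To make the gating mechanism concrete, here is a minimal sketch of how an RND-based out-of-distribution trigger for expert queries could look. It follows the standard RND recipe (prediction error between a frozen random target network and a trained predictor); the network sizes, learning rate, and threshold are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only.
STATE_DIM, EMBED_DIM = 32, 64

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, EMBED_DIM))

target = make_net()        # fixed, randomly initialized target network
for p in target.parameters():
    p.requires_grad_(False)
predictor = make_net()     # trained to imitate the target's outputs
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def novelty(state):
    # Prediction error is low on states seen during training
    # (in-distribution) and high on novel, out-of-distribution states.
    with torch.no_grad():
        t = target(state)
    return ((predictor(state) - t) ** 2).mean()

def update(state):
    # Fit the predictor on states visited under expert supervision,
    # so familiar states yield low novelty scores.
    loss = novelty(state)
    opt.zero_grad()
    loss.backward()
    opt.step()

THRESHOLD = 0.05  # hypothetical; in practice tuned per environment

def should_query_expert(state):
    # Query the expert only when the agent's state looks novel.
    return novelty(state).item() > THRESHOLD
```

In a DAgger-style training loop, `should_query_expert` would replace the fixed or random query schedule: the agent acts autonomously while its states remain familiar, and control is handed back to the expert only when the novelty score crosses the threshold.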