This work focuses on data-efficient exploration in reinforcement learning by examining information-theoretic approaches to intrinsic motivation.
Exploration bonuses that target epistemic uncertainty are studied; these bonuses signal expected information gain and converge to zero as the agent learns about the environment.
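One common way to formalize such a bonus (stated here as a generic formulation, not necessarily the exact definition used in this work) is the expected information gain about the model parameters $\theta$ from observing the outcome $y$ of taking action $a$ in state $s$, given the data $\mathcal{D}$ collected so far:

$$
r^{\text{info}}(s,a) \;=\; I\!\left(\theta;\, y \mid s, a, \mathcal{D}\right) \;=\; H\!\left[\,p(y \mid s, a, \mathcal{D})\,\right] \;-\; \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\!\left[\, H\!\left[\,p(y \mid s, a, \theta)\,\right]\,\right].
$$

Because this quantity is the mutual information between the parameters and the observation, it measures only epistemic (reducible) uncertainty and shrinks to zero once the posterior over $\theta$ concentrates.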
The analysis provides formal guarantees for these approaches and discusses practical approximations based on models such as sparse variational Gaussian processes and deep ensembles.
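As a rough illustration of the deep-ensemble approximation, the epistemic bonus can be proxied by the disagreement among ensemble members' predictions for the same input. The sketch below is illustrative only; the function name and array shapes are assumptions, not taken from the paper.

```python
import numpy as np

def ensemble_info_bonus(ensemble_predictions: np.ndarray) -> float:
    """Epistemic-uncertainty bonus from a deep ensemble (illustrative sketch).

    ensemble_predictions: array of shape (n_members, state_dim) holding each
    member's predicted next state for the same (state, action) input.
    The variance across members approximates epistemic uncertainty and
    shrinks toward zero as the members come to agree, i.e. as the model
    of the environment is learned.
    """
    # Disagreement = per-dimension variance across members, averaged over dimensions.
    return float(np.mean(np.var(ensemble_predictions, axis=0)))
```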
Finally, Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE) is introduced: a framework that combines model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration. In the empirical evaluation, PTS-BE outperforms the other baselines across a variety of environments.
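To make the combination of model-based planning and information-theoretic bonuses concrete, the following is a minimal random-shooting planner that scores sampled trajectories by extrinsic reward plus an ensemble-disagreement bonus. It is a sketch of the general idea, not the authors' PTS-BE implementation; the interfaces `ensemble_predict`, `reward_fn`, and `action_space_sample` are hypothetical.

```python
import numpy as np

def plan_action(state, ensemble_predict, reward_fn, action_space_sample,
                horizon=10, n_candidates=100, bonus_weight=1.0):
    """Illustrative planner: pick the first action of the best-scoring rollout.

    ensemble_predict(state, action) -> array (n_members, state_dim) of
    next-state predictions from an ensemble dynamics model.
    reward_fn(state, action) -> extrinsic reward.
    action_space_sample() -> a random action.
    """
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = [action_space_sample() for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            preds = ensemble_predict(s, a)            # ensemble next-state predictions
            bonus = np.mean(np.var(preds, axis=0))    # epistemic disagreement bonus
            total += reward_fn(s, a) + bonus_weight * bonus
            s = preds[np.random.randint(len(preds))]  # propagate via a random member
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action
```

Because the bonus vanishes as the ensemble members converge, the planner is drawn toward poorly understood regions early in training and gradually reverts to optimizing the extrinsic reward alone.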