In the zero-shot policy transfer setting in reinforcement learning, an agent is trained on a fixed set of environments and is then expected to generalize to unseen environments without further training.
Policy distillation after training can improve performance in the test environments, and the theory suggests two ingredients for this: distilling an ensemble of policies rather than a single one, and using diverse training data for the distillation.
This paper proves a generalization bound for policy distillation after training, which offers guidance on how to improve generalization in reinforcement learning. Empirically, an ensemble of policies distilled on a diverse dataset generalizes significantly better than the original agent.
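The following is a minimal sketch, not the paper's implementation, of the general recipe described above: a trained teacher policy is distilled into an ensemble of student policies on a diverse state dataset, and at test time the students' action distributions are averaged. The network sizes, dataset, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: distill a teacher policy into an ensemble of students,
# then act with the averaged ensemble distribution. All names and sizes
# below are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, ENSEMBLE_SIZE = 8, 4, 5

def mlp_policy():
    # Small categorical policy head producing action logits.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

# Stand-in for the agent trained on the fixed set of training environments.
teacher = mlp_policy()

# "Diverse" distillation data: states that would normally be gathered from
# rollouts across many training environments; random placeholders here.
states = torch.randn(4096, STATE_DIM)
with torch.no_grad():
    teacher_logits = teacher(states)

# Distill each ensemble member independently by matching the teacher's
# action distribution (KL divergence) on minibatches of the diverse dataset.
students = [mlp_policy() for _ in range(ENSEMBLE_SIZE)]
for student in students:
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(200):
        idx = torch.randint(0, states.size(0), (256,))
        loss = F.kl_div(F.log_softmax(student(states[idx]), dim=-1),
                        F.softmax(teacher_logits[idx], dim=-1),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()

def ensemble_act(state):
    # At test time, average the students' action probabilities and act greedily.
    with torch.no_grad():
        probs = torch.stack([F.softmax(s(state), dim=-1) for s in students]).mean(0)
    return probs.argmax(dim=-1)

print(ensemble_act(torch.randn(1, STATE_DIM)))
```

Independent initializations and minibatch orderings give the ensemble members slightly different solutions, which is one plausible way to realize the diversity that the theory calls for.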