<ul><li>Quality-Diversity (QD) algorithms discover varied, high-performing solutions, but they typically rely on manually defined behavioral descriptors, which restricts exploration to predetermined notions of diversity.</li><li>AutoQD generates behavioral descriptors automatically by embedding the occupancy measures of policies in Markov Decision Processes. It uses random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures.</li><li>Distances between the resulting embeddings reflect meaningful behavioral differences. A low-dimensional projection of the embeddings then serves as the behavioral descriptor for standard QD methods.</li><li>Experiments across several continuous control tasks show that AutoQD discovers diverse policies without predefined behavioral descriptors, suggesting applications in unsupervised reinforcement learning and QD optimization.</li></ul>
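The embedding idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_rff`, `embed_policy`, `pca_descriptors`), the Gaussian kernel, the bandwidth, the feature count, and the use of plain PCA for the low-dimensional projection are all assumptions made for illustration. Each policy is represented by the discounted average of random Fourier features over its state-action pairs, so the Euclidean distance between two embeddings approximates the MMD between the corresponding occupancy measures.

```python
import numpy as np


def make_rff(input_dim, num_features=256, bandwidth=1.0, seed=0):
    """Return a random Fourier feature map approximating a Gaussian kernel.

    phi(x) . phi(y) ~= exp(-||x - y||^2 / (2 * bandwidth^2)).
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Frequencies are sampled from the kernel's spectral density (std = 1 / bandwidth).
    W = rng.normal(0.0, 1.0 / bandwidth, size=(input_dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(x):
        x = np.atleast_2d(x)
        return np.sqrt(2.0 / num_features) * np.cos(x @ W + b)

    return phi


def embed_policy(trajectories, phi, gamma=0.99):
    """Embed a policy as the discounted mean feature of its state-action pairs.

    `trajectories` is a list of arrays of shape (T, state_dim + action_dim).
    With finite horizons the discount weights sum to slightly less than 1.
    """
    per_traj = []
    for traj in trajectories:
        feats = phi(traj)  # shape (T, num_features)
        weights = (1.0 - gamma) * gamma ** np.arange(len(traj))
        per_traj.append(weights @ feats)
    return np.mean(per_traj, axis=0)


def pca_descriptors(embeddings, k=2):
    """Project policy embeddings onto their top-k principal components.

    Plain PCA is an assumed stand-in for the low-dimensional projection.
    """
    X = embeddings - embeddings.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T
```

In this sketch, `np.linalg.norm(embed_policy(a, phi) - embed_policy(b, phi))` serves as the approximate MMD between two policies, and the rows returned by `pca_descriptors` could be passed to a standard QD algorithm as behavioral descriptors.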

AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
