Unsupervised reinforcement learning (URL) aims to learn general skills that transfer to unseen downstream tasks.
Mutual Information Skill Learning (MISL) maximizes the mutual information between states and skills, but it lacks a thorough theoretical analysis of how well the learned skills initialize downstream task policies.
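For reference, the quantity that MISL maximizes can be expanded with the standard identities for mutual information. The notation is an assumption introduced here rather than taken from the source: $S$ is the state visited under a skill-conditioned policy, $Z$ is the skill variable with prior $p(z)$, and $p(s)$ is the state distribution averaged over skills.

```latex
\begin{align}
I(S; Z) &= H(Z) - H(Z \mid S) \\
        &= H(S) - H(S \mid Z) \\
        &= \mathbb{E}_{z \sim p(z)}\!\left[ D_{\mathrm{KL}}\big( p(s \mid z) \,\|\, p(s) \big) \right]
\end{align}
```

The last line expresses the mutual information as an average KL divergence between each skill's state distribution and the skill-averaged state distribution.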
A new theoretical analysis shows that the diversity and separability of learned skills are crucial for downstream task adaptation, properties that MISL does not necessarily guarantee.
To complement MISL, a novel disentanglement metric, LSEPIN, is proposed.
An information-geometric connection between LSEPIN and downstream task adaptation cost is established.
Replacing the KL divergence with the Wasserstein distance is then investigated for its better geometric properties, which leads to a novel skill-learning objective, WSEP.
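One commonly cited geometric difference between the two measures can be seen in a toy example. The sketch below is not the WSEP objective. It only compares KL divergence and the 1-Wasserstein distance for point-mass distributions on a one-dimensional grid, and the helper names (`kl_divergence`, `wasserstein_1d`, `point_mass`) are hypothetical, introduced for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions defined on the same grid."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with zero mass under p contributes nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def wasserstein_1d(p, q, spacing=1.0):
    """1-Wasserstein distance on an evenly spaced 1-D grid.

    In one dimension this equals the area between the two CDFs.
    """
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total


def point_mass(index, size=10):
    """Distribution with all of its mass on one grid cell."""
    return [1.0 if i == index else 0.0 for i in range(size)]


base = point_mass(0)
near = point_mass(1)  # one cell away from base
far = point_mass(8)   # eight cells away from base

kl_near, kl_far = kl_divergence(near, base), kl_divergence(far, base)
w_near, w_far = wasserstein_1d(near, base), wasserstein_1d(far, base)

print(kl_near, kl_far)  # inf inf
print(w_near, w_far)    # 1.0 8.0
```

In this toy setting, KL divergence is infinite for both pairs and so cannot tell the near distribution from the far one, while the Wasserstein distance grows with how far the mass has to move.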
WSEP is theoretically proven to benefit downstream task adaptation and to discover more initial policies than MISL.
A Wasserstein distance-based algorithm, PWSEP, is also proposed; in theory, it can discover all optimal initial policies.