Offline meta-reinforcement learning aims to equip agents with the ability to rapidly adapt to new tasks by training only on pre-collected data from a set of training tasks.
Context-based approaches, however, suffer from a distribution mismatch: the task representations inferred from the offline data become entangled with the behavior policies that collected it, while contexts gathered at test time follow a different distribution, limiting generalization to the test tasks.
A new approach is proposed that minimizes the mutual information between task representations and the behavior policy, so that the learned representations reflect the task itself rather than how the data was collected, improving generalization.
The approach outperforms prior methods on both in-distribution and out-of-distribution test tasks in MuJoCo environments.
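To make the objective concrete, below is a minimal sketch, not the paper's exact algorithm, of one common surrogate for minimizing the mutual information between a task representation and the behavior policy: an adversarial classifier tries to recover the behavior-policy identity from the representation, and the context encoder is trained to defeat it. The class names (`ContextEncoder`, `BehaviorClassifier`), the dimensions, and the assumption that behavior-policy IDs are available as labels are all illustrative assumptions.

```python
# Sketch: adversarial surrogate for minimizing I(z; behavior policy).
# Driving the classifier toward chance level discourages the encoder
# from encoding behavior-policy information in z. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Maps a transition (s, a, r, s') to a task representation z."""
    def __init__(self, transition_dim: int, z_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, z_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        return self.net(transitions)

class BehaviorClassifier(nn.Module):
    """Adversary: predicts which behavior policy produced the data, from z."""
    def __init__(self, z_dim: int, num_behavior_policies: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, num_behavior_policies),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def adversarial_mi_step(encoder, classifier, enc_opt, cls_opt,
                        transitions, behavior_ids, adv_weight=1.0):
    """One update: the classifier learns to predict the behavior policy,
    and the encoder is penalized whenever that prediction succeeds."""
    # 1) Update the classifier on detached representations.
    z = encoder(transitions).detach()
    cls_loss = F.cross_entropy(classifier(z), behavior_ids)
    cls_opt.zero_grad(); cls_loss.backward(); cls_opt.step()

    # 2) Update the encoder to make the classifier's job impossible
    #    (maximize its loss), a surrogate for minimizing I(z; behavior policy).
    z = encoder(transitions)
    adv_loss = -adv_weight * F.cross_entropy(classifier(z), behavior_ids)
    enc_opt.zero_grad(); adv_loss.backward(); enc_opt.step()
    return cls_loss.item(), adv_loss.item()
```

In practice this adversarial term would be combined with the encoder's main task-identification objective (e.g., a contrastive or reconstruction loss), so that z remains informative about the task while carrying as little behavior-policy information as possible.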