The Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm is proposed as an improvement to value function factorization methods in cooperative multi-agent reinforcement learning.
POWQMIX identifies potentially optimal joint actions and assigns higher weights to their losses during training, which gives the factorized value function greater representational capacity than existing methods.
Its weighted training approach is guaranteed to recover the optimal policy.
Experiments in various environments demonstrate that POWQMIX outperforms state-of-the-art value-based multi-agent reinforcement learning methods.
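
The weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_td_loss`, the `low_weight` parameter, and the choice of weight 1.0 for flagged samples are assumptions made for illustration, and the rule POWQMIX uses to decide which joint actions are potentially optimal is not specified here, so the flags are simply taken as input.

```python
from typing import Sequence


def weighted_td_loss(
    q_tot: Sequence[float],
    targets: Sequence[float],
    potentially_optimal: Sequence[bool],
    low_weight: float = 0.1,
) -> float:
    """Mean weighted squared error over a batch of joint-action values.

    Samples flagged as potentially optimal get weight 1.0; all others get
    the smaller `low_weight` (a hypothetical hyperparameter), so their
    errors contribute less to the loss.
    """
    if not (len(q_tot) == len(targets) == len(potentially_optimal)):
        raise ValueError("all inputs must have the same length")
    if len(q_tot) == 0:
        raise ValueError("batch must not be empty")
    if not 0.0 < low_weight <= 1.0:
        raise ValueError("low_weight must be in (0, 1]")

    total = 0.0
    for q, y, flag in zip(q_tot, targets, potentially_optimal):
        weight = 1.0 if flag else low_weight  # higher weight when flagged
        total += weight * (q - y) ** 2
    return total / len(q_tot)


# Two samples with errors 1.0 and 2.0; only the first is flagged.
loss = weighted_td_loss([1.0, 2.0], [0.0, 0.0], [True, False], low_weight=0.5)
```

With `low_weight=1.0` the function reduces to an ordinary mean squared error, which makes the effect of the weighting easy to compare against an unweighted baseline.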