Value function decomposition methods for cooperative multi-agent reinforcement learning compose a joint action-value from individual per-agent utilities, so that greedy joint action selection remains consistent with each agent's greedy individual action selection.
Widely used methods such as VDN and QMIX can only represent a restricted class of joint value functions, while QPLEX, which lifts this restriction, relies on a considerably more complex mixing architecture.
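As a brief recap in standard notation (not necessarily this work's), the consistency requirement is the individual-global-max (IGM) condition, and the restrictions of VDN and QMIX are structural constraints on the joint value:
\[
\arg\max_{\mathbf{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a})
  = \bigl( \arg\max_{a_1} Q_1(\tau_1, a_1), \ldots, \arg\max_{a_n} Q_n(\tau_n, a_n) \bigr),
\]
\[
Q_{\mathrm{tot}}^{\mathrm{VDN}}(\boldsymbol{\tau}, \mathbf{a}) = \sum_{i=1}^{n} Q_i(\tau_i, a_i),
\qquad
\frac{\partial Q_{\mathrm{tot}}^{\mathrm{QMIX}}(\boldsymbol{\tau}, \mathbf{a})}{\partial Q_i(\tau_i, a_i)} \ge 0 \;\; \forall i.
\]
Additivity (VDN) and monotonicity (QMIX) are sufficient for IGM but cover only a subset of the IGM-consistent function class, which QPLEX is designed to represent in full.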
This work introduces QFIX, a new family of value function decomposition models that expand representational capabilities through a "fixing" layer.
Empirical evaluation on multiple environments shows that QFIX improves performance, learns stably, and outperforms QPLEX while using simpler mixing models.
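To make the general idea of wrapping a simple mixer concrete, the sketch below shows one possible way an additive (VDN-style) mixer could be augmented with a state-conditioned correction; the class name, hypernetwork layout, softplus activation, and hidden size are illustrative assumptions and are not taken from the paper's actual QFIX architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedAdditiveMixer(nn.Module):
    """Illustrative sketch only: a VDN-style additive mixer wrapped by a
    state-conditioned positive scale and bias (hypothetical design, not the
    paper's QFIX architecture)."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Hypernetworks mapping the global state to a scale w(s) and bias b(s).
        self.w_net = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, 1))
        self.b_net = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) per-agent utilities of the chosen actions
        # state:    (batch, state_dim) global state
        inner = agent_qs.sum(dim=-1, keepdim=True)  # simple additive mix
        w = F.softplus(self.w_net(state))           # positive scale preserves the argmax
        b = self.b_net(state)                       # action-independent bias preserves the argmax
        return w * inner + b                        # (batch, 1) joint value

Because the scale is positive and the bias does not depend on the joint action, the greedy joint action of the inner sum is unchanged, so consistent action selection is preserved while the joint value gains state-dependent flexibility.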