Action-dependent individual policies, in which an agent conditions its action on the actions of other agents, aim to achieve global optimality in multi-agent reinforcement learning.
Existing work often uses auto-regressive action-dependent policies, which scale poorly as the number of agents increases.
To address this, a more general class of action-dependent policies that need not follow the auto-regressive form is proposed, using an action dependency graph (ADG) to model inter-agent action dependencies.
Theoretical analysis and empirical experiments indicate that the approach has potential for addressing broader challenges in multi-agent reinforcement learning.
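
To make the idea of an action dependency graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes the ADG is a directed acyclic graph stored as a mapping from each agent to the set of agents whose actions it conditions on, and that agents act in a topological order of that graph. The names `autoregressive_adg`, `max_dependencies`, `select_joint_action`, and `make_toy_policy` are hypothetical, and the toy policy is a placeholder rule rather than a learned one. Under these assumptions, the auto-regressive form corresponds to the densest ADG, in which agent i depends on every agent j < i, while a sparser ADG gives each policy fewer inputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def autoregressive_adg(n_agents):
    """Densest ADG: agent i conditions on the actions of all agents j < i."""
    return {i: set(range(i)) for i in range(n_agents)}


def max_dependencies(adg):
    """Largest number of other agents' actions that any single policy must read."""
    return max(len(parents) for parents in adg.values())


def select_joint_action(adg, policies, state):
    """Choose every agent's action, visiting agents in a topological order.

    adg[i] is the set of agents whose actions agent i conditions on.
    TopologicalSorter yields each agent only after all of its parents,
    and raises graphlib.CycleError if the graph contains a cycle.
    """
    actions = {}
    for i in TopologicalSorter(adg).static_order():
        parent_actions = {j: actions[j] for j in adg[i]}
        actions[i] = policies[i](state, parent_actions)
    return actions


def make_toy_policy(agent_id):
    """Hypothetical stand-in for a learned policy with binary actions."""
    def policy(state, parent_actions):
        # Placeholder rule: depends on the state and on the parents' actions only.
        return (state + agent_id + sum(parent_actions.values())) % 2
    return policy


# A sparse ADG (a chain 0 -> 1 -> 2 -> 3) versus the auto-regressive one.
chain_adg = {0: set(), 1: {0}, 2: {1}, 3: {2}}
dense_adg = autoregressive_adg(4)
policies = {i: make_toy_policy(i) for i in range(4)}

chain_actions = select_joint_action(chain_adg, policies, state=0)
dense_actions = select_joint_action(dense_adg, policies, state=0)
```

In this sketch, each policy in the chain reads at most one other agent's action, whereas the last agent in the auto-regressive graph reads the actions of all other agents. This gap in input size grows with the number of agents.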