Researchers have developed a method called ECON to improve reasoning in large language models (LLMs) by recasting multi-LLM coordination as an incomplete-information game and seeking a Bayesian Nash equilibrium.
ECON is a hierarchical reinforcement-learning paradigm that allows each LLM to independently select responses based on its beliefs about co-agents, without the need for costly inter-agent exchanges.
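The summary does not spell out ECON's internals, so the following is only a generic toy illustration of the Bayesian Nash idea it builds on, not ECON's hierarchical reinforcement-learning method. It assumes two agents, each holding a hypothetical private "type" (the answer its own reasoning favours), a hypothetical common prior over the co-agent's type, and a hypothetical payoff that rewards agreement plus a `confidence_bonus` for sticking to one's own signal. Each agent picks its action from its own type and its beliefs alone, with no message exchange. The equilibrium is found here by simple best-response iteration, a technique swapped in for this sketch rather than taken from the source.

```python
# Toy Bayesian coordination game (hypothetical; not ECON's actual formulation).
TYPES = ("A", "B")            # hypothetical private signal: the answer an agent's own reasoning favours
ACTIONS = ("A", "B")          # the answer an agent finally commits to
PRIOR = {"A": 0.6, "B": 0.4}  # assumed common prior over a co-agent's type


def utility(own_action, other_action, own_type, confidence_bonus):
    """Payoff 1 for agreeing with the co-agent, plus a bonus for staying true to one's own signal."""
    agree = 1.0 if own_action == other_action else 0.0
    loyal = confidence_bonus if own_action == own_type else 0.0
    return agree + loyal


def expected_utility(action, own_type, other_strategy, confidence_bonus):
    """Average payoff over the co-agent's unknown type, using beliefs only (no message exchange)."""
    return sum(
        PRIOR[t] * utility(action, other_strategy[t], own_type, confidence_bonus)
        for t in TYPES
    )


def best_response(other_strategy, confidence_bonus):
    """Map each private type to the action with the highest expected utility."""
    return {
        t: max(ACTIONS, key=lambda a: expected_utility(a, t, other_strategy, confidence_bonus))
        for t in TYPES
    }


def find_equilibrium(confidence_bonus, max_rounds=50):
    """Alternate best responses until neither agent wants to deviate (a pure-strategy BNE)."""
    s1 = {t: t for t in TYPES}  # start from "follow your own signal"
    s2 = dict(s1)
    for _ in range(max_rounds):
        new_s1 = best_response(s2, confidence_bonus)
        new_s2 = best_response(new_s1, confidence_bonus)
        if new_s1 == s1 and new_s2 == s2:
            # s1 = BR(s2) and s2 = BR(s1): mutual best responses, i.e. an equilibrium
            return s1, s2
        s1, s2 = new_s1, new_s2
    raise RuntimeError("best-response dynamics did not converge")


if __name__ == "__main__":
    for bonus in (0.5, 0.1):
        print(bonus, find_equilibrium(bonus))
```

With a strong bonus (0.5), each agent simply follows its own signal in equilibrium; with a weak bonus (0.1), both agents pool on the answer the prior makes more likely. In both cases every agent decides locally from its beliefs, which is the property the summary attributes to ECON.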
Theoretical analysis shows that ECON achieves a significantly tighter regret bound than non-equilibrium multi-agent schemes, and experiments show an average performance improvement of 11.2% across six benchmarks.
Further experiments show that ECON scales well and can incorporate additional models, suggesting it could support larger and more powerful multi-LLM ensembles.