Reinforcement learning (RL) systems struggle in complex real-world scenarios that require coordination among multiple agents.
A recent study shows that adding an inference phase at execution time, together with a suitable inference strategy, can help break the performance ceiling in multi-agent RL problems.
The results indicate up to a 126% improvement over the previous state of the art across 17 tasks, at the cost of a few seconds of additional wall-clock time during execution.
The study comprises over 60k experiments and reports promising compute-scaling properties, making it the most extensive study of inference strategies for complex RL to date.
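
To make the idea of an execution-time inference phase concrete, here is a minimal Python sketch of one generic strategy: sample several joint actions from a trained stochastic policy and keep the best-scoring one, falling back to the greedy choice. This is an illustration only and not the strategy evaluated in the study, which this summary does not describe. The names `best_of_n`, `coordination_score`, and the per-agent probability table are hypothetical, and the toy scoring function stands in for whatever return estimate a real system would use.

```python
import random
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def greedy_joint_action(policy_probs: Sequence[Sequence[float]]) -> JointAction:
    """Each agent picks its single most probable action (no inference phase)."""
    return tuple(max(range(len(p)), key=p.__getitem__) for p in policy_probs)


def sample_joint_action(
    policy_probs: Sequence[Sequence[float]], rng: random.Random
) -> JointAction:
    """Draw one action per agent from that agent's action distribution."""
    return tuple(
        rng.choices(range(len(p)), weights=p, k=1)[0] for p in policy_probs
    )


def best_of_n(
    policy_probs: Sequence[Sequence[float]],
    score: Callable[[JointAction], float],
    num_samples: int,
    seed: int = 0,
) -> Tuple[JointAction, float]:
    """Hypothetical inference strategy: start from the greedy joint action,
    then spend extra compute on sampled candidates and keep the best one."""
    rng = random.Random(seed)
    best_action = greedy_joint_action(policy_probs)
    best_score = score(best_action)
    for _ in range(num_samples):
        candidate = sample_joint_action(policy_probs, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_action, best_score = candidate, candidate_score
    return best_action, best_score


def coordination_score(joint_action: JointAction) -> float:
    """Toy objective (assumption): agents are rewarded only when they all
    agree, and agreeing on a higher action index pays more."""
    first = joint_action[0]
    return float(first + 1) if all(a == first for a in joint_action) else 0.0


# Three agents whose individual policies all favor action 0.
policy = [[0.5, 0.3, 0.2]] * 3
baseline_action, baseline_score = best_of_n(policy, coordination_score, num_samples=0)
improved_action, improved_score = best_of_n(policy, coordination_score, num_samples=200)
print(baseline_action, baseline_score)
print(improved_action, improved_score)
```

Because the search starts from the greedy joint action and only replaces it with a strictly better candidate, the result can never score worse than acting greedily. The extra samples are the additional execution-time compute that this kind of strategy trades for performance.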