Beneficial actions taken by others pose a challenge for multi-agent reinforcement learning (MARL), especially when those actions are hidden from the agents they benefit.
The impact of such hidden gifts was studied in a simple MARL task: agents in a grid-world environment must each unlock an individual door for an individual reward, and an agent must drop its key, unseen by the others, for the group to obtain a larger collective reward.
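The task's reward structure can be sketched as a minimal environment. The class below is an illustrative simplification, not the paper's implementation: names, action set, reward magnitudes, and the single shared key are all assumptions, and the collective reward here goes only to the acting agent for brevity.

```python
class HiddenGiftTask:
    """Minimal sketch of the key-and-doors task (illustrative assumptions,
    not the paper's actual environment)."""

    def __init__(self, n_agents=2):
        self.n_agents = n_agents
        self.key_holder = 0                 # assumption: agent 0 starts holding the key
        self.unlocked = [False] * n_agents

    def step(self, agent, action):
        """Apply one action ('unlock', 'drop', 'pickup', 'noop') for `agent`.

        Returns (reward, done). Dropping the key is the "hidden gift": other
        agents benefit from it but do not observe who dropped it.
        """
        reward = 0.0
        if action == "unlock" and self.key_holder == agent and not self.unlocked[agent]:
            self.unlocked[agent] = True
            reward += 1.0                   # individual reward for one's own door
        elif action == "drop" and self.key_holder == agent:
            self.key_holder = None          # key becomes free for anyone to pick up
        elif action == "pickup" and self.key_holder is None:
            self.key_holder = agent
        done = all(self.unlocked)
        if done:
            reward += 10.0                  # larger collective reward (magnitude assumed)
        return reward, done
```

The key point the sketch captures is that the optimal policy requires an agent to give up the key after using it, an act that yields it no immediate reward.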
State-of-the-art RL algorithms, including MARL algorithms, struggled to learn how to achieve the collective reward in the task.
Independent, model-free policy-gradient agents could solve the task when given information about their own action history, whereas MARL agents could not. A correction term inspired by learning-aware approaches helped the independent agents converge to collective success more reliably.
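As a hedged illustration of the kind of independent agent described above, the sketch below conditions a tabular softmax policy on the agent's own action history and updates it with REINFORCE. The learning-aware correction term is not shown, since its form is not specified here; the class name, tabular parameterization, and learning rate are all assumptions.

```python
import math
import random

class HistoryConditionedREINFORCE:
    """Sketch of an independent policy-gradient agent whose only extra input
    is its own action history (a hypothetical simplification; the
    learning-aware correction term is omitted)."""

    def __init__(self, n_actions, lr=0.1):
        self.n_actions = n_actions
        self.lr = lr
        self.theta = {}  # softmax logits, keyed by action-history tuple

    def _logits(self, hist):
        return self.theta.setdefault(hist, [0.0] * self.n_actions)

    def policy(self, hist):
        logits = self._logits(hist)
        m = max(logits)                      # stabilize the softmax
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, hist, rng=random):
        return rng.choices(range(self.n_actions), weights=self.policy(hist))[0]

    def update(self, trajectory, ret):
        """REINFORCE: theta += lr * return * grad log pi(a | hist)."""
        for hist, a in trajectory:
            probs = self.policy(hist)
            logits = self._logits(hist)
            for i in range(self.n_actions):
                logits[i] += self.lr * ret * ((1.0 if i == a else 0.0) - probs[i])
```

Because the policy is keyed on the agent's own action history, the agent can learn that episodes in which it previously dropped the key tend to end with the collective reward, even though it never observes the other agents directly.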