Image Credit: Arxiv

Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks

  • Large language models (LLMs) are being used to mimic human behavior in sequential decision-making tasks.
  • A study compared the exploration-exploitation strategies of LLMs, humans, and multi-armed bandit (MAB) algorithms.
  • Reasoning enhances LLM decision-making, yielding more human-like behavior that mixes random and directed exploration.
  • LLMs perform similarly to humans in simple tasks but struggle to match human adaptability in complex, non-stationary environments.

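The exploration-exploitation trade-off the study examines can be illustrated with a classic bandit strategy. The sketch below is not the paper's method; it is a minimal epsilon-greedy agent on a hypothetical stationary Bernoulli bandit (the arm probabilities, step count, and epsilon are illustrative assumptions), showing how an agent balances random exploration against exploiting its current best estimate.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Run a minimal epsilon-greedy agent on a stationary Bernoulli bandit.

    arm_means -- true (hidden) reward probability of each arm; illustrative values.
    Returns per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # how often each arm was pulled
    values = [0.0] * n_arms      # running mean reward estimate per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # random exploration: try any arm uniformly
            arm = rng.randrange(n_arms)
        else:
            # exploitation: pull the arm with the best current estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With a fixed exploration rate the agent concentrates its pulls on the best arm over time; the non-stationary environments mentioned above are exactly where such a static strategy, unlike adaptive human behavior, tends to break down.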