Reinforcement learning has been used to enhance the reasoning capabilities of Large Language Models (LLMs), but existing approaches offer little guidance for exploration and provide only sparse, outcome-level feedback.
A new method, Intrinsic Motivation guidEd exploratioN meThOd foR LLM Reasoning (i-MENTOR), is proposed to address these challenges by delivering dense rewards and amplifying exploration within the RL-based training paradigm.
i-MENTOR introduces three components: trajectory-aware exploration rewards, dynamic reward scaling, and an advantage-preserving reward implementation, which together improve performance on complex reasoning tasks.
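To make the three components concrete, here is a minimal Python sketch of how they could fit together; this is not the paper's actual implementation. The count-based n-gram novelty measure, the linear decay schedule, and the clipping rule for preserving advantage ordering are all assumptions made for illustration, as are the names `exploration_reward`, `dynamic_scale`, and `shaped_rewards`.

```python
# Illustrative sketch only -- the novelty proxy, decay schedule, and clipping
# rule below are assumed, not taken from the i-MENTOR paper.
from collections import Counter
from typing import List

ngram_counts: Counter = Counter()  # global visit counts over trajectory n-grams


def exploration_reward(trajectory: List[int], n: int = 4) -> float:
    """Trajectory-aware exploration reward: higher for trajectories whose
    token n-grams have been seen less often (count-based novelty proxy)."""
    grams = [tuple(trajectory[i:i + n]) for i in range(len(trajectory) - n + 1)]
    if not grams:
        return 0.0
    novelty = sum(1.0 / (1 + ngram_counts[g]) for g in grams) / len(grams)
    ngram_counts.update(grams)  # repeated trajectories earn smaller bonuses
    return novelty


def dynamic_scale(step: int, total_steps: int, beta0: float = 0.1) -> float:
    """Dynamic reward scaling: decay the exploration bonus over training so
    late-stage optimization is dominated by the task (outcome) reward."""
    return beta0 * max(0.0, 1.0 - step / total_steps)


def shaped_rewards(outcome: List[float], bonuses: List[float], beta: float) -> List[float]:
    """Advantage-preserving combination (one possible reading): cap each
    scaled bonus below the smallest gap between distinct outcome rewards,
    so the ranking of trajectories by outcome reward is never flipped."""
    gaps = sorted({abs(a - b) for a in outcome for b in outcome if a != b})
    cap = 0.5 * gaps[0] if gaps else float("inf")
    return [r + min(beta * b, cap) for r, b in zip(outcome, bonuses)]


# Usage: shape a group of sampled trajectories before advantage estimation.
group = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [9, 8, 7, 6, 5, 4]]
outcome = [1.0, 1.0, 0.0]  # sparse correctness rewards from a verifier
bonuses = [exploration_reward(t) for t in group]
print(shaped_rewards(outcome, bonuses, dynamic_scale(step=100, total_steps=1000)))
```

In this sketch the second, duplicated trajectory receives a smaller bonus than the first, and the novel third trajectory the largest, which mirrors the intended effect of rewarding exploration without letting the bonus override the outcome signal.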
Experiments show that i-MENTOR achieves a 22.39% improvement on the difficult Countdown-4 dataset, demonstrating its effectiveness in enhancing LLM reasoning.