Source: arXiv

Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration

  • Reinforcement learning has been used to enhance the reasoning capabilities of Large Language Models (LLMs), but current approaches face limitations in guiding exploration and providing effective feedback.
  • A new method, Intrinsic Motivation guidEd exploratioN meThOd foR LLM Reasoning (i-MENTOR), is proposed to address these challenges by delivering dense rewards and amplifying exploration within the RL-based training paradigm.
  • i-MENTOR introduces trajectory-aware exploration rewards, dynamic reward scaling, and advantage-preserving reward implementation to improve performance in complex reasoning tasks.
  • Experiments show that i-MENTOR achieves a 22.39% improvement on the challenging Countdown-4 dataset, demonstrating its effectiveness in enhancing LLM reasoning.
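
The three ideas in the list above (an exploration reward, a scale that changes during training, and a bonus that leaves the outcome-based advantage intact) can be illustrated with a toy sketch. This is not the i-MENTOR implementation: the summary does not give its formulas, so a generic count-based novelty bonus stands in for the trajectory-aware reward, and the decay schedule, function names, and constants are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def outcome_advantages(rewards):
    """Group-normalize sparse outcome rewards to zero mean and unit std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def novelty_bonus(trajectory, visit_counts):
    """Assumed stand-in: a count-based bonus, larger for rarely seen trajectories."""
    return 1.0 / (1.0 + visit_counts[trajectory]) ** 0.5


def bonus_scale(step, initial=0.5, decay=0.01):
    """Hypothetical schedule that shrinks the bonus as training progresses."""
    return initial / (1.0 + decay * step)


def shaped_advantages(trajectories, rewards, visit_counts, step):
    """Add the scaled bonus after normalization, so the bonus does not
    shift the mean or std used to compute the outcome advantages."""
    base = outcome_advantages(rewards)
    scale = bonus_scale(step)
    shaped = [
        adv + scale * novelty_bonus(traj, visit_counts)
        for adv, traj in zip(base, trajectories)
    ]
    # Record the visits only after the bonuses for this batch are computed.
    for traj in trajectories:
        visit_counts[traj] += 1
    return shaped


if __name__ == "__main__":
    counts = Counter()
    # Trajectories are represented as plain strings purely for illustration.
    print(shaped_advantages(["a", "a", "b"], [1.0, 0.0, 0.0], counts, step=0))
    print(shaped_advantages(["a", "b", "c"], [0.0, 0.0, 1.0], counts, step=100))
```

Adding the bonus after normalization is one simple way to keep the exploration signal from distorting the outcome-based statistics; whether i-MENTOR does this in the same way is not stated in the summary.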
