The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

  • Entropy minimization (EM) without labeled data can significantly enhance the performance of large language models (LLMs) on math, physics, and coding tasks.
  • Three approaches were explored: EM-FT minimizes token-level entropy in a manner similar to instruction fine-tuning; EM-RL uses reinforcement learning with negative entropy as the only reward; EM-INF adjusts logits at inference time to reduce entropy without any data or parameter updates (see the sketch after this list).
  • EM-RL matched strong RL baselines such as GRPO and RLOO on Qwen-7B without any labeled data, while EM-INF enabled Qwen-32B to outperform models such as GPT-4o and Gemini 1.5 Pro on the SciCode benchmark.
  • Entropy minimization alone draws out reasoning ability already present in pretrained LLMs, pointing to performance gains without labeled data and, in the EM-INF case, without any parameter updates.
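
EM-INF is the most unusual of the three approaches because it sharpens the model's output distribution purely at decoding time. Below is a minimal PyTorch-style sketch of that idea, assuming raw next-token logits are available; the function names, step count, and learning rate are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution for each position."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def em_inf_adjust(logits: torch.Tensor, steps: int = 10, lr: float = 0.1) -> torch.Tensor:
    """Inference-time entropy minimization (EM-INF-style sketch).

    Takes a few gradient steps on the logits themselves so the resulting
    distribution becomes more peaked; no model weights are changed and no
    labeled data is involved.
    """
    adjusted = logits.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([adjusted], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        token_entropy(adjusted).mean().backward()
        opt.step()
    return adjusted.detach()

# Toy usage: entropy drops after adjustment, before any sampling happens.
logits = torch.randn(1, 32_000)          # (batch, vocab) from one forward pass
sharpened = em_inf_adjust(logits)
print(token_entropy(logits).item(), token_entropy(sharpened).item())
```

By contrast, EM-FT and EM-RL would use the same entropy quantity as a training signal (a token-level loss or a negative-entropy reward) and therefore do update model parameters, which is why only EM-INF is described as requiring no parameter updates.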
