Arxiv


From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization

  • Researchers introduce LPPO, a framework that strengthens the reasoning capabilities of large language models (LLMs) through progressive optimization.
  • LPPO leverages a small set of high-quality demonstrations through two techniques: prefix-guided sampling and learning-progress weighting.
  • Prefix-guided sampling augments the training data with partial solution prefixes taken from expert demonstrations, which gives the policy better guidance.
  • Learning-progress weighting adjusts each sample's influence according to how the model is progressing, leading to faster convergence and improved performance on mathematical-reasoning benchmarks.
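
The two techniques named above can be pictured with a small Python sketch. It is illustrative only and is not the paper's implementation: the helper `make_prefix_prompt`, the class `ProgressWeighter`, the uniform choice of prefix length, and the weighting rule (an exponential moving average of each sample's pass rate, with the weight rising when that average improves) are all assumptions made for this example.

```python
import random


def make_prefix_prompt(problem, expert_steps, rng, max_frac=0.8):
    """Build a prompt that reveals the first part of an expert solution.

    Hypothetical helper: drawing the prefix length uniformly at random is an
    assumption of this sketch, not a detail stated in the summary.
    """
    frac = rng.uniform(0.0, max_frac)
    k = int(len(expert_steps) * frac)  # number of expert steps to reveal
    prefix = expert_steps[:k]
    prompt = "\n".join([problem, *prefix])
    return {"prompt": prompt, "prefix": prefix, "prefix_len": k}


class ProgressWeighter:
    """Assign each sample a weight based on its recent learning progress.

    Assumed rule: keep an exponential moving average (EMA) of the sample's
    pass rate and return 1 + scale * (change in EMA), clipped below at `floor`.
    """

    def __init__(self, decay=0.9, scale=2.0, floor=0.1):
        self.decay = decay
        self.scale = scale
        self.floor = floor
        self.ema = {}  # sample_id -> EMA of pass rate

    def update(self, sample_id, pass_rate):
        prev = self.ema.get(sample_id)
        if prev is None:
            # First observation: no progress signal yet, so use a neutral weight.
            self.ema[sample_id] = pass_rate
            return 1.0
        new = self.decay * prev + (1.0 - self.decay) * pass_rate
        self.ema[sample_id] = new
        return max(self.floor, 1.0 + self.scale * (new - prev))


if __name__ == "__main__":
    rng = random.Random(0)
    steps = ["Let x be the unknown.", "Then 2x + 3 = 11.", "So 2x = 8.", "Hence x = 4."]
    item = make_prefix_prompt("Solve 2x + 3 = 11.", steps, rng)
    print(item["prompt"])

    weighter = ProgressWeighter(decay=0.5)
    for rate in (0.0, 1.0, 0.0):
        print(weighter.update("sample-1", rate))
```

In a training loop, the returned weight would multiply that sample's loss or advantage, so that samples on which the model is currently improving contribute more than samples that have stalled or regressed. Whether LPPO uses this exact rule is not stated in the summary.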
