Image Credit: Arxiv

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

  • Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs) to enhance reasoning capabilities.
  • This paper introduces two techniques, difficulty-targeted online data selection and rollout replay, to improve data efficiency in LLM RL fine-tuning.
  • The method introduces an adaptive difficulty measure to prioritize questions of moderate difficulty, which yield the most informative learning signals, and uses an attention-based framework to estimate this difficulty efficiently (a rough sketch of the selection and replay loop appears after this list).
  • Experiments across 6 LLM-dataset combinations demonstrate that the proposed method reduces RL fine-tuning time by 25% to 65% while achieving the same performance level as the original GRPO algorithm.
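To make the two ideas concrete, below is a minimal, hypothetical Python sketch. It assumes "adaptive difficulty" can be read as an estimated per-question success rate and "rollout replay" as reusing recently generated rollouts from a bounded buffer instead of regenerating all of them; the helpers estimate_difficulty, generate_rollouts, and grpo_update are illustrative placeholders, not the paper's actual API.

    # Hypothetical sketch: difficulty-targeted data selection + rollout replay.
    # All helper names are placeholders standing in for the paper's components.
    import random
    from collections import deque

    def select_moderate_difficulty(questions, estimate_difficulty, batch_size, target=0.5):
        """Keep questions whose estimated success rate is closest to `target`,
        i.e. neither trivially easy nor hopelessly hard, since those are
        assumed to give the strongest learning signal."""
        scored = [(abs(estimate_difficulty(q) - target), q) for q in questions]
        scored.sort(key=lambda pair: pair[0])
        return [q for _, q in scored[:batch_size]]

    def train_step(policy, questions, estimate_difficulty, generate_rollouts,
                   grpo_update, replay_buffer, batch_size=32, replay_fraction=0.5):
        """One illustrative RL fine-tuning step:
        1. pick moderately difficult questions,
        2. generate fresh rollouts for only part of the batch (the expensive step),
        3. reuse recent rollouts from the replay buffer for the rest."""
        batch = select_moderate_difficulty(questions, estimate_difficulty, batch_size)

        n_fresh = int(batch_size * (1 - replay_fraction))
        fresh_rollouts = generate_rollouts(policy, batch[:n_fresh])
        replay_buffer.extend(fresh_rollouts)

        n_replay = min(batch_size - n_fresh, len(replay_buffer))
        replayed = random.sample(list(replay_buffer), n_replay)

        grpo_update(policy, fresh_rollouts + replayed)

    # Bounded buffer so only recent rollouts (closer to the current policy) are reused.
    replay_buffer = deque(maxlen=4096)

The buffer is deliberately size-limited: replaying only recent rollouts keeps the reused samples close to the current policy, which is one plausible way a replay scheme could cut generation cost without destabilizing training.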
