techminis

A naukri.com initiative

Arxiv

4d

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

  • Test-time scaling aims to improve the reasoning of Large Language Models (LLMs) by spending more compute at inference time, ideally extrapolating so that additional compute keeps improving performance on challenging problems.
  • Existing reasoning models generally do not extrapolate well. One way to enable extrapolation is to train the LLM to perform in-context exploration.
  • In-context exploration means training the LLM to spend its test-time budget effectively by chaining operations and testing multiple hypotheses before committing to an answer.
  • The proposed recipe, e3, has three ingredients: chaining skills, leveraging negative gradients, and coupling task difficulty with the training token budget. Together these enable in-context exploration, which improves both performance and extrapolation.
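
The idea of testing multiple hypotheses before answering can be illustrated with a toy sketch. This is not the e3 recipe, which trains an LLM; it is a minimal stand-in that assumes a task where checking a candidate is cheap and producing a correct one is hard (finding a nontrivial divisor by random guessing). The names `verify`, `explore`, and `success_rate` are hypothetical and chosen only for this illustration.

```python
import random
from typing import Optional


def verify(candidate: int, target: int) -> bool:
    """Cheap step (assumed): check whether candidate is a nontrivial divisor."""
    return 1 < candidate < target and target % candidate == 0


def explore(target: int, budget: int, rng: random.Random) -> Optional[int]:
    """Chain generate -> verify until a guess passes or the budget runs out."""
    for _ in range(budget):
        guess = rng.randrange(2, target)  # hard step: propose a hypothesis
        if verify(guess, target):         # easy step: test it
            return guess                  # commit only to a verified answer
    return None                           # budget exhausted, no answer


def success_rate(target: int, budget: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of independent runs that find an answer within the budget."""
    rng = random.Random(seed)
    hits = sum(explore(target, budget, rng) is not None for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # 91 = 7 * 13, so only two guesses in [2, 90] pass verification.
    for budget in (4, 16, 64):
        print(budget, success_rate(91, budget))
```

In this toy setting the success rate rises as the budget grows, because each extra step is another verified attempt. The budget here counts guesses, whereas the summary above talks about token budgets for an LLM, so the analogy is loose.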
