Arxiv

Value-Guided Search for Efficient Chain-of-Thought Reasoning

  • A new paper proposes a method for training value models on long-context reasoning traces for efficient chain-of-thought reasoning.
  • The method does not require a fine-grained notion of a 'step'. A 1.5B-parameter token-level value model is trained on a dataset of 2.5 million reasoning traces, and performance improves as test-time compute is scaled up.
  • Using block-wise value-guided search with a final weighted majority vote, the approach achieves better test-time scaling than standard methods such as majority voting or best-of-n.
  • With an inference budget of 64 generations, the proposed method reaches an average accuracy of 45.7% across multiple math benchmarks. It also needs fewer inference FLOPs to reach performance comparable to majority voting.
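
The ideas in the bullets above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `sample_block`, `value_fn`, and `is_final` are hypothetical callables standing in for the language model's block sampler, the trained value model, and an end-of-trace check, and the beam-style pruning shown here is only one plausible reading of "block-wise value-guided search". The two baselines named in the summary, majority voting and best-of-n, are included for comparison.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Baseline: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Baseline: return the answer of the single highest-scored trace."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_majority_vote(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Sum the value score of every trace per answer; return the top answer."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


def block_value_guided_search(
    prompt: str,
    sample_block: Callable[[str], str],  # hypothetical: draws the next block of text
    value_fn: Callable[[str], float],    # hypothetical: value model score for a prefix
    is_final: Callable[[str], bool],     # hypothetical: True once a trace is complete
    beam_width: int = 2,
    branching: int = 4,
    max_blocks: int = 8,
) -> List[str]:
    """Grow traces one block at a time, keeping the highest-valued prefixes."""
    beams = [prompt]
    finished: List[str] = []
    for _round in range(max_blocks):
        # Expand every live prefix with `branching` sampled continuation blocks.
        candidates = []
        for prefix in beams:
            for _sample in range(branching):
                candidates.append(prefix + sample_block(prefix))
        # Keep only the `beam_width` candidates the value model rates highest.
        candidates.sort(key=value_fn, reverse=True)
        survivors = candidates[:beam_width]
        finished.extend(c for c in survivors if is_final(c))
        beams = [c for c in survivors if not is_final(c)]
        if not beams:
            break
    return finished
```

To produce a final prediction under these assumptions, one would extract the answer from each finished trace, score each trace with `value_fn`, and pass both lists to `weighted_majority_vote`. Because the search works on blocks of text rather than on semantically defined steps, no step segmentation is needed, which matches the summary's point that the method avoids a fine-grained notion of a step.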
