Source: arXiv

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

  • Inference-time computation provides an important axis for scaling language model performance.
  • Naively scaling compute through techniques like Best-of-$N$ sampling can cause performance degradation due to reward hacking.
  • Theoretical analysis of inference-time alignment algorithms reveals the importance of the pre-trained policy's coverage for performance and compute scaling.
  • The proposed $\texttt{InferenceTimePessimism}$ algorithm mitigates reward hacking; its performance is shown to be optimal and scaling-monotonic, meaning it does not degrade as $N$ increases.
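
To make the Best-of-$N$ idea in the bullets above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: `toy_policy`, `toy_proxy_reward`, and `toy_true_reward` are hypothetical stand-ins for a pre-trained policy, an imperfect reward model, and the true quality of a response. The proxy agrees with the true reward for typical samples but keeps rising where the true reward falls, which is the setting in which picking the highest-scoring of many samples can amount to reward hacking.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    sample: Callable[[random.Random], float],
    reward: Callable[[float], float],
    n: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Draw n candidates from the policy and return the one the reward model scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample(rng) for _ in range(n)]
    scores = [reward(c) for c in candidates]
    best = max(range(n), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_policy(rng: random.Random) -> float:
    # Hypothetical base policy: a "response" is just a number drawn from N(0, 1).
    return rng.gauss(0.0, 1.0)


def toy_proxy_reward(x: float) -> float:
    # Hypothetical imperfect reward model: larger is always scored as better.
    return x


def toy_true_reward(x: float) -> float:
    # Hypothetical true quality: peaks at x = 1, then declines for larger x.
    return x - 0.5 * x * x


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        rng = random.Random(0)  # same seed, so a larger n extends the smaller sample set
        choice, proxy = best_of_n(toy_policy, toy_proxy_reward, n, rng)
        print(f"N={n:4d}  proxy={proxy:+.3f}  true={toy_true_reward(choice):+.3f}")
```

In this toy setup the proxy score of the selected sample cannot decrease as $N$ grows, while its true reward is free to fall once the selection moves past the region where the two rewards agree. How quickly that happens depends on the assumed policy and reward functions, so the printed numbers should be read as a qualitative illustration only.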
