techminis

A naukri.com initiative


Arxiv

2w


Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

  • Pass-at-k Policy Optimization (PKPO) is proposed as a fix for a limitation of reinforcement learning algorithms that optimize only for pass@1 performance.
  • PKPO transforms the final rewards so that training optimizes pass@k performance directly, prioritizing sets of samples that maximize reward when considered jointly.
  • The authors derive novel low-variance, unbiased estimators for pass@k and its gradient, in both the binary and the continuous reward settings.
  • PKPO enables robust optimization of pass@k for any k <= n, where n is the number of samples drawn per problem. This allows k to be annealed during training so that both pass@1 and pass@k are optimized.
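
For readers unfamiliar with the metric, here is a minimal sketch of the widely used unbiased pass@k estimator for the binary reward case: given n samples of which c are correct, pass@k is estimated as 1 - C(n-c, k) / C(n, k). This only illustrates the quantity being optimized. It is not PKPO's reward transformation or gradient estimator, which this summary does not spell out, and the function name `pass_at_k` is a hypothetical choice made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct (binary rewards).

    Hypothetical helper for illustration; not taken from the PKPO paper's code.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # Fraction of k-subsets of the n samples that contain no correct sample.
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # With 1 correct sample out of 4, a random pair contains it half the time.
    print(pass_at_k(n=4, c=1, k=2))
```

With k = 1 the estimate reduces to c / n, the ordinary pass@1 success rate, which is why larger k gives credit to a set of samples as soon as any one of them succeeds.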
