Source: arXiv

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences

  • Optimizing policies based on human preferences is crucial for aligning language models with human intent.
  • This work proposes a framework for robust policy optimization under noisy preferences by viewing reward modeling as a classification problem.
  • The framework leverages symmetric losses, known for their robustness to label noise in classification, leading to the Symmetric Preference Optimization (SymPO) method.
  • Experiments on synthetic and real-world tasks show that SymPO optimizes policies effectively even when the preference labels are noisy.
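The robustness claim rests on a property that is easy to check numerically. A loss ℓ applied to a preference margin m is symmetric when ℓ(m) + ℓ(−m) equals the same constant C for every m. If a preference label is flipped with probability ε, the flip turns m into −m, so the expected loss is (1 − ε)ℓ(m) + εℓ(−m) = (1 − 2ε)ℓ(m) + εC. For ε < 1/2 this is a positively scaled, shifted copy of the clean loss and therefore has the same minimizer. The minimal Python sketch below checks this for the sigmoid loss and contrasts it with the logistic (Bradley-Terry / DPO-style) loss, which is not symmetric. It is not the SymPO implementation: the choice of the sigmoid loss and the hypothetical DPO-style `implicit_reward_margin` helper (with its `beta` parameter) are illustrative assumptions.

```python
import math
from typing import Callable, Sequence

Loss = Callable[[float], float]


def logistic_loss(margin: float) -> float:
    """Bradley-Terry / DPO-style loss log(1 + exp(-margin)); NOT symmetric."""
    return math.log1p(math.exp(-margin))


def sigmoid_loss(margin: float) -> float:
    """Sigmoid loss 1 / (1 + exp(margin)); symmetric: l(m) + l(-m) = 1."""
    return 1.0 / (1.0 + math.exp(margin))


def is_symmetric(loss: Loss, margins: Sequence[float], tol: float = 1e-9) -> bool:
    """Numerically check that loss(m) + loss(-m) is the same constant for all m."""
    sums = [loss(m) + loss(-m) for m in margins]
    return max(sums) - min(sums) < tol


def noisy_risk(loss: Loss, margin: float, flip_rate: float) -> float:
    """Expected loss when the preference label is flipped with probability
    flip_rate; a flipped label turns the margin m into -m."""
    return (1.0 - flip_rate) * loss(margin) + flip_rate * loss(-margin)


def implicit_reward_margin(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Hypothetical DPO-style margin: how much more the policy (relative to a
    reference model) favors the chosen response over the rejected one."""
    return beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))


if __name__ == "__main__":
    margins = [-3.0, -0.5, 0.0, 1.2, 4.0]
    print(is_symmetric(sigmoid_loss, margins))   # True
    print(is_symmetric(logistic_loss, margins))  # False

    m = implicit_reward_margin(-10.0, -12.0, -11.0, -11.0, beta=0.1)
    for eps in (0.0, 0.2, 0.4):
        # For the sigmoid loss this equals (1 - 2*eps) * sigmoid_loss(m) + eps.
        print(eps, noisy_risk(sigmoid_loss, m, eps))
```

A practical consequence of the design choice: the sigmoid loss is bounded by 1, so a single mislabeled pair contributes at most a loss of 1, whereas the unbounded logistic loss keeps growing as the model confidently "disagrees" with a flipped label.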
