techminis

A naukri.com initiative

Source: Arxiv

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

  • Cal-DPO is a newly proposed algorithm for aligning large language models (LLMs) with human preference data.
  • It addresses a limitation of contrastive preference optimization: by calibrating the implicit reward, it ensures the learned rewards are comparable in scale to ground-truth rewards.
  • Cal-DPO offers theoretical advantages and significantly improves on off-the-shelf methods for aligning LLMs with given preferences.
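The calibration idea above can be sketched in code. This is a minimal illustrative sketch, not the paper's actual formulation: the function names, the `beta` and `calib_weight` parameters, and the fixed reward targets of ±0.5 are all assumptions chosen for the example. Vanilla DPO only constrains the *difference* of implicit rewards; the illustrative calibration term anchors their absolute scale.

```python
import math

def dpo_implicit_reward(logp_policy, logp_ref, beta=0.1):
    # Implicit reward in DPO: beta * log(pi(y|x) / pi_ref(y|x))
    return beta * (logp_policy - logp_ref)

def cal_dpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                       beta=0.1, calib_weight=1.0):
    """Contrastive DPO loss plus an illustrative calibration term.

    The calibration term here is a hypothetical squared-error penalty,
    added for illustration: it pulls each implicit reward toward a
    fixed target so the reward *scale* stays comparable to ground
    truth, rather than only the reward difference being constrained.
    """
    r_w = dpo_implicit_reward(logp_w, ref_logp_w, beta)   # chosen response
    r_l = dpo_implicit_reward(logp_l, ref_logp_l, beta)   # rejected response
    # Standard DPO term: -log sigmoid(r_w - r_l)
    contrastive = -math.log(1.0 / (1.0 + math.exp(-(r_w - r_l))))
    # Illustrative calibration: anchor rewards to targets +0.5 / -0.5
    calibration = (r_w - 0.5) ** 2 + (r_l + 0.5) ** 2
    return contrastive + calib_weight * calibration
```

When the implicit rewards sit exactly at their targets, the calibration penalty vanishes and the loss reduces to the standard DPO term; rewards that drift in scale are penalized even if their difference is unchanged.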
