
Preference Optimization with Multi-Sample Comparisons

  • Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training.
  • Current post-training methods primarily utilize single-sample comparisons, which fail to capture critical characteristics such as generative diversity and bias.
  • To address these limitations, the authors introduce Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO), which compare groups of samples rather than individual outputs so the objective can reflect group-wise characteristics (a minimal sketch follows this list).
  • Empirical results show that multi-sample comparisons are more effective in optimizing collective characteristics for generative models, and provide a more robust optimization framework for datasets with label noise.
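To make the group-wise idea concrete, here is a minimal PyTorch-style sketch of a multi-sample DPO-like loss. It assumes one natural reading of the comparison: score each group of k samples by the mean of its per-sample policy-vs-reference log-probability ratios, then apply the usual logistic preference loss to the group margin. The function name, tensor shapes, and the averaging choice are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a multi-sample DPO-style loss (one reading of mDPO).
# Each preference pair compares a GROUP of k preferred samples against a GROUP
# of k dispreferred samples, scoring each group by the mean per-sample
# log-probability ratio between the policy and a frozen reference model.

import torch
import torch.nn.functional as F

def multi_sample_dpo_loss(
    policy_logps_w: torch.Tensor,  # (batch, k) log p_theta of k preferred samples
    policy_logps_l: torch.Tensor,  # (batch, k) log p_theta of k dispreferred samples
    ref_logps_w: torch.Tensor,     # (batch, k) log p_ref of the same preferred samples
    ref_logps_l: torch.Tensor,     # (batch, k) log p_ref of the same dispreferred samples
    beta: float = 0.1,             # strength of the KL-style regularization, as in DPO
) -> torch.Tensor:
    # Per-sample implicit rewards: beta * (log pi_theta - log pi_ref).
    rewards_w = beta * (policy_logps_w - ref_logps_w)
    rewards_l = beta * (policy_logps_l - ref_logps_l)
    # Group-wise comparison: average rewards over the k samples in each group,
    # so the margin depends on collective behavior (e.g. diversity across draws)
    # rather than on a single sample.
    margin = rewards_w.mean(dim=-1) - rewards_l.mean(dim=-1)
    # Standard logistic preference loss on the group margin.
    return -F.logsigmoid(margin).mean()
```

Swapping the logistic term for a squared regression toward a fixed target margin would give an IPO-style variant in the spirit of mIPO; again, this reflects the grouping idea described in the summary, not a confirmed implementation.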

Read Full Article
