Source: arXiv

MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples

  • MPPO is a new preference-optimization algorithm for large language models (LLMs) that supports an arbitrary number of negative samples.
  • Existing methods such as DPO and KTO depend on abundant preference data and require a reference model.
  • MPPO fits the reward function using the average likelihood of model responses, making fuller use of the available preference data.
  • Across multiple benchmarks, MPPO outperforms methods such as DPO and ORPO.
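The core ideas above can be sketched in code: score each response by its average token log-likelihood (serving as an implicit reward, with no reference model), then apply a pairwise logistic loss between the preferred response and every negative. This is a minimal illustration under those assumptions, not the paper's exact formulation; all function names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def avg_log_likelihood(logits, labels, mask):
    # Average per-token log-probability of the observed labels over the
    # response tokens (mask = 1 on response tokens, 0 elsewhere).
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logp * mask).sum(-1) / mask.sum(-1)

def mppo_style_loss(pos_reward, neg_rewards):
    # pos_reward: (batch,) implicit reward of the preferred response.
    # neg_rewards: (batch, num_neg) rewards of an arbitrary number of
    # negatives. Logistic loss on each (positive, negative) pair,
    # averaged over all pairs.
    diffs = pos_reward.unsqueeze(-1) - neg_rewards
    return -F.logsigmoid(diffs).mean()
```

Because the reward is a per-response scalar, the same loss works whether each prompt has one negative or many, which is the "arbitrary negative samples" property highlighted above.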
