Source: Arxiv

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

  • Preference learning is crucial for aligning generative models with human expectations.
  • Existing approaches for diffusion models, such as Diffusion-DPO, suffer from timestep-dependent instability and off-policy bias.
  • A new method, SDPO (Importance-Sampled Direct Preference Optimization), addresses both issues by incorporating importance sampling into the training objective to correct the off-policy bias (see the sketch after this list).
  • Experiments show that SDPO outperforms standard Diffusion-DPO in VBench scores, human preference alignment, and training robustness.
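
To make the idea in the third bullet concrete, here is a minimal PyTorch sketch of what an importance-weighted Diffusion-DPO loss could look like. The function name `sdpo_loss`, the clipping of the weights, and the exact form of the importance weight are illustrative assumptions, not the paper's published objective; the base margin term follows the standard Diffusion-DPO formulation, where lower denoising error relative to a frozen reference model acts as an implicit reward.

```python
import torch
import torch.nn.functional as F

def sdpo_loss(policy_err_w, policy_err_l, ref_err_w, ref_err_l,
              log_iw, beta=5000.0, clip=2.0):
    """Hedged sketch of an importance-sampled Diffusion-DPO objective.

    policy_err_w / policy_err_l: per-sample denoising MSE of the trained
        model on the preferred (w) / dispreferred (l) image at a sampled
        timestep.
    ref_err_w / ref_err_l: the same errors under the frozen reference model.
    log_iw: log importance weight correcting for off-policy preference data
        (assumption: log-ratio of current-policy to data-collection
        likelihoods at the sampled timestep).
    """
    # Implicit reward margin, as in Diffusion-DPO: the model is rewarded for
    # reducing denoising error on the winner relative to the reference model,
    # and penalized for doing so on the loser.
    margin = (ref_err_w - policy_err_w) - (ref_err_l - policy_err_l)

    # Clipped importance weights (assumption: clipping bounds the variance
    # of the correction, in the spirit of PPO-style ratio clipping).
    iw = torch.exp(log_iw).clamp(max=clip)

    # Importance-weighted Bradley-Terry preference loss; weights are detached
    # so gradients flow only through the margin term.
    per_sample = -F.logsigmoid(beta * margin)
    return (iw.detach() * per_sample).mean()
```

In this sketch, `beta` plays the same temperature role as in Diffusion-DPO, and setting `log_iw` to zero recovers the unweighted Diffusion-DPO loss, which makes the off-policy correction easy to ablate.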
