Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training.
Current post-training methods rely primarily on single-sample comparisons, which fail to capture group-level characteristics of the output distribution such as generative diversity and bias.
To address these limitations, the authors introduce Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO), which focus on group-wise characteristics.
Empirical results show that multi-sample comparisons are more effective than single-sample comparisons at optimizing collective characteristics of generative models, and that they yield a more robust optimization framework when preference labels are noisy.
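To make the group-wise idea concrete, here is a minimal sketch of a multi-sample DPO-style loss. It assumes the group-level implicit reward is the mean per-sample log-probability ratio between the policy and a reference model, with the standard DPO logistic loss applied to the group-level margin; the paper's exact objective may differ, and all function and argument names here are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mdpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Sketch of a multi-sample DPO-style loss (assumed form).

    Instead of comparing a single preferred/dispreferred pair, each side
    is a *group* of samples; the implicit reward is taken here to be the
    mean log-probability ratio (policy vs. reference) over the group.
    """
    # Mean log-ratio over the preferred group.
    r_w = sum(p - r for p, r in zip(logp_w, ref_logp_w)) / len(logp_w)
    # Mean log-ratio over the dispreferred group.
    r_l = sum(p - r for p, r in zip(logp_l, ref_logp_l)) / len(logp_l)
    # DPO-style logistic loss on the group-level reward margin.
    return -math.log(sigmoid(beta * (r_w - r_l)))

# Example: the policy assigns relatively higher probability to the
# preferred group than the reference does, so the margin is positive
# and the loss falls below -log(0.5).
loss = mdpo_loss(
    logp_w=[-1.0, -1.2], logp_l=[-2.0, -2.5],
    ref_logp_w=[-1.5, -1.5], ref_logp_l=[-1.5, -1.5],
)
```

Because the comparison is made between groups rather than individual samples, group-level statistics (e.g. diversity across the samples in each group) can influence which side is preferred, which single-sample objectives cannot express.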