<ul><li>A new study introduces Modality-Balancing Preference Optimization (MBPO), a preference-learning method that addresses modality imbalance in Large Multimodal Models (LMMs).</li><li>MBPO uses adversarial negative mining to generate hard negatives, rejected responses that reflect the biases of the Large Language Model (LLM) backbone, and combines them with online responses scored by verified rewards, training the model with Group Relative Policy Optimization (GRPO).</li><li>The method targets cases where language priors outweigh visual inputs, with the aim of improving reasoning in LMMs and reducing hallucinations.</li><li>According to the reported experiments, MBPO improves performance on vision-language tasks and mitigates modality imbalance in LMMs.</li></ul>
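
To make the GRPO step in the summary above more concrete, here is a minimal sketch of the group-relative advantage that GRPO is generally described as using: each sampled response is scored against the mean and standard deviation of the rewards in its own group. This is not the MBPO authors' implementation; the function name `group_relative_advantages`, the `eps` constant, and the reward values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per response sampled for the
    same prompt. `eps` is an assumed small constant that avoids a
    division by zero when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verified rewards for four responses to one prompt
# (1.0 = answer verified correct, 0.0 = verified incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group average receive a positive advantage and the rest a negative one. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.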

Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
