
Source: Arxiv

Probing Visual Language Priors in VLMs

  • A new benchmark, ViLP, is introduced to investigate how heavily Vision-Language Models (VLMs) rely on visual language priors.
  • ViLP pairs out-of-distribution images with Q&A pairs that can only be answered through genuine visual reasoning, not text priors alone.
  • A self-improving framework is also proposed to enhance VLM performance: it generates new VQA data and applies image corruptions that force the model to attend to the actual visual input (a minimal sketch of the corruption idea follows below).
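
The corruption idea lends itself to a brief illustration. Below is a minimal sketch of probing prior-reliance via image corruption, not the paper's actual pipeline: the `vlm_answer` callable is a hypothetical stand-in for any VLM inference function, and Gaussian pixel noise is an assumed corruption, since the summary does not specify which corruptions ViLP applies.

```python
# Sketch: if a VLM gives the same answer after the image is destroyed,
# the answer likely came from language priors, not the visual input.
# NOTE: `vlm_answer` is hypothetical; Gaussian noise is an assumed corruption.
import numpy as np
from PIL import Image


def corrupt(image: Image.Image, noise_std: float = 80.0) -> Image.Image:
    """Drown the visual signal in heavy Gaussian pixel noise."""
    arr = np.asarray(image).astype(np.float32)
    noisy = arr + np.random.normal(0.0, noise_std, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def answer_survives_corruption(vlm_answer, image: Image.Image, question: str) -> bool:
    """True if the model's answer is unchanged on the corrupted image,
    suggesting it leaned on text priors rather than the picture."""
    return vlm_answer(image, question) == vlm_answer(corrupt(image), question)
```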
