The rise of vision foundation models (VFMs) has created a need for systematic evaluation. A common approach pairs a VFM with a large language model (LLM) and evaluates the combination on Visual Question Answering (VQA) benchmarks, but this setup has blind spots.
To address these gaps, AVA-Bench is introduced as the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs). These AVAs are foundational skills, such as localization, depth estimation, and spatial understanding, that collectively support complex visual reasoning tasks.
By decoupling the AVAs and matching training and test distributions, the benchmark pinpoints exactly where each VFM is strong or weak.
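To make the decoupled, distribution-matched evaluation concrete, here is a minimal sketch: for each AVA, a lightweight probe is trained on that AVA's own training split and scored on a test split drawn from the same distribution. The logistic-regression probe, synthetic features, and AVA names are illustrative assumptions, not the AVA-Bench protocol itself, which pairs VFM features with an LLM decoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def evaluate_ava(features_train, labels_train, features_test, labels_test):
    """Train a lightweight probe on one AVA's training split and report test accuracy."""
    probe = LogisticRegression(max_iter=1000).fit(features_train, labels_train)
    return probe.score(features_test, labels_test)

# Toy stand-ins for frozen VFM features on two AVAs; train and test splits
# come from the same synthetic distribution, mirroring the matched-distribution idea.
per_ava_scores = {}
for ava in ["localization", "spatial_understanding"]:
    X = rng.normal(size=(200, 16))      # placeholder "VFM features"
    y = (X[:, 0] > 0).astype(int)       # placeholder labels for this AVA
    per_ava_scores[ava] = evaluate_ava(X[:150], y[:150], X[150:], y[150:])

print(per_ava_scores)  # one score per AVA, isolating that single ability
```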
Applied to leading VFMs, AVA-Bench reveals distinct "ability fingerprints," turning VFM selection from guesswork into a more principled, evidence-based choice.
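An "ability fingerprint" can be thought of as the vector of per-AVA accuracies for one model. The sketch below aggregates per-example correctness into such a vector; the `results` record format and the toy numbers are assumptions for illustration, not the official AVA-Bench data schema.

```python
from collections import defaultdict

def ability_fingerprint(results):
    """Aggregate per-example correctness into one accuracy score per AVA.

    `results` is assumed to be a list of dicts like
    {"ava": "depth_estimation", "correct": True}.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["ava"]] += 1
        hits[r["ava"]] += int(r["correct"])
    return {ava: hits[ava] / totals[ava] for ava in totals}

# Example: two VFMs with complementary strengths show different fingerprints.
vfm_a = ability_fingerprint([
    {"ava": "localization", "correct": True},
    {"ava": "localization", "correct": False},
    {"ava": "depth_estimation", "correct": True},
])
vfm_b = ability_fingerprint([
    {"ava": "localization", "correct": True},
    {"ava": "localization", "correct": True},
    {"ava": "depth_estimation", "correct": False},
])
print(vfm_a)
print(vfm_b)
```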
Moreover, a 0.5B LLM yields VFM rankings similar to those of a 7B LLM while cutting GPU hours by roughly 8x, enabling more efficient evaluation.
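One way to verify that the cheaper 0.5B decoder preserves the ordering of VFMs is to compute a rank correlation between the two sets of scores, as in the sketch below. The model names and scores are made-up placeholders, and Spearman correlation via SciPy is an assumed choice of agreement measure rather than the paper's reported metric.

```python
from scipy.stats import spearmanr

# Hypothetical overall AVA-Bench scores for the same VFMs under each LLM decoder.
scores_05b = {"vfm_1": 61.2, "vfm_2": 58.4, "vfm_3": 55.0, "vfm_4": 49.7}
scores_7b = {"vfm_1": 66.8, "vfm_2": 63.1, "vfm_3": 60.2, "vfm_4": 54.5}

vfms = sorted(scores_05b)
rho, p = spearmanr([scores_05b[v] for v in vfms], [scores_7b[v] for v in vfms])
print(f"Spearman rank correlation: {rho:.2f} (p={p:.3f})")
# A rho near 1.0 indicates the 0.5B decoder preserves the VFM ordering.
```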
In this way, AVA-Bench aims to serve as a transparent foundation for evaluating the next generation of VFMs.