Arxiv

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

  • A new benchmark, Minimal Video Pairs (MVP), is introduced to assess the physical understanding of video language models.
  • Scores on existing benchmarks may be inflated because models can reach the right answer through shortcut solutions that rely on superficial cues; MVP is designed to close off those shortcuts.
  • MVP comprises 55K multiple-choice video QA examples related to physical world understanding from various video data sources.
  • The examples cover egocentric (first-person) and exocentric videos, robotic interaction data, and intuitive physics benchmarks.
  • Each sample in MVP includes a minimal-change pair to counter shortcut solutions, consisting of visually similar videos with opposing answers.
  • A model gets credit for a sample only if it answers both examples in the minimal-change pair correctly.
  • Human performance on MVP is 92.9%; the best video-language model reaches 40.2%, against a random baseline of 25%.
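
The paired scoring rule described above can be illustrated with a minimal Python sketch. This is not the benchmark's actual evaluation code; the names `PairResult`, `paired_accuracy`, and `per_question_accuracy` are hypothetical, and the sketch assumes only what the summary states: a pair counts as correct only when both of its examples are answered correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Outcome for one minimal-change pair (hypothetical structure)."""
    first_correct: bool   # model answered the first video's question correctly
    second_correct: bool  # model answered the second video's question correctly


def paired_accuracy(results):
    """Fraction of pairs in which BOTH examples were answered correctly."""
    if not results:
        raise ValueError("results must not be empty")
    credited = sum(1 for r in results if r.first_correct and r.second_correct)
    return credited / len(results)


def per_question_accuracy(results):
    """Conventional accuracy that scores every question independently."""
    if not results:
        raise ValueError("results must not be empty")
    correct = sum(int(r.first_correct) + int(r.second_correct) for r in results)
    return correct / (2 * len(results))


# Illustrative data only, not results from the benchmark.
results = [
    PairResult(True, True),    # both right: pair is credited
    PairResult(True, False),   # one right: no credit under paired scoring
    PairResult(False, True),   # one right: no credit under paired scoring
    PairResult(False, False),  # both wrong
]

print(paired_accuracy(results))        # 0.25
print(per_question_accuracy(results))  # 0.5
```

In this toy data, half of the individual questions are answered correctly, yet only one pair in four earns credit. A model that gives the same answer to both visually similar videos can never be credited for a pair, which is how the paired rule penalizes shortcut solutions.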
