Researchers in Russia propose evaluating image realism by exploiting the well-known tendency of large vision-language models (LVLMs) to hallucinate.
The method extracts 'atomic facts', short declarative statements about an image, from an LVLM and applies natural language inference (NLI) to them: contradictions among the generated statements signal unrealistic elements in the image. Because realism is judged by checking the LVLM's own statements against one another for contradiction, the approach relies on a native capability of the model rather than a purpose-built evaluator; a pairwise check of this kind is sketched below.
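As a rough illustration, and assuming an off-the-shelf MNLI-style model such as roberta-large-mnli from Hugging Face (the paper's exact checkpoint is not specified here), a pairwise contradiction check might look like this:

```python
# Minimal sketch: score a pair of LVLM-generated statements with an
# off-the-shelf NLI model. The checkpoint name is an assumption and is
# not necessarily the one used in the paper.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_CHECKPOINT = "roberta-large-mnli"  # assumed; any MNLI-style classifier works
tokenizer = AutoTokenizer.from_pretrained(NLI_CHECKPOINT)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_CHECKPOINT)
nli_model.eval()

def nli_probs(premise: str, hypothesis: str) -> dict[str, float]:
    """Return entailment/neutral/contradiction probabilities for a statement pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze(0)
    labels = [nli_model.config.id2label[i].lower() for i in range(probs.shape[-1])]
    return dict(zip(labels, probs.tolist()))

# Two 'atomic facts' an LVLM might emit about the same image.
print(nli_probs("A cat is sitting on the table.",
                "There is no animal in the picture."))
```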
The study emphasizes that the method is practical: it can be implemented with open-source frameworks, making it a more accessible alternative to approaches that require complex fine-tuning.
Tested on the WHOOPS! dataset, the method generates multiple statements per image, compares them pairwise with NLI, and aggregates the resulting scores into a single measure of image coherence, as sketched below.
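Under the same assumptions, and reusing nli_probs() from the previous snippet, one plausible aggregation is the mean pairwise contradiction probability; the paper's exact scoring rule may differ.

```python
# Sketch of the aggregation step. Mean pairwise contradiction probability is
# one plausible coherence score, not necessarily the paper's exact rule.
from itertools import combinations
from statistics import mean

def incoherence_score(statements: list[str]) -> float:
    """Average contradiction probability over all statement pairs.

    Higher values mean the LVLM's statements about the image clash with
    each other, which the method reads as a sign of an unrealistic image.
    """
    pair_scores = []
    for a, b in combinations(statements, 2):
        # NLI is directional, so check both orderings and keep the stronger signal.
        pair_scores.append(max(nli_probs(a, b)["contradiction"],
                               nli_probs(b, a)["contradiction"]))
    return mean(pair_scores) if pair_scores else 0.0

# Statements assumed to come from repeated LVLM queries about one image.
statements = [
    "A man is ironing clothes.",
    "The ironing board is strapped to the roof of a moving taxi.",
    "The scene takes place indoors in a laundry room.",
]
print(f"incoherence: {incoherence_score(statements):.2f}")
```

In practice, such a score could then be compared between the realistic and unrealistic image of each WHOOPS! pair, though the paper's exact decision rule is not reproduced here.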
The researchers evaluate the system on paired realistic and unrealistic images, reporting high agreement with human judgments of 'weirdness' in images.
Their approach outperformed the zero-shot methods tested, with contradiction scores proving more informative than entailment scores for distinguishing unrealistic images.
The research draws inspiration from the FaithScore evaluation method, which measures consistency between LVLM-generated descriptions and image content.
The study's reliance on LVLM hallucinations to detect unrealistic images underscores both the current limitations of language models and their potential in image evaluation tasks.
Because the technique depends on how today's models hallucinate, its effectiveness is tied to the current state of language models, but it presents a novel way of turning model hallucinations into a signal for image realism assessment.
Published on March 25, 2025, the research offers a practical example of using existing AI capabilities to improve image evaluation.