DASH (Detection and Assessment of Systematic Hallucinations) is an automatic, large-scale pipeline designed to identify systematic hallucinations of Vision-Language Models (VLMs) on real-world images in an open-world setting.
For image-based retrieval, the pipeline uses DASH-OPT, which optimizes over the 'natural image manifold' to generate images that mislead the VLM and thereby expose its object hallucinations.
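The core idea behind DASH-OPT can be illustrated with a toy gradient-descent loop: search over the outputs of a generator, rather than over raw pixels, for an input on which the VLM claims to see an object. This is a minimal sketch under stated assumptions, not the DASH-OPT implementation. The generator, VLM and detector are tiny hypothetical functions (`generate`, `vlm_yes_prob`, `detector_score`). The detector penalty, weighted by the assumed `LAMBDA`, is also an assumption of this sketch; it stands in for a check that the object is not actually depicted.

```python
import math

# Toy stand-ins (all hypothetical): a 2-D latent space, a bounded
# "generator", a VLM "yes" probability and an object-detector score.
W_VLM = (1.0, 2.0)   # hypothetical VLM weights
W_DET = (1.0, -2.0)  # hypothetical detector weights
LAMBDA = 2.0         # assumed weight of the detector penalty


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generate(z):
    """Hypothetical generator: maps a latent to bounded 'image' features.
    Optimizing z instead of the features keeps every candidate inside the
    generator's output set, a toy stand-in for the natural image manifold."""
    return [math.tanh(zi) for zi in z]


def vlm_yes_prob(x):
    """Hypothetical P(VLM answers 'yes' to 'Is there a <object>?')."""
    return sigmoid(W_VLM[0] * x[0] + W_VLM[1] * x[1])


def detector_score(x):
    """Hypothetical detector confidence that the object is really present."""
    return sigmoid(W_DET[0] * x[0] + W_DET[1] * x[1])


def loss(z):
    x = generate(z)
    # Reward a VLM "yes" while penalizing actual evidence of the object.
    return -math.log(vlm_yes_prob(x)) + LAMBDA * detector_score(x)


def numerical_grad(f, z, eps=1e-5):
    """Central finite differences; a real pipeline would use autodiff."""
    grad = []
    for i in range(len(z)):
        up, down = list(z), list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def optimize(z0, lr=0.2, steps=300):
    """Plain gradient descent on the latent z."""
    z = list(z0)
    for _ in range(steps):
        g = numerical_grad(loss, z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


z_init = [0.0, 0.0]
z_opt = optimize(z_init)
x_init, x_opt = generate(z_init), generate(z_opt)
print(f"VLM yes-prob:   {vlm_yes_prob(x_init):.2f} -> {vlm_yes_prob(x_opt):.2f}")
print(f"detector score: {detector_score(x_init):.2f} -> {detector_score(x_opt):.2f}")
```

Running the script shows the VLM's "yes" probability rising while the detector score falls. An input with that combination is, by construction, a candidate hallucination. Optimizing the latent `z` rather than the features directly means every candidate is something the generator can produce, which mirrors the role the natural image manifold plays in the description above.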
Applied to PaliGemma and two LLaVA-NeXT models, DASH identifies more than 19k clusters containing 950k images on which the VLM hallucinates an object, spanning 380 object classes.
The study also demonstrates that fine-tuning PaliGemma with the model-specific images obtained using DASH mitigates object hallucinations.