Kvasir-VQA-x1 is a new multimodal dataset designed for medical reasoning and robust medical visual question answering (MedVQA) in gastrointestinal endoscopy.
It addresses the limitations of existing datasets by building on the original Kvasir-VQA with 159,549 new question-answer pairs designed to test deeper clinical reasoning.
The questions are stratified by complexity, allowing a model's inference capabilities to be evaluated at graded levels of difficulty.
To prepare models for real-world clinical conditions, the dataset also includes visual augmentations that simulate common imaging artifacts.
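The augmented images ship with the dataset itself, but for intuition, here is a minimal sketch of the kind of perturbation pipeline described, using torchvision. The specific transforms and all parameter values are illustrative assumptions, not the dataset's actual augmentation code.

```python
import torch
from PIL import Image
from torchvision import transforms

# Rough approximation of common endoscopic imaging artifacts:
# exposure/contrast shifts, defocus blur, and sensor noise.
simulate_artifacts = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3),      # exposure/contrast variation
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # defocus blur
    transforms.ToTensor(),                                     # PIL image -> float tensor in [0, 1]
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),  # sensor noise
])

img = Image.new("RGB", (224, 224), color=(120, 80, 60))  # stand-in for an endoscopy frame
perturbed = simulate_artifacts(img)
```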
Kvasir-VQA-x1 accordingly supports two evaluation tracks: one measuring standard VQA performance and one assessing model robustness under such visual perturbations.
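To make the two tracks concrete, the sketch below scores the same predictor on clean and on perturbed images. The exact-match metric and the field names (`image`, `image_augmented`, `question`, `answer`) are illustrative assumptions, not the benchmark's published protocol.

```python
from typing import Callable, Iterable

def exact_match_accuracy(
    predict: Callable[[object, str], str],
    records: Iterable[dict],
    image_key: str,
) -> float:
    """Fraction of records where the prediction matches the reference answer."""
    records = list(records)
    hits = sum(
        predict(r[image_key], r["question"]).strip().lower()
        == r["answer"].strip().lower()
        for r in records
    )
    return hits / len(records)

# Track 1 (standard VQA) vs. track 2 (robustness), assuming each test
# record carries both a clean and a perturbed version of the image:
# acc_standard = exact_match_accuracy(my_model, test_records, image_key="image")
# acc_robust   = exact_match_accuracy(my_model, test_records, image_key="image_augmented")
```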
The dataset aims to accelerate the development of more reliable and effective AI systems for clinical use by providing a challenging and clinically relevant benchmark.
Kvasir-VQA-x1 adheres to the FAIR data principles (Findable, Accessible, Interoperable, Reusable), ensuring accessibility and transparency for the wider research community.
Code and data related to the dataset are available on GitHub: https://github.com/Simula/Kvasir-VQA-x1
The dataset itself can be accessed on Hugging Face: https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
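The dataset can be loaded directly with the Hugging Face `datasets` library, as in the minimal sketch below. Only the repository id is taken from the link above; the split handling and the `complexity` field used for filtering are assumptions about the schema.

```python
from datasets import load_dataset

ds = load_dataset("SimulaMet/Kvasir-VQA-x1")  # downloads from the Hugging Face Hub
print(ds)                                     # list the available splits and features

first_split = next(iter(ds))
print(ds[first_split][0])                     # inspect one question-answer record

# Hypothetical filter for the most complex questions; the "complexity"
# column name and its value range are assumptions about the schema.
hard = ds[first_split].filter(lambda r: r.get("complexity", 0) >= 3)
```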