ExDDV is a new dataset and benchmark designed to address the need for explainable deepfake detection in video.
The dataset contains approximately 5.4K real and deepfake videos, manually annotated with text descriptions and with clicks that localize artifacts.
Experiments with vision-language models on ExDDV show that both text and click supervision are important for developing robust explainable deepfake detection models.
The dataset and code to reproduce the results are available at https://github.com/vladhondru25/ExDDV.