Adversarial patch attacks pose a significant threat to vision systems, introducing localized perturbations that deceive deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment.
A new training-free Visual Retrieval-Augmented Generation (VRAG) framework is proposed for adversarial patch detection, integrating Vision-Language Models (VLMs).
VRAG leverages generative reasoning by retrieving visually similar patches and images to identify diverse attack types without additional training.
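The retrieve-then-reason pipeline described above can be sketched in a minimal form: embed the query image, retrieve the most similar reference patches, and place them in the VLM's context for a training-free classification decision. The function names, the toy embeddings, and the prompt wording below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def retrieve_top_k(query_emb, db_embs, k=3):
    # Cosine similarity between the query embedding and each reference
    # embedding; returns indices of the k most similar references.
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]

def build_vlm_prompt(retrieved_labels):
    # Retrieved examples are placed in the VLM context so the model can
    # reason about the query without any additional training (hypothetical
    # prompt wording for illustration only).
    examples = "\n".join(f"- reference patch labeled: {lbl}" for lbl in retrieved_labels)
    return (
        "You are shown reference adversarial patches:\n"
        f"{examples}\n"
        "Does the query image contain a similar adversarial patch? Answer yes or no."
    )

# Toy demo: four reference embeddings with assumed attack-type labels.
rng = np.random.default_rng(0)
db_embs = rng.normal(size=(4, 8))
labels = ["naturalistic", "printable", "GAN-generated", "benign"]
query = db_embs[2] + 0.01 * rng.normal(size=8)  # near the GAN-generated patch
idx = retrieve_top_k(query, db_embs, k=2)
prompt = build_vlm_prompt([labels[i] for i in idx])
```

In a real system the embeddings would come from a vision encoder and the prompt would be sent to a VLM such as those evaluated in the paper; the sketch only shows how retrieval conditions the generative reasoning step.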
Several large-scale VLMs are evaluated, including Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO; UI-TARS-72B-DPO achieves a state-of-the-art 95 percent classification accuracy among open-source models for adversarial patch detection.
The closed-source Gemini-2.0 model achieves the highest overall accuracy of 98 percent.
Experimental results demonstrate VRAG's efficacy in detecting diverse adversarial patches with minimal human annotation, offering a promising defense against evolving attacks.