Vision-Language Models (VLMs) such as CLIP achieve strong performance on cross-modal tasks through large-scale pre-training.
Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA have emerged as scalable alternatives to full fine-tuning, adapting large transformer-based models like VLMs by training only a small number of additional parameters.
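To make the mechanism concrete, the following is a minimal sketch of a LoRA-augmented linear layer in PyTorch; the rank, scaling factor, and layer placement are illustrative assumptions and not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pre-trained linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # pre-trained weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Effective weight is W + (alpha / rank) * B @ A; only A and B are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

In practice, such layers typically replace the attention projections of the transformer, so only the low-rank matrices are optimized during fine-tuning.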
Adversarial attacks can significantly degrade the performance of VLMs, and adversarial training is crucial for improving model robustness in few-shot scenarios.
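As a reference point for what such attacks look like, here is a minimal sketch of an L-infinity PGD attack, a common choice both for evaluating robustness and for generating training-time perturbations; the step sizes and budgets are illustrative assumptions.

```python
import torch

def pgd_attack(model, images, labels, loss_fn, eps=4/255, alpha=1/255, steps=10):
    """Projected Gradient Descent: iteratively ascend the loss inside an eps-ball."""
    adv = images.clone().detach()
    adv = adv + torch.empty_like(adv).uniform_(-eps, eps)   # random start
    adv = adv.clamp(0, 1)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()             # gradient-sign step
        adv = images + (adv - images).clamp(-eps, eps)       # project back to eps-ball
        adv = adv.clamp(0, 1).detach()
    return adv
```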
AdvCLIP-LoRA is introduced as the first algorithm to enhance the adversarial robustness of CLIP models fine-tuned with LoRA in few-shot settings. It comes with theoretical convergence guarantees and yields significant robustness gains against common adversarial attacks while maintaining clean accuracy.
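For intuition only, the following sketches a generic min-max training step that couples an inner attack (reusing the pgd_attack sketch above) with an outer update restricted to LoRA parameters. This is an assumed, simplified illustration of adversarial LoRA fine-tuning, not the AdvCLIP-LoRA update itself, whose precise formulation and convergence analysis are given in the paper.

```python
import torch

def adversarial_lora_step(model, optimizer, images, labels, loss_fn,
                          eps=4/255, alpha=1/255, steps=10):
    """One illustrative min-max step: craft perturbations, then update LoRA weights.

    Assumes `optimizer` was constructed over only the LoRA parameters
    (e.g. the lora_A / lora_B tensors from the sketch above).
    """
    model.eval()                                   # keep normalization stats fixed while attacking
    adv_images = pgd_attack(model, images, labels, loss_fn, eps, alpha, steps)
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(adv_images), labels)      # minimize loss on adversarial inputs
    loss.backward()
    optimizer.step()                               # only LoRA parameters are updated
    return loss.item()
```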