Vision-Language Models (VLMs) have improved remarkably in recent years, but their large size makes them hard to deploy in real-world applications where latency matters.
To address this issue, a new approach called FREE (Fast and Robust Vision Language Models with Early Exits) applies Early Exit (EE) strategies to VLMs and trains them with adversarial training within a GAN-based framework.
FREE targets input-adaptive inference, increasing inference speed with minimal performance drop. It trains exit classifiers within the VLM to improve accuracy and robustness while reducing overthinking and mid-crisis instances.
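To make "input-adaptive inference" concrete, the following is a minimal sketch of the generic confidence-threshold early-exit rule commonly used with exit classifiers: after each layer, an exit head predicts a class, and inference stops as soon as its top probability reaches a threshold. This is an illustration under stated assumptions, not FREE's implementation. The exit criterion, the function `early_exit_predict`, the `threshold` parameter, and the toy layers and heads are all hypothetical, and FREE's GAN-based adversarial training of the exits is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    x,
    layers: Sequence[Callable],
    exit_heads: Sequence[Callable],
    threshold: float = 0.9,  # hypothetical confidence threshold
) -> Tuple[int, int]:
    """Run layers in order and stop at the first confident exit head.

    Returns (predicted_class, index_of_exit_layer). If no exit head is
    confident enough, the last layer's prediction is returned.
    """
    hidden = x
    pred = -1
    for i, (layer, head) in enumerate(zip(layers, exit_heads)):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        pred = max(range(len(probs)), key=probs.__getitem__)
        if probs[pred] >= threshold:
            return pred, i  # confident: skip the remaining layers
    return pred, len(layers) - 1


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): each "layer" shifts a scalar state,
    # each exit head turns it into two-class logits.
    layers = [lambda h: h + 1.0] * 4
    heads = [lambda h: [0.0, h]] * 4
    print(early_exit_predict(0.0, layers, heads))   # (1, 2): exits after layer 2
    print(early_exit_predict(-5.0, layers, heads))  # (0, 0): "easy" input exits at once
```

The two calls show the input-adaptive behavior: an input the early heads are already confident about leaves after the first layer, while a harder one runs deeper before exiting. Raising `threshold` trades speed for accuracy, since fewer inputs exit early.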
Experimental results show that FREE speeds up inference by more than 1.51x while maintaining comparable performance. The source code is available on GitHub.