Vision-language models (VLMs) have achieved remarkable success across diverse tasks with minimal labeled data.
Knowledge distillation (KD) makes it possible to deploy such large models in resource-constrained environments by transferring their capabilities to compact student models.
A new approach called Dual-Head Optimization (DHO) simplifies and improves knowledge distillation from VLMs to compact models in semi-supervised settings by training a supervised head and a distillation head on a shared student backbone (see the sketch below).
DHO outperforms baselines across a range of experiments, achieving state-of-the-art accuracy on ImageNet while using fewer labeled examples and fewer parameters.
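The following is a minimal sketch of the dual-head idea under stated assumptions: a shared student backbone feeds a supervised head trained with cross-entropy on labeled data and a distillation head trained to match the VLM teacher's soft labels on unlabeled data, with the two heads' probabilities combined at inference. The names (`DualHeadStudent`, `dho_step`, `predict`) and the hyperparameters `alpha`, `beta`, and `tau` are illustrative assumptions, not taken from the source, and the exact loss weighting and inference rule may differ from the published method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualHeadStudent(nn.Module):
    """Illustrative student: one shared backbone, two linear prediction heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                            # small feature extractor (assumed)
        self.sup_head = nn.Linear(feat_dim, num_classes)    # supervised (cross-entropy) head
        self.kd_head = nn.Linear(feat_dim, num_classes)     # distillation head

    def forward(self, x):
        feats = self.backbone(x)
        return self.sup_head(feats), self.kd_head(feats)


def dho_step(student, teacher_probs_u, x_l, y_l, x_u, alpha=0.5, tau=2.0):
    """One semi-supervised step (sketch): cross-entropy on labeled data through the
    supervised head, KL divergence to teacher soft labels on unlabeled data through
    the distillation head."""
    sup_logits_l, _ = student(x_l)
    _, kd_logits_u = student(x_u)

    ce_loss = F.cross_entropy(sup_logits_l, y_l)
    kd_loss = F.kl_div(
        F.log_softmax(kd_logits_u / tau, dim=-1),
        teacher_probs_u,                # teacher probabilities (already softmaxed)
        reduction="batchmean",
    ) * tau ** 2
    return alpha * ce_loss + (1 - alpha) * kd_loss


@torch.no_grad()
def predict(student, x, beta=0.5):
    """Inference (sketch): interpolate the two heads' class probabilities."""
    sup_logits, kd_logits = student(x)
    return beta * sup_logits.softmax(-1) + (1 - beta) * kd_logits.softmax(-1)
```

One plausible reading of the design is that giving each objective its own head lets the labeled supervision and the teacher's soft labels be fitted without forcing a single classifier to reconcile the two signals, while the shared backbone still benefits from both.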