<ul><li>Researchers propose COSMIC (Clique-Oriented Semantic Multi-space Integration for CLIP), a test-time adaptation framework for vision-language models (VLMs).</li><li>COSMIC improves adaptability through multi-granular, cross-modal semantic caching and graph-based querying mechanisms.</li><li>The framework introduces a Dual Semantics Graph (DSG) that captures rich semantic relationships by combining textual features, coarse-grained CLIP features, and fine-grained DINOv2 features.</li><li>A second component, Clique Guided Hyper-class, leverages structured class relationships to make predictions more robust.</li></ul>
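
The clique-based grouping of classes mentioned in the summary can be illustrated with a small toy sketch. This is not the authors' implementation: the feature vectors, the similarity threshold, and every function name below (`build_graph`, `maximal_cliques`, `hyper_class_centers`, `nearest_center`) are hypothetical and chosen only for illustration. The sketch assumes that classes are linked when their embeddings have high cosine similarity, that maximal cliques of that graph act as "hyper-classes", and that a query is matched to the closest clique center. It uses a single feature space, whereas the summary describes combining textual, CLIP, and DINOv2 features.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(features, threshold):
    """Link two nodes when their cosine similarity exceeds the threshold."""
    adj = {i: set() for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        if cosine(features[i], features[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


def hyper_class_centers(features, cliques):
    """Average the member features of each clique into one center vector."""
    dim = len(features[0])
    return [
        [sum(features[i][d] for i in clique) / len(clique) for d in range(dim)]
        for clique in cliques
    ]


def nearest_center(query, centers):
    """Index of the center most similar to the query feature."""
    return max(range(len(centers)), key=lambda k: cosine(query, centers[k]))


# Hypothetical toy class embeddings: classes 0/1 and 2/3 are near-duplicates.
class_features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]

adj = build_graph(class_features, threshold=0.9)  # threshold is an assumption
cliques = maximal_cliques(adj)                    # [[0, 1], [2, 3]] for this data
centers = hyper_class_centers(class_features, cliques)
hyper_class = nearest_center([0.8, 0.2, 0.0], centers)
```

In this toy setup a query is first matched to a group of related classes rather than a single class, which is one plausible reading of how structured class relationships could make a prediction less sensitive to noise in any one class embedding.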

COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
