An ideal text-to-image (T2I) retriever should prioritize the specific visual attributes relevant to each query.
CLIP-style retrievers perform poorly on attribute-focused queries because they emphasize global semantics and main subjects while overlooking finer details.
Recent retrievers based on Multimodal Large Language Models (MLLMs) likewise fall short on attribute-focused queries.
We propose promptable image embeddings, which highlight the attributes a query requires, to boost retrieval performance, together with acceleration strategies that improve real-world applicability; a sketch of the idea follows.
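To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch of attribute-conditioned retrieval. The encoder class, its fusion scheme, and all feature shapes are toy assumptions for illustration only, not the actual architecture: the point is simply that the image embedding depends on an attribute prompt, so the same image is represented differently depending on which attributes the query cares about.

```python
import torch
import torch.nn.functional as F

class PromptableImageEncoder(torch.nn.Module):
    """Toy stand-in for a promptable image encoder (hypothetical).

    A real MLLM-based encoder would feed image tokens and the attribute
    prompt through the model and pool a hidden state; here we fuse
    precomputed toy features with two linear projections.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        self.img_proj = torch.nn.Linear(dim, dim)
        self.prompt_proj = torch.nn.Linear(dim, dim)

    def forward(self, image_feat: torch.Tensor, prompt_feat: torch.Tensor) -> torch.Tensor:
        # Condition the image representation on the attribute prompt,
        # then L2-normalize so dot products equal cosine similarity.
        fused = self.img_proj(image_feat) + self.prompt_proj(prompt_feat)
        return F.normalize(fused, dim=-1)

encoder = PromptableImageEncoder()

image_feat = torch.randn(4, 512)    # 4 candidate images (toy features)
prompt_feat = torch.randn(1, 512)   # prompt, e.g. "focus on color and material"
query_emb = F.normalize(torch.randn(1, 512), dim=-1)  # embedded text query

# Embed every candidate under the same attribute prompt, then rank by
# cosine similarity to the query embedding.
img_emb = encoder(image_feat, prompt_feat.expand(4, -1))
scores = (img_emb @ query_emb.T).squeeze(-1)
print(scores.argsort(descending=True))  # attribute-aware ranking
```

Note the cost implied by this design: unlike CLIP-style retrieval, where image embeddings are computed once offline, prompt-conditioned embeddings depend on the query, which is presumably why the acceleration strategies mentioned above matter for practical deployment.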