Source: arXiv

Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach

  • Cross-modal embeddings such as CLIP and BLIP have shown promise in aligning representations across modalities, but they may underperform on modality-specific tasks.
  • Single-modality embeddings excel within their domains but lack cross-modal alignment capabilities.
  • RP-KrossFuse is proposed to unify cross-modal and single-modality embeddings by combining them through a random-projection-based Kronecker product.
  • The method aims to achieve competitive modality-specific performance while preserving cross-modal alignment. Numerical experiments combine CLIP embeddings with uni-modal image and text embeddings to demonstrate this.
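The summary does not give the exact construction, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the fused vector is the Kronecker product of a cross-modal embedding and a uni-modal embedding, and that the random projection uses independent Gaussian matrices applied to each factor. The function names, dimensions, and the Gaussian choice are all hypothetical. The key property is that inner products of Kronecker products factorize, so similarity in the fused space is the product of the two original similarities, and the projected version approximates it in far fewer dimensions.

```python
import numpy as np


def kron_fuse(cross, uni):
    """Exact Kronecker fusion: output has len(cross) * len(uni) entries."""
    return np.kron(cross, uni)


def rp_kron_fuse(cross, uni, r_cross, r_uni):
    """Random-projection approximation of the Kronecker fusion.

    Assumption (not from the article): r_cross and r_uni are independent
    Gaussian matrices with m rows each. The elementwise product of the two
    projections equals projecting kron(cross, uni) with rows of the form
    kron(r_cross[i], r_uni[i]), without ever forming the full product.
    """
    m = r_cross.shape[0]
    return (r_cross @ cross) * (r_uni @ uni) / np.sqrt(m)


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)


# Toy stand-ins for real embeddings (dimensions are illustrative only).
rng = np.random.default_rng(0)
d_cross, d_uni, m = 8, 6, 8192
a, c = (unit(v) for v in rng.standard_normal((2, d_cross)))  # cross-modal
b, d = (unit(v) for v in rng.standard_normal((2, d_uni)))    # uni-modal

r_cross = rng.standard_normal((m, d_cross))
r_uni = rng.standard_normal((m, d_uni))

# Similarity in the exact fused space equals the product of similarities.
exact = kron_fuse(a, b) @ kron_fuse(c, d)
target = (a @ c) * (b @ d)

# Similarity in the projected space approximates the same quantity.
approx = rp_kron_fuse(a, b, r_cross, r_uni) @ rp_kron_fuse(c, d, r_cross, r_uni)

print("product of similarities:", target)
print("exact Kronecker similarity:", exact)
print("random-projection similarity:", approx)
```

In this toy setup the exact fused vectors have 48 entries, so projecting to 8192 dimensions saves nothing; the projection only pays off when the embedding dimensions are large enough that their product exceeds the projection size.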
