Marktechpost

Salesforce AI Releases BLIP3-o: A Fully Open-Source Unified Multimodal Model Built with CLIP Embeddings and Flow Matching for Image Understanding and Generation

  • Multimodal modeling aims to understand and generate content across visual and textual formats, combining image understanding and image generation in a single unified system.
  • BLIP3-o, developed by Salesforce Research in collaboration with academic institutions, introduces a family of unified multimodal models using CLIP embeddings and a sequential training approach for image understanding and generation.
  • The model leverages CLIP embeddings and a diffusion transformer for image synthesis, employing a dual-stage training strategy that enhances alignment and visual fidelity.
  • BLIP3-o is reported to achieve top scores on benchmarks covering image generation alignment, reasoning ability, and image understanding, and it is also rated favorably in subjective quality assessments.
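
To make the flow matching idea named in the headline more concrete, here is a minimal sketch of how a flow matching training target can be built. It assumes a straight-line path between Gaussian noise and a target embedding, which this summary does not specify, and the names `flow_matching_pair`, `mse`, and the toy `target_embedding` values are hypothetical illustrations, not BLIP3-o's code.

```python
import random


def flow_matching_pair(noise, target, t):
    """Build one training example for a linear (straight-line) flow.

    Assumption: the path is x_t = (1 - t) * noise + t * target, so the
    velocity the model should predict is the constant target - noise.
    """
    x_t = [(1.0 - t) * n + t * y for n, y in zip(noise, target)]
    velocity = [y - n for n, y in zip(noise, target)]
    return x_t, velocity


def mse(pred, truth):
    """Mean squared error between a predicted and a true velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(truth)


rng = random.Random(0)

# Hypothetical stand-in for a CLIP image embedding (real ones are much larger).
target_embedding = [0.5, -1.0, 2.0, 0.0]
noise = [rng.gauss(0.0, 1.0) for _ in target_embedding]

t = 0.25
x_t, velocity = flow_matching_pair(noise, target_embedding, t)

# A model that predicts the exact velocity has zero loss.
loss_perfect = mse(velocity, velocity)

# Following the true velocity for the remaining time (1 - t) lands on the target.
reached = [x + (1.0 - t) * v for x, v in zip(x_t, velocity)]
```

In a real system a neural network (the summary mentions a diffusion transformer) would take `x_t`, `t`, and conditioning as input and be trained to minimize `mse` against `velocity`; at generation time the learned velocity is integrated from noise to an embedding.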
