Marktechpost


Salesforce AI Releases BLIP3-o: A Fully Open-Source Unified Multimodal Model Built with CLIP Embeddings and Flow Matching for Image Understanding and Generation

  • Multimodal modeling aims to both understand and generate content across visual and textual formats, combining image recognition and image generation in a single system.
  • BLIP3-o, developed by Salesforce Research in collaboration with academic institutions, introduces a family of unified multimodal models using CLIP embeddings and a sequential training approach for image understanding and generation.
  • The model combines CLIP embeddings with a diffusion transformer for image synthesis and is trained in two stages to improve alignment and visual fidelity.
  • BLIP3-o achieves top scores on benchmarks covering image generation alignment, reasoning ability, and image understanding, and it also performs strongly in subjective quality assessments.
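
The headline names flow matching as the generative objective. As a rough illustration of what that objective looks like in general, here is a minimal sketch of the common straight-path (rectified-flow style) variant, applied to a toy embedding vector. This is not BLIP3-o's code: the function names, the three-dimensional "embedding", and the oracle predictor are all hypothetical, and the real model would use a learned diffusion transformer in place of `predict_velocity`.

```python
import random

def interpolate(noise, target, t):
    """Point on the straight path from noise (t=0) to target (t=1)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, target)]

def target_velocity(noise, target):
    """Velocity of the straight path; it is constant in t."""
    return [x - n for n, x in zip(noise, target)]

def flow_matching_loss(predict_velocity, target, rng):
    """Single-sample mean squared error between predicted and true velocity."""
    noise = [rng.gauss(0.0, 1.0) for _ in target]  # Gaussian starting point
    t = rng.random()                               # time sampled in [0, 1)
    x_t = interpolate(noise, target, t)
    v_true = target_velocity(noise, target)
    v_pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(v_pred, v_true)) / len(target)

# Hypothetical toy "embedding" standing in for a CLIP image feature vector.
toy_embedding = [0.5, -1.0, 2.0]

def oracle(x_t, t):
    # Cheats by knowing the target: (target - x_t) / (1 - t) recovers the velocity.
    return [(x - xt) / (1.0 - t) for x, xt in zip(toy_embedding, x_t)]

def zero_model(x_t, t):
    # Untrained stand-in that always predicts zero velocity.
    return [0.0 for _ in x_t]

oracle_loss = flow_matching_loss(oracle, toy_embedding, random.Random(0))
zero_loss = flow_matching_loss(zero_model, toy_embedding, random.Random(0))
```

The oracle predictor drives the loss to roughly zero, while the zero predictor does not; training a real model means pushing its predictions toward the oracle's behaviour using only samples.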
