VentureBeat
New fully open source vision encoder OpenVision arrives to improve on OpenAI’s CLIP, Google’s SigLIP

  • The University of California, Santa Cruz (UCSC) has introduced OpenVision, a new family of vision encoders that aims to improve on existing models such as OpenAI's CLIP and Google's SigLIP.
  • Vision encoders convert images into numerical representations that non-visual AI models can consume, which lets large language models handle tasks such as image recognition.
  • OpenVision comprises 26 models, ranging from 5.9 million to 632.1 million parameters, released under the Apache 2.0 license, which permits commercial use.
  • The UCSC team trained OpenVision with the CLIPS training pipeline on the Recap-DataComp-1B dataset.
  • The models cover a range of use cases: larger models suit high-accuracy tasks, while smaller ones are optimized for edge deployments.
  • OpenVision demonstrates strong performance in vision-language tasks and outperforms CLIP and SigLIP in benchmark evaluations.
  • Progressive resolution training, which starts at low image resolution and raises it in later stages, speeds up training without hurting performance on high-resolution tasks such as OCR.
  • Training with synthetic captions and a text decoder helps the vision encoder learn semantically richer representations.
  • OpenVision encoders can be paired with small language models to build compact multimodal models with a limited parameter count.
  • The open and modular approach of OpenVision benefits AI engineering, data infrastructure, and security teams by offering a plug-and-play solution for vision capabilities.
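
The progressive resolution idea mentioned above can be illustrated with a small scheduling sketch. This is not OpenVision's actual training code: the `Stage` class, the function names, and the resolutions and step counts in `SCHEDULE` are all hypothetical values chosen for illustration, and the cost estimate assumes compute scales with pixel count, which is only a rough proxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    resolution: int  # square input size in pixels (hypothetical value)
    steps: int       # number of training steps spent at this resolution


def resolution_at(step: int, stages: list[Stage]) -> int:
    """Return the input resolution to use at a given global training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for stage in stages:
        boundary += stage.steps
        if step < boundary:
            return stage.resolution
    # Past the end of the schedule: stay at the final (highest) resolution.
    return stages[-1].resolution


def relative_cost(stages: list[Stage]) -> float:
    """Rough pixel cost relative to running every step at the final resolution."""
    final = stages[-1].resolution
    total_steps = sum(s.steps for s in stages)
    spent = sum(s.steps * s.resolution ** 2 for s in stages)
    return spent / (total_steps * final ** 2)


# Hypothetical schedule: most steps at low resolution, short high-resolution finish.
SCHEDULE = [Stage(84, 8000), Stage(224, 1500), Stage(336, 500)]
```

With this made-up schedule, `relative_cost(SCHEDULE)` comes to about one sixth of the pixel budget of training every step at the final resolution, which is the kind of saving the technique is after.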

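To make "converting visual content into numerical data" concrete, here is a toy sketch of one common way an encoder begins: cutting an image into patches and projecting each patch to a vector. It assumes a ViT-style patch embedding, which the summary above does not specify, and the random weights stand in for parameters a real encoder would learn. All function names are hypothetical.

```python
import random


def patchify(image: list[list[float]], patch: int) -> list[list[float]]:
    """Split a square grayscale image into flattened, non-overlapping patches."""
    size = len(image)
    if size % patch != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square and divisible by the patch size")
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def embed(patches: list[list[float]], weights: list[list[float]]) -> list[list[float]]:
    """Project each flattened patch to an embedding with a linear map."""
    return [
        [sum(p * w for p, w in zip(vec, column)) for column in weights]
        for vec in patches
    ]


def random_weights(patch_dim: int, embed_dim: int, seed: int = 0) -> list[list[float]]:
    """Stand-in for learned weights: one weight vector per output dimension."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(patch_dim)] for _ in range(embed_dim)]


# An 8x8 "image" with 4x4 patches gives 4 patch tokens of 16 values each.
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
tokens = embed(patchify(image, 4), random_weights(patch_dim=16, embed_dim=6))
print(len(tokens), len(tokens[0]))  # 4 6
```

A language model never sees pixels in this setup, only the resulting list of vectors, which is why a smaller or better-trained encoder can be swapped in without changing the model that consumes its output.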