Source: arXiv

A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI

  • Biomedica, a large-scale vision-language dataset derived from open scientific literature, has been introduced to advance biomedical generalist AI.
  • The dataset contains over 6 million scientific articles, 24 million image-text pairs, and 27 metadata fields, including expert human annotations.
  • The dataset is accessible through scalable streaming and search APIs, which simplify integrating it with AI systems.
  • The authors demonstrate Biomedica's utility by building embedding models, chat-style models, and retrieval-augmented chat agents on it; these models outperform previous open systems.
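The retrieval step behind a retrieval-augmented chat agent can be sketched, in general terms, as nearest-neighbor search over embeddings followed by prompt assembly. This is a minimal illustration under stated assumptions, not the Biomedica authors' implementation: the captions and three-dimensional vectors in `CORPUS` are hypothetical toy values, and the helper names `cosine`, `retrieve`, and `build_prompt` are made up for this example. A real system would obtain embeddings from a trained image or text encoder and use an approximate-nearest-neighbor index rather than a linear scan.

```python
import math
from typing import List, Tuple

# Hypothetical toy corpus: caption -> embedding. These values are invented for
# illustration; in practice they would come from a trained encoder.
CORPUS = {
    "chest X-ray showing pneumonia": [0.9, 0.1, 0.0],
    "histology slide of liver tissue": [0.1, 0.9, 0.1],
    "MRI scan of the brain": [0.0, 0.2, 0.95],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k captions whose embeddings are most similar to the query."""
    scored = [(caption, cosine(query, emb)) for caption, emb in CORPUS.items()]
    # Linear scan, highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, query_emb: List[float], k: int = 2) -> str:
    """Prepend the retrieved captions as context for a chat model."""
    context = "\n".join(f"- {caption}" for caption, _ in retrieve(query_emb, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(retrieve([0.85, 0.15, 0.05], k=1))
    print(build_prompt("What does the image show?", [0.85, 0.15, 0.05], k=1))
```

With the toy query `[0.85, 0.15, 0.05]`, the closest caption is the chest X-ray entry; the assembled prompt would then be passed to a chat-style model. Cosine similarity is used here because it is a common choice for comparing embeddings, not because the source specifies it.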
