Source: arXiv

Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

  • The paper focuses on enhancing data efficiency by curating web-crawl datasets through an advanced approach named EcoDatum.
  • EcoDatum addresses challenges related to unstructured and heterogeneous datasets, overcoming biases and the exclusion of relevant data often seen in traditional curation methods.
  • The method incorporates quality-guided deduplication for balanced feature distributions and integrates various data curation operators within a weak supervision ensemble framework.
  • Automated optimization is used to score each data point, which the authors report improves curation quality and efficiency compared to existing techniques.
  • EcoDatum outperforms state-of-the-art methods and ranked 1st on the DataComp leaderboard, achieving an average performance score of 0.182 across 38 evaluation datasets.
  • That score represents a 28% improvement over the DataComp baseline method, indicating that the approach improves both dataset curation and model training efficiency.
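
The two ideas in the summary above, combining several curation operators into one score per data point and using that score to decide which near-duplicates to keep, can be illustrated with a minimal sketch. This is not EcoDatum's implementation: the `Sample` class, the operator names, the pre-assigned `cluster` ids, and the fixed weights are all hypothetical, and a plain weighted average stands in for the paper's weak supervision ensemble and automated optimization.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    uid: str
    cluster: int   # hypothetical near-duplicate cluster id, assumed precomputed
    scores: dict   # hypothetical operator name -> quality score in [0, 1]


def ensemble_score(sample, weights):
    """Combine per-operator scores into one score via a weighted average.

    A simple stand-in for a learned ensemble; the weights are assumed given.
    """
    total = sum(weights.values())
    return sum(w * sample.scores[name] for name, w in weights.items()) / total


def quality_guided_dedup(samples, weights, keep_per_cluster=1):
    """Within each near-duplicate cluster, keep only the top-scoring samples."""
    by_cluster = {}
    for s in samples:
        by_cluster.setdefault(s.cluster, []).append(s)

    kept = []
    for members in by_cluster.values():
        ranked = sorted(members, key=lambda s: ensemble_score(s, weights), reverse=True)
        kept.extend(ranked[:keep_per_cluster])
    return kept


if __name__ == "__main__":
    weights = {"alignment": 1.0, "text_quality": 3.0}  # illustrative values only
    pool = [
        Sample("a", 0, {"alignment": 0.5, "text_quality": 1.0}),
        Sample("b", 0, {"alignment": 1.0, "text_quality": 0.0}),
        Sample("c", 1, {"alignment": 0.0, "text_quality": 0.0}),
    ]
    for s in quality_guided_dedup(pool, weights):
        print(s.uid, ensemble_score(s, weights))
```

Note that deduplication alone keeps the best sample of every cluster, including low-scoring ones; a real pipeline would presumably also apply a score threshold or keep only a top fraction of the pool.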
