BLIP3-KALE is an open-source dataset of 218 million image-text pairs. It addresses the limitations of previous image-caption datasets by combining synthetic captions with real-world factual information. KALE features knowledge-augmented dense captions that pair rich descriptive detail with factual grounding, setting a new benchmark for density and factuality in image descriptions.