<ul><li>Researchers at IBM and Hugging Face have released SmolDocling, a 256M-parameter open-source vision language model (VLM) for document OCR.</li><li>SmolDocling handles end-to-end multimodal document conversion by processing each entire page with a single model.</li><li>It uses a universal markup format called DocTags to capture page elements and their structure, and it performs well in benchmark tests.</li><li>SmolDocling handles diverse document elements and provides comprehensive structured metadata that makes its output easier to use.</li></ul>

IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR
