<ul><li>NVIDIA announced Llama Nemotron Nano VL, a new vision-language model that leads the OCRBench v2 benchmark for accurate document analysis.</li><li>The model reads and extracts data from complex layouts such as invoices, tables, graphs, and dashboards. It combines visual and textual reasoning and runs on a single GPU.</li><li>On OCRBench v2, which contains 10,000 question-answer pairs across 31 scenario types, Nemotron Nano VL showed superior performance in text recognition, chart parsing, and element spotting.</li><li>Built on the C-RADIO v2 vision encoder and trained with Megatron and Energon infrastructure, the model is production-ready for scalable AI in document workflows.</li></ul>

NVIDIA’s New Vision Language Model Takes Lead in OCR Benchmarks