NVIDIA announced Llama Nemotron Nano VL, a new vision-language model that leads the OCRBench v2 benchmark for accurate document analysis.
The model can read and extract data from complex layouts such as invoices, tables, graphs, and dashboards by combining visual and textual reasoning, and it runs on a single GPU.
On OCRBench v2, which comprises 10,000 question-answer pairs across 31 scenario types, Nemotron Nano VL showed superior performance in text recognition, chart parsing, and element spotting.
Built on the C-RADIO v2 vision encoder and trained with NVIDIA's Megatron and Energon infrastructure, the model is production-ready for scalable AI in document-processing workflows.