Nvidia has launched Parakeet-TDT-0.6B-v2, an open-source automatic speech recognition (ASR) model that can transcribe 60 minutes of audio in 1 second.
Parakeet-TDT-0.6B-v2 has 600 million parameters and achieves a word error rate (WER) of 6.05%, making it competitive with top proprietary transcription models.
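For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not the evaluation code behind the 6.05% figure. The function name `word_error_rate` is illustrative, and the sketch assumes simple whitespace tokenization, whereas published benchmarks usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a transcript that drops one word from a six-word reference scores 1/6, or roughly 16.7%. A WER of 6.05% corresponds to about six word errors per 100 reference words.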
The model is freely available under the Creative Commons CC-BY-4.0 license and is suited to transcription services, voice assistants, and conversational AI platforms.
Trained on the Granary dataset, the model generalizes well, supports punctuation and capitalization, and can be deployed using Nvidia’s NeMo toolkit.
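As a rough illustration of NeMo deployment, the sketch below loads the model and transcribes a list of audio files. It rests on several assumptions not stated above: that NeMo is installed with its ASR extras, that the checkpoint is published under the identifier `nvidia/parakeet-tdt-0.6b-v2`, and that the audio files are in a format the model accepts. The helper name `transcribe_files` is hypothetical.

```python
def transcribe_files(audio_paths, model_name="nvidia/parakeet-tdt-0.6b-v2"):
    """Return one transcript string per audio file.

    model_name is an assumed identifier for the published checkpoint;
    check the model card before relying on it.
    """
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then loads it.
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    outputs = asr_model.transcribe(audio_paths)

    # Depending on the NeMo version, transcribe() may return plain strings
    # or hypothesis objects with a .text attribute; handle both.
    return [getattr(item, "text", item) for item in outputs]


# Example call (requires NeMo and a local audio file):
# print(transcribe_files(["meeting.wav"]))
```

The import sits inside the function so that the heavy dependency is only needed when transcription is requested. In a long-running service the model would more sensibly be loaded once and reused across calls.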