ElevenLabs has launched Scribe v1, a speech-to-text model that it describes as the most accurate to date, with a reported 96.7% accuracy for English.
Scribe outperforms Google's Gemini 2.0 Flash, OpenAI's Whisper v3, and Deepgram Nova-3 in transcription accuracy.
The model delivers state-of-the-art transcription accuracy in 99 languages and can distinguish and isolate up to 32 different speakers in the same audio file.
Scribe is available now through the ElevenLabs website and API, with pricing set at $0.40 per hour of input audio.
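To put the pricing in concrete terms, the short Python sketch below estimates the cost of transcribing a recording at the quoted rate of $0.40 per hour. It assumes billing scales linearly with audio duration, with no minimum charge, rounding, or volume discount; the announcement does not confirm any of these details, and the function name `estimate_cost_usd` is purely illustrative.

```python
# Rate quoted in the announcement: $0.40 per hour of input audio.
PRICE_PER_HOUR_USD = 0.40


def estimate_cost_usd(duration_seconds: float,
                      price_per_hour: float = PRICE_PER_HOUR_USD) -> float:
    """Estimate transcription cost for a recording of the given length.

    Assumption (not stated in the announcement): cost is strictly
    proportional to duration, with no minimum fee or rounding.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    hours = duration_seconds / 3600
    return hours * price_per_hour


if __name__ == "__main__":
    # A 90-minute recording is 1.5 hours of audio.
    print(f"${estimate_cost_usd(90 * 60):.2f}")
```

Under these assumptions, a 90-minute recording would cost about $0.60, and 100 hours of audio would cost about $40.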