Mistral has released Voxtral, an open-source voice model that offers advanced features such as summarization and function calling triggered directly by speech.
Voxtral comes in two sizes: a 24B-parameter version for production-scale applications and a 3B-parameter variant for local and edge use cases.
With Voxtral, Mistral aims to close the gap between proprietary speech recognition models and open-source alternatives by combining accurate transcription, semantic understanding, multilingual fluency, and flexible deployment at half the price of comparable APIs.
According to Mistral, Voxtral outperforms existing voice models, with a lower word error rate in transcription and competitive performance on audio understanding tasks. It will be available through Mistral's API at $0.001 per minute.
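
To put the quoted price in perspective, the following is a minimal sketch of a cost estimate at $0.001 per minute. The function name `estimate_transcription_cost` is hypothetical, and the sketch assumes simple linear per-minute billing with no rounding, minimums, or volume tiers, none of which are specified in the announcement.

```python
from decimal import Decimal

# Price quoted in the announcement. Real billing rules (rounding,
# minimum charges, tiers) are not stated there and are ignored here.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_transcription_cost(audio_seconds: float) -> Decimal:
    """Estimate the cost in USD of transcribing audio of the given duration.

    Hypothetical helper: assumes cost scales linearly with duration.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Convert through str() so float inputs do not carry binary noise.
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return minutes * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    hours = 1000
    cost = estimate_transcription_cost(hours * 3600)
    print(f"{hours} hours of audio -> ${cost:.2f}")
```

Under these assumptions, 1,000 hours of audio (60,000 minutes) would cost about $60.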