OpenAI has launched new speech-to-text and text-to-speech models in its API, providing developers with tools to build advanced voice agents.
The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, achieve lower word error rates and stronger language recognition than the earlier Whisper models.
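A minimal sketch of calling the new transcription models through the official openai Python SDK (pip install openai). The file name, helper function, and cost-tier choice are illustrative assumptions, not details from the announcement; the real call is guarded behind an API-key check so the sketch runs without credentials.

```python
import os

def pick_transcribe_model(low_cost: bool) -> str:
    """Illustrative helper: choose the mini model when cost matters more than accuracy."""
    return "gpt-4o-mini-transcribe" if low_cost else "gpt-4o-transcribe"

model = pick_transcribe_model(low_cost=False)

# The actual API call needs a key and an audio file; guarded so the sketch runs dry.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    with open("meeting.mp3", "rb") as audio:  # "meeting.mp3" is a placeholder file
        transcript = client.audio.transcriptions.create(model=model, file=audio)
    print(transcript.text)
```

The two model names mirror the usual OpenAI pattern of a full-size and a cheaper mini variant of the same capability.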
The new gpt-4o-mini-tts text-to-speech model lets developers control not only what it says but how it says it, expanding use cases in customer interactions and creative storytelling.
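The "how it speaks" steering can be sketched as follows, again assuming the openai Python SDK. The voice name, instruction text, and output file are assumptions for illustration; the network call is guarded behind an API-key check so the sketch runs without credentials.

```python
import os

def build_tts_request(text: str, style: str) -> dict:
    """Assemble parameters for a steerable TTS call with gpt-4o-mini-tts."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # assumed voice name for illustration
        "input": text,           # what to say
        "instructions": style,   # how to say it
    }

params = build_tts_request(
    "Your refund has been processed.",
    "Speak in a warm, apologetic customer-service tone.",
)

# The real call needs an API key; guarded so the sketch runs dry.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**params) as response:
        response.stream_to_file("refund.mp3")  # placeholder output path
```

Separating the spoken text (`input`) from the delivery instructions (`instructions`) is what distinguishes this model from earlier TTS endpoints, where only the voice could be chosen.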
OpenAI plans to enhance the intelligence and accuracy of its audio models, explore custom voice options, and expand into video for multimodal agentic experiences.