OpenAI has launched new speech-to-text and text-to-speech models in its API, providing developers with tools to build advanced voice agents.
The speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, achieve lower word error rates and stronger language recognition than the earlier Whisper models.
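A minimal sketch of calling the new transcription models through the official openai Python SDK (pip install openai). The file name, helper function, and cost-tier choice are illustrative assumptions, not details from the announcement; the real call is guarded behind an API-key check so the sketch runs without credentials.

```python
import os

def pick_transcribe_model(low_cost: bool) -> str:
    """Illustrative helper: choose the mini model when cost matters more than accuracy."""
    return "gpt-4o-mini-transcribe" if low_cost else "gpt-4o-transcribe"

model = pick_transcribe_model(low_cost=False)

# The actual API call needs a key and an audio file; guarded so the sketch runs dry.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    with open("meeting.mp3", "rb") as audio:  # "meeting.mp3" is a placeholder file
        transcript = client.audio.transcriptions.create(model=model, file=audio)
    print(transcript.text)
```

The two model names mirror the usual OpenAI pattern of a full-size and a cheaper mini variant of the same capability.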
The new gpt-4o-mini-tts text-to-speech model lets developers control not only what it says but how it says it, expanding use cases in customer interactions and creative storytelling.
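The "how it speaks" steering can be sketched as follows, again assuming the openai Python SDK. The voice name, instruction text, and output file are assumptions for illustration; the network call is guarded behind an API-key check so the sketch runs without credentials.

```python
import os

def build_tts_request(text: str, style: str) -> dict:
    """Assemble parameters for a steerable TTS call with gpt-4o-mini-tts."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # assumed voice name for illustration
        "input": text,           # what to say
        "instructions": style,   # how to say it
    }

params = build_tts_request(
    "Your refund has been processed.",
    "Speak in a warm, apologetic customer-service tone.",
)

# The real call needs an API key; guarded so the sketch runs dry.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**params) as response:
        response.stream_to_file("refund.mp3")  # placeholder output path
```

Separating the spoken text (`input`) from the delivery instructions (`instructions`) is what distinguishes this model from earlier TTS endpoints, where only the voice could be chosen.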
OpenAI plans to enhance the intelligence and accuracy of its audio models, explore custom voice options, and expand into video for multimodal agentic experiences.