OpenAI has introduced three new audio models, gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts, for speech-to-text and text-to-speech applications.
The text-to-speech model, gpt-4o-mini-tts, can be steered with plain-text instructions that adjust accent, pitch, tone, and emotional delivery, as in the sketch below.
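As a rough illustration (not OpenAI's official demo), a minimal sketch with the openai Python SDK might pass delivery guidance through the instructions parameter; the voice name, instruction text, and output file are placeholder choices, so verify details against the current API docs:

```python
# Minimal text-to-speech sketch with the openai Python SDK (placeholder values).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the built-in voices
    input="Thank you for calling. How can I help you today?",
    # Free-form text steering the delivery: accent, pitch, tone, emotion.
    instructions="Speak in a warm, upbeat tone, slightly slower than normal.",
) as response:
    response.stream_to_file("greeting.mp3")
```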
The transcription models post lower word error rates than OpenAI's earlier Whisper models and hold up better in noisy environments, across more than 100 languages.
OpenAI's gpt-4o-transcribe family expects single-channel audio, whether it carries one speaker or several, and does not perform speaker diarization.
Developers can integrate the new voice models into apps with as few as nine lines of code, enabling fluid voice interactions; the sketch below shows how compact a call can be.
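The nine-lines figure refers to OpenAI's own demo, but a basic transcription call with the openai Python SDK does fit in roughly that space; the file name here is hypothetical, and the audio should be single-channel per the note above:

```python
# Hedged transcription sketch with the openai Python SDK (file name is a placeholder).
from openai import OpenAI

client = OpenAI()

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or gpt-4o-mini-transcribe for lower cost
        file=audio_file,
    )

print(transcript.text)
```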
Pricing varies by model; gpt-4o-mini-tts is billed for audio output tokens in addition to text input tokens.
Competition in AI transcription and speech is intense, with rivals such as ElevenLabs and Hume AI offering comparable models that differ in features and pricing.
Companies like EliseAI and Decagon have reported improved voice AI performance after integrating OpenAI's models into their platforms.
Reactions to OpenAI's new models have been mixed, with some observers concerned about a shift away from real-time voice capabilities and noting that the models leaked before the official announcement.
OpenAI plans to continue refining its audio models, explore custom voice capabilities, and invest in multimodal AI for dynamic and interactive agent-based experiences.