VentureBeat


OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

  • OpenAI introduces new voice models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts for text and voice applications.
  • The text-to-speech model, gpt-4o-mini-tts, lets developers steer the voice through text prompts, including its accent, pitch, tone, and emotion.
  • The transcription models report lower word error rates than OpenAI's earlier Whisper models and improved performance in noisy environments, across 100+ languages.
  • The gpt-4o-transcribe family accepts one or more voices as a single input channel and does not perform diarization, so it does not label which speaker said what.
  • Developers can add the new voice models to existing apps with only nine lines of code, enabling fluid voice interactions.
  • Pricing varies by model, and gpt-4o-mini-tts is additionally billed for audio output tokens.
  • Competition in the AI transcription and speech space is intense, with other firms like ElevenLabs and Hume AI offering similar models with different features and pricing.
  • Companies like EliseAI and Decagon have reported improved voice AI performance after integrating OpenAI's models into their platforms.
  • Reactions to OpenAI's new models have been mixed, with concerns raised about a shift away from real-time voice capabilities and about an early leak ahead of the official announcement.
  • OpenAI plans to continue refining its audio models, explore custom voice capabilities, and invest in multimodal AI for dynamic and interactive agent-based experiences.
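
The summary cites word error rate (WER) without defining it. As a rough illustration, WER is commonly computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below is a minimal version under stated assumptions: the function name `word_error_rate` and the normalization (lowercasing and splitting on whitespace) are choices made for this example, not the procedure OpenAI or VentureBeat used for the reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption for this sketch: text is normalized only by lowercasing
    and whitespace splitting. Real benchmarks usually also normalize
    punctuation and number formats.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (before the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word ("the") and one substituted word ("lights" -> "light")
    # against a five-word reference.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference; lower values mean a closer match.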
