Hume AI has launched Octave, a new text-to-speech model that generates custom AI voices with adjustable emotions for many kinds of content.
Octave is powered by a large language model trained on text, speech, and emotion tokens to enhance the quality and authenticity of generated voices.
Users can adjust the emotion, tone, rhythm, and cadence of the AI voices at the sentence level using natural language prompts.
The model can interpret character traits, adjust vocal inflections accordingly, and deliver emotionally accurate speech that fits the context.
It supports English and Spanish, with more languages planned, and is aimed at content creators working on audiobooks, podcasts, video games, and voiceovers.
Octave allows granular adjustments, such as expressing nuanced emotions within a single sentence, giving finer control over voice modulation.
Hume's API provides access to the Octave model, allowing up to 50 requests per minute, with limits on text and description length.
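A client that sends many requests may want to stay under the 50-requests-per-minute limit on its own side rather than rely on the server rejecting calls. The following is a minimal sketch of a sliding-window limiter in Python. It makes no assumptions about Hume's SDK or endpoints: the class name `SlidingWindowLimiter` and its parameters are illustrative, and only the figure of 50 requests per minute comes from the text above.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window.

    Illustrative helper, not part of any Hume library.
    """

    def __init__(
        self,
        max_calls: int = 50,          # limit quoted in the summary
        period: float = 60.0,         # one minute, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # times of recent calls

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next call is allowed."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        # Window is full: wait until the oldest call expires.
        return self.period - (now - self._stamps[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._stamps.append(self._clock())
```

In use, a caller would invoke `limiter.acquire()` immediately before each API request. The clock and sleep functions are injectable so the behaviour can be checked without waiting in real time.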
Octave TTS uses subscription-based, competitively priced plans, with tiers ranging from free to Enterprise.
Octave was preferred over a competitor in a blind comparison study for its audio quality, naturalness, and voice matching accuracy.
The model is trained on a large volume of language data, enabling it to infer emotions and keep character voices consistent across long-form content.