Rime has introduced the Arcana text-to-speech (TTS) model to create diverse and natural voices, leading to a 15% sales boost for customers like Domino’s and Wingstop.
The TTS model generates voices based on text descriptions, offering infinite variability across demographics and languages.
Rime's model enables users to specify desired voice characteristics such as age, gender, and language.
The company trains the model on natural conversations with real people rather than on recordings of voice actors.
Rime's Mist v2 TTS model is designed for high-volume, business-critical applications, enhancing customer interactions.
Rime offers eight flagship speakers, each with unique traits, that can switch between languages and add nuances like laughter and sarcasm.
The model infers emotions from context and generates audio tokens that are decoded into speech, producing varied, realistic output.
Arcana TTS was trained in three stages, incorporating conversational techniques, idiolect, and multilingual code-switching.
Rime collected naturalistic conversational data by recording real conversations and annotating voices with detailed metadata.
The company focuses on personalized voices for different applications, providing tools for A/B testing and analytics to optimize voice performance.
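
The A/B testing mentioned in the last point can be illustrated with a minimal sketch. It assumes two voices are compared by conversion rate (for example, completed orders per call) and uses a standard two-proportion z-test. The function name, the metric, and the sample counts are hypothetical and are not taken from Rime's tooling.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of two voices.

    conv_a, conv_b: number of converting calls for voice A and voice B
    n_a, n_b: total number of calls handled by each voice
    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both voices convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: voice A converts 120 of 1000 calls, voice B 150 of 1000.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two voices is unlikely to be chance alone. In practice the sample size and the metric would be fixed before the test starts.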