ElevenLabs has launched its new flagship text-to-speech model, Eleven v3, which features inline audio controls, dialogue generation, multi-speaker support, and coverage of over 70 languages.
The model introduces audio tags for emotional control and supports dynamic speech turns with natural interruptions and shifts, a significant advance over previous versions.
The release emphasizes v3's improved audio quality and expressive nuance, with better cadence, stress, and emotion across languages.
v3 is not recommended for real-time or conversational use because of its latency. A promotional 80% discount on UI-based usage is offered until June, and public API access is expected in the future.
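
To make the idea of inline audio tags in a multi-speaker script concrete, here is a minimal sketch in Python. It only builds the text of a dialogue and does not call any ElevenLabs service, since public API access is not yet available. The square-bracket tag syntax, the tag names, and the helpers `Turn`, `render_turn`, and `render_dialogue` are all assumptions made for illustration, not part of the announcement.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One speaker turn in a dialogue script (hypothetical structure)."""
    speaker: str
    text: str
    tags: list[str] = field(default_factory=list)  # e.g. ["whispers"]


def render_turn(turn: Turn) -> str:
    # Assumption: each audio tag is written inline in square brackets,
    # placed before the text it should affect.
    prefix = "".join(f"[{tag}] " for tag in turn.tags)
    return f"{turn.speaker}: {prefix}{turn.text}"


def render_dialogue(turns: list[Turn]) -> str:
    # One line per turn, in speaking order.
    return "\n".join(render_turn(turn) for turn in turns)


if __name__ == "__main__":
    script = [
        Turn("Speaker 1", "Did you hear that?", ["whispers"]),
        Turn("Speaker 2", "Relax, it was only the wind.", ["laughs"]),
        Turn("Speaker 1", "I am not so sure."),
    ]
    print(render_dialogue(script))
```

Keeping the tags as data on each turn, rather than typing them into the text by hand, makes it easy to change the tag syntax in one place once the real format is confirmed.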