Zyphra has released a beta of Zonos-v0.1, a highly expressive text-to-speech (TTS) model with high-fidelity voice cloning.
The release includes a 1.6-billion-parameter transformer model and a similarly sized hybrid model. Both are trained on a large multilingual dataset and give users control over speaking rate, pitch variation, and emotion.
Zonos-v0.1 supports zero-shot TTS with voice cloning, audio prefix inputs, multiple languages, and control over audio quality and emotion. It is also designed to run efficiently, comes with a user-friendly interface, and is straightforward to deploy.
The models are optimized for real-time applications and can be integrated into existing workflows. Zonos-v0.1 marks a significant advance in open-source TTS.