<ul><li>Cosyvoice, maintained by Jichengdu on Replicate, is a scalable multilingual text-to-speech system with advanced voice cloning capabilities.</li><li>Built on a large language model architecture, it supports streaming synthesis (including bidirectional streaming) and cross-lingual generation.</li><li>It emphasizes low latency and high-quality output; related models include OpenVoice and Parler TTS.</li><li>Cosyvoice takes text and reference audio as inputs and generates natural-sounding speech in multiple languages and styles, delivered as WAV audio at a 16 kHz sample rate.</li></ul>
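
The output format described above (WAV audio at 16 kHz) can be checked locally once a file has been downloaded. The following is a minimal sketch using only Python's standard `wave` module. It assumes the output is uncompressed PCM, and the file name, the helper names, and the silent stand-in clip are all hypothetical; nothing here calls the model itself.

```python
import os
import tempfile
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the summary above


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def write_silence(path, seconds=1.0, rate=EXPECTED_RATE_HZ):
    """Write mono 16-bit silence; a stand-in for a real model output file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono output
        wav.setsampwidth(2)   # assumption: 2 bytes per sample (16-bit PCM)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


def demo():
    # Hypothetical file name; replace with the path of a downloaded output.
    path = os.path.join(tempfile.mkdtemp(), "cosyvoice_output.wav")
    write_silence(path)
    return describe_wav(path)


if __name__ == "__main__":
    rate, channels, duration = demo()
    print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
    if rate != EXPECTED_RATE_HZ:
        print("Sample rate differs from the documented 16 kHz")
```

To inspect a real result, pass the path of the downloaded file to `describe_wav` instead of calling `demo`. A mismatch in the reported rate would suggest the file was resampled somewhere between generation and download.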

A beginner's guide to the Cosyvoice model by Jichengdu on Replicate
