<ul><li>Kyutai has developed Hibiki, a 2.7-billion-parameter decoder-only model for real-time speech-to-speech and speech-to-text translation.</li><li>Hibiki operates at a 12.5 Hz framerate with a 2.2 kbps bitrate and translates French speech into English while preserving the speaker's voice characteristics.</li><li>The model combines contextual alignment with a neural audio codec to generate translations efficiently and to adjust its translation delay dynamically.</li><li>Hibiki delivers strong translation quality and speaker fidelity while keeping latency competitive, which makes it practical for real-time speech translation.</li></ul>
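The framerate and bitrate quoted in the summary imply a few concrete numbers for streaming: how much audio each generated frame covers and how many bits it carries. The sketch below works through that arithmetic. It is only a back-of-the-envelope illustration. The 11-bits-per-token figure (codebooks of 2048 entries) is an assumption made here, not a detail stated in this article, and `frames_for` is a hypothetical helper.

```python
import math

# Figures quoted in the summary above.
FRAME_RATE_HZ = 12.5   # audio frames per second
BITRATE_BPS = 2200     # 2.2 kbps

# ASSUMPTION (not from the article): each codebook has 2048 entries,
# so every audio token carries log2(2048) = 11 bits.
BITS_PER_TOKEN = 11

frame_period_ms = 1000 / FRAME_RATE_HZ              # audio duration covered by one frame
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ         # bits emitted per frame
tokens_per_frame = bits_per_frame / BITS_PER_TOKEN   # implied tokens (codebooks) per frame


def frames_for(seconds: float) -> int:
    """Hypothetical helper: number of frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * FRAME_RATE_HZ)


if __name__ == "__main__":
    print(f"frame period: {frame_period_ms:.0f} ms")
    print(f"bits per frame: {bits_per_frame:.0f}")
    print(f"implied tokens per frame: {tokens_per_frame:.0f}")
    print(f"frames in 10 s of audio: {frames_for(10)}")
```

Under these assumptions, each frame spans 80 ms of audio and carries 176 bits. That works out to 16 tokens of 11 bits each, so a 10-second utterance corresponds to 125 frames.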

Kyutai Releases Hibiki: A 2.7B-Parameter Model for Real-Time Speech-to-Speech and Speech-to-Text Translation with Near-Human Quality and Voice Transfer
