AI-driven solutions in India require high-quality speech models for the country’s diverse linguistic communities; however, sufficient data for all Indian languages and dialects is unavailable.
Research group AI4Bharat recently launched Indic Parler-TTS, an open-source text-to-speech (TTS) model intended to serve more than one billion speakers of Indic languages and to improve accessibility.
IISc AI and Robotics Technology Park also decided to open-source 16,000 hours of spontaneous speech data from 80 districts as part of Project Vaani, which aims to curate datasets of 150,000 hours of natural speech and text from around one million people across 773 districts in India.
Sarvam AI, Ankush Sabharwal's CoRover.ai, and smallest.ai are among the start-ups heavily focused on building speech models for the Indian market. CoRover.ai is building voice models for WhatsApp translation and Q&A, while Sarvam AI has launched voice-based agents.
Indic Parler-TTS is trained on 1,806 hours of multilingual and English datasets and currently supports 20 of the 22 scheduled Indian languages, along with English in US, British, and Indian accents. It is released under a permissive license with unrestricted usage and includes 69 unique voices that can render emotions in 10 languages.
Other resources introduced by AI4Bharat to enhance Indian language technology include BhasaAnuvaad, the IndicConformer ASR model, Rasa, and IndicASR.
The Indian government launched a crowdsourcing initiative called Bhasha Daan in July for collecting voice and text data in multiple Indian languages. It also launched the 'Be our Sahayogi' programme on National Technology Day to crowdsource multilingual AI problem statements.
EkStep Foundation also open-sourced a wav2vec2 model trained on 10,000 hours of speech data in 23 Indic languages. The Vakyansh team at the foundation was one of the first in the country to build Automatic Speech Recognition (ASR) and TTS models.
AI4Bharat, IISc, and EkStep provide important speech-related datasets and models, particularly aimed at speech translation.
The models required for a voice bot were not mature for Indian languages, according to Sudarshan Kamath, CEO of smallest.ai.