Soket AI Labs, part of the IndiaAI Mission, plans to build a 7-billion-parameter open-source Indic LLM within six months.
Led by CEO Abhishek Upperwal, the company aims to scale the project to 120 billion parameters over time.
The roadmap includes building smaller models first to test architecture and data alignment before the final build.
Soket AI plans to optimize its models for sectors such as defense, healthcare, and education.
The data strategy centers on Indian languages and sectors and includes creating new benchmarks.
Partnerships with IIT Gandhinagar support data generation, including through translation and augmentation.
Despite challenges in securing GPU support from the government, Soket plans to access cloud-based compute to scale the project.
Soket emphasizes building culturally authentic models for Indian languages, aiming to handle the dialect nuances and grammatical errors that existing global models do not address.
The project encourages collaboration and open-sourcing in the AI community, focusing on research acceleration rather than immediate commercial products.
Upperwal believes such initiatives can succeed in the long term despite initial criticism and challenges.