Bengaluru-based AI startup Sarvam AI recently announced the launch of India’s first open-source foundational model, built completely from scratch.
The startup, which raised $41 million last year from the likes of Lightspeed, Peak XV Partners and Khosla Ventures, believes in the concept of sovereign AI: building AI models tailored to the specific needs and use cases of a country.
The model, called Sarvam 2B, is trained on 4 trillion tokens of data. It can take instructions in 10 Indic languages: Hindi, Tamil, Telugu, Malayalam, Punjabi, Odia, Gujarati, Marathi, Kannada, and Bengali.
According to co-founder Vivek Raghavan, Sarvam 2B belongs to a class of small language models (SLMs) that includes Microsoft’s Phi series, Llama 3 8B, and Google’s Gemma models.
“This is the first open-source foundational model trained on an internal dataset of 4 trillion tokens by an Indian company, with compute in India, with efficient representation for 10 Indian languages,” Raghavan told AIM in an interaction prior to the announcement.
The model, which will be available on Hugging Face, is well suited for Indic language tasks such as translation, summarisation and understanding colloquial statements. The startup is open-sourcing the model to facilitate further research and development and to support the creation of applications built on it.
Previously, Tech Mahindra introduced its Project Indus foundational model, while Krutrim also developed its own foundational model from scratch. However, neither of these models is open-source.
India’s First Open-Source AudioLM
The startup, which Raghavan co-founded with Pratyush Kumar, also believes that consumers in India will use generative AI through voice rather than text. At an event held at ITC Gardenia, Bengaluru, on August 13, the startup announced Shuka 1.0, India’s first open-source audio language model.
The model is an audio extension of the Llama 8B model that takes Indian-language speech as input and produces text as output. According to the startup, it is more accurate than frontier models.
“The audio serves as the input to the LLM, with audio tokens being the key component here. This approach is notably unique. It’s somewhat similar to what GPT-4o introduced by OpenAI a couple of months ago,” Raghavan said.
According to the startup, the model is six times faster than Whisper + Llama 3, and its accuracy across the 10 languages is also higher.
The startup had previously hinted extensively at developing a voice-enabled generative AI model. Startups and businesses aiming to incorporate voice experiences into their services can leverage this tool, particularly for Indian languages.
Raghavan also said that the startup aims to make the model sound more human-like in the coming months.
Sarvam Agents are Here
Another development announced by the startup is Sarvam Agents. Raghavan believes that AI’s real use case lies not in chatbots but in AI acting on a user’s behalf.
“Sarvam Agents are going to be voice-based, multilingual agents designed to solve specific business problems. They will be available in three channels: they can be available via telephony, it can be available via WhatsApp, and it can be available inside an app,” Raghavan said.
These agents are also available in 10 Indian languages, with pricing starting at INR 1 per minute. They can be deployed by contact centres or by the sales teams of enterprises, among others.
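To put the starting price in perspective, here is a rough back-of-the-envelope sketch in Python. The function name, the call volumes and the assumption that billing is a flat per-minute rate on total talk time are all illustrative and not taken from Sarvam AI's pricing terms; since INR 1 per minute is only the starting price, the result should be read as a lower bound.

```python
def min_monthly_cost_inr(calls_per_day, avg_minutes_per_call,
                         days=30, rate_inr_per_min=1.0):
    """Lower-bound monthly spend for a voice agent.

    Assumes (hypothetically) a flat per-minute rate applied to total
    talk time, with no rounding, setup fees or volume tiers.
    """
    total_minutes = calls_per_day * avg_minutes_per_call * days
    return total_minutes * rate_inr_per_min


# Hypothetical contact centre: 1,000 calls a day, 3 minutes each.
print(min_monthly_cost_inr(1000, 3))  # 90000.0
```

Under these assumed volumes, the floor works out to INR 90,000 a month; actual costs would depend on the rate that applies to a given deployment.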
While these agents may sound like existing conversational AI products available in the market, Raghavan said their architecture, which uses multiple in-house developed LLMs, makes them fundamentally different.
“These agents can also be very contextual. For example, when you’re on a particular page, you press a button seeking more information about a particular item. The agent will be context-aware, so it knows where you’re asking from. In contrast, when you call a number, it starts from scratch without that context,” he said.
Sarvam Models APIs
While both Sarvam 2B and Shuka 1.0 are open-source models, Sarvam AI is also making a set of closed-source Indic models, used in building Sarvam Agents, available as APIs.
“These include five sets of models. I will tell you about the three important ones. Our first model, a speech-to-text model, translates spoken Indian languages into English with high accuracy, surpassing traditional ASR systems. The second model is a text-to-speech model which converts text into speech, offering diverse voices in multiple languages, with consistent or varied options depending on preference,” Raghavan said.
The third model is a parsing model designed for high-quality document extraction. This model addresses common issues with complex data, aiming to improve accuracy in parsing financial statements and other intricate documents.
Other announcements made by the startup include a generative AI workbench designed for law practitioners to enhance their capabilities with features such as regulatory chat, document drafting, redaction and data extraction.