Are AI agents the next big thing? The co-founders of Sarvam AI definitely think so. One of the startup’s theses is that consumers will use generative AI models not just as chatbots but to perform tasks and achieve goals, and that they will do so through a voice interface rather than text.
At an event held in Bengaluru on August 13th, Sarvam AI announced Sarvam Agents. While the startup, which is backed by Lightspeed, Peak XV, and Khosla Ventures, is not the only company building AI agents, what stood out was the pricing.
The cost of these agents starts at just one rupee per minute. According to co-founder Vivek Raghavan, enterprises can integrate these agents into their workflow without much hassle.
“These are going to be voice-based, multilingual agents designed to solve specific business problems. They will be available in three channels – telephony, WhatsApp, or inside an app,” Raghavan told AIM in an interaction prior to the event.
These agents could be integrated into contact centres and various applications across multiple industries, including insurance, food and grocery delivery, e-commerce, ride-hailing services, and even banking and payment apps.
For example, they could streamline customer service operations in insurance by handling policy inquiries, make reservations, assist with financial transactions, facilitate order tracking and customer support in food delivery, and manage ride requests and driver communications in ride-hailing apps.
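To put the one-rupee-per-minute starting price in perspective, here is a rough back-of-the-envelope sketch. The call volumes are illustrative assumptions, not figures from Sarvam AI, and the sketch assumes billing scales linearly with minutes at the starting rate, which may not match the actual pricing plans.

```python
from dataclasses import dataclass

# Starting price quoted at the launch; real plans and billing rules may differ.
RUPEES_PER_MINUTE = 1.0


@dataclass
class CallVolume:
    """Hypothetical workload for a contact centre (illustrative numbers only)."""
    calls_per_day: int
    avg_minutes_per_call: float


def monthly_cost_inr(volume: CallVolume, days: int = 30,
                     rate: float = RUPEES_PER_MINUTE) -> float:
    """Estimate monthly spend, assuming cost is simply minutes multiplied by rate."""
    total_minutes = volume.calls_per_day * volume.avg_minutes_per_call * days
    return total_minutes * rate


# Assumed example: 1,000 calls a day, averaging three minutes each.
example = CallVolume(calls_per_day=1000, avg_minutes_per_call=3.0)
print(monthly_cost_inr(example))  # 90000.0
```

Under these assumptions, a contact centre handling 1,000 three-minute calls a day would spend about ₹90,000 a month on agent minutes.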
Enabling AI Agents
A technology that offers this capability at just a rupee per minute could be transformative. AI adoption could see substantial growth with AI agents, and Sarvam AI’s mission is to make this a reality.
Meta, which owns WhatsApp and other major social media platforms like Facebook and Instagram, has introduced Meta AI on all these platforms.
Meta AI can be summoned in group chats for planning and suggestions. It can recommend restaurants, assist with trip planning, and provide general information.
However, Sarvam AI claims their generative AI stack could help AI scale in India better than competing offerings. Their models perform better in Indic languages than the Llama models, which power Meta AI. During the event, the startup demoed their models, which outperformed certain models in Indic language tasks.
The startup is currently making its agents available in Hindi, Tamil, Telugu, Malayalam, Punjabi, Odia, Gujarati, Marathi, Kannada, and Bengali, and plans to add more languages soon.
Interestingly, given the backgrounds of the co-founders, especially Raghavan, who has helped Aadhaar scale significantly in India, the startup is well-positioned to drive widespread AI adoption and impact.
Raghavan served as the chief product officer at the Unique Identification Authority of India (UIDAI) for over nine years. As of September 29, 2023, over 138.08 crore Aadhaar numbers had been issued to residents of India.
As part of the interaction, Raghavan highlighted his experience in scaling technology to benefit humanity. He also mentioned that the startup is already in talks with several companies interested in utilising Sarvam agents. At the event, the startup revealed that their agent is already being integrated into the Sri Mandir app.
(Vivek Raghavan & Pratyush Kumar, co-founders at Sarvam AI)
Models Powering Sarvam Agents
Raghavan said multiple models form the backbone of these AI agents. The first is a speech-to-text model called Saaras, which translates spoken Indian languages into English with high accuracy, surpassing traditional ASR systems.
The second model, called Bulbul, is a text-to-speech model that offers diverse voices in multiple languages, with the option of a consistent voice or varied voices depending on preference.
The third is a parsing model designed for high-quality document extraction. This model addresses common issues with complex data, aiming to improve accuracy in parsing financial statements and other intricate documents.
Notably, these models are closed-source and available to customers as APIs. However, the startup also launched an open-source, two-billion-parameter foundational model trained completely from scratch on four trillion tokens.
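As a rough illustration of how a speech-to-text model and a text-to-speech model could sit on either side of an agent's decision logic, here is a minimal sketch of one conversational turn. It does not call any Sarvam AI service: the `VoiceAgent` class and the `fake_*` functions are hypothetical stand-ins, and the actual architecture of Sarvam Agents was not detailed at the event.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """One turn of a voice agent: audio in, audio out (hypothetical design)."""
    speech_to_text: Callable[[bytes], str]   # the role a model like Saaras would play
    decide_reply: Callable[[str], str]       # business logic or a language model
    text_to_speech: Callable[[str], bytes]   # the role a model like Bulbul would play

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.speech_to_text(audio_in)
        reply = self.decide_reply(text)
        return self.text_to_speech(reply)


# Stand-in components so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already a transcript


def fake_reply(text: str) -> str:
    if "appointment" in text.lower():
        return "Sure, which day works for you?"
    return "Could you tell me more?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is synthesised audio


agent = VoiceAgent(fake_stt, fake_reply, fake_tts)
audio_out = agent.handle_turn(b"I need a doctor appointment")
print(audio_out.decode("utf-8"))  # Sure, which day works for you?
```

Keeping the three stages as swappable callables means any one of them could be replaced, for example with a real API client, without touching the others.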
Less Dramatic but Good Demo
At the event, the startup also demoed what their agents could do. The demo, which was pre-recorded, showcased how a Sarvam agent could comprehend a person’s health condition, assist in finding the right doctor, and even book an appointment.
A pre-recorded demo may not appeal to everyone, but from the startup’s perspective, it’s a safe bet and completely understandable. Live demos carry inherent risks; for instance, at the Made by Google event, a Googler’s live attempt to showcase Google Gemini’s capabilities failed twice before succeeding.
Sarvam AI’s demo was also reminiscent of OpenAI’s showcase of their latest model, GPT-4o, earlier this year. While Sarvam AI’s demo was less dramatic and not at all controversial, it effectively demonstrated that their agents could understand context as well as various Indian languages and dialects.
“These agents can also be very contextual. For example, when you’re on a particular page, you press a button seeking more information about a particular item. The agent will be context-aware, so it knows where you’re asking from. In contrast, when you call a number, it starts from scratch without that context,” Raghavan said.
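The difference Raghavan describes between an in-app session and a phone call can be sketched as a small data structure. This is only an illustration under assumed names: `SessionContext`, its fields, and `opening_prompt` are hypothetical and not part of any published Sarvam AI interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContext:
    """What the agent knows when a conversation starts (hypothetical fields)."""
    channel: str                 # "app", "whatsapp", or "telephony"
    page: Optional[str] = None   # screen the user was on, if known
    item: Optional[str] = None   # item the user asked about, if known


def opening_prompt(ctx: SessionContext) -> str:
    """Pick an opening line based on how much context is available."""
    if ctx.channel == "app" and ctx.item:
        # In-app: the agent knows which item the button press referred to.
        return f"You are looking at {ctx.item}. What would you like to know about it?"
    # Phone call or missing context: start from scratch.
    return "Hello! How can I help you today?"


in_app = SessionContext(channel="app", page="product", item="a health insurance plan")
phone = SessionContext(channel="telephony")

print(opening_prompt(in_app))
print(opening_prompt(phone))
```

The in-app session opens with a question about the specific item, while the telephony session falls back to a generic greeting because it carries no page or item information.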
The startup revealed it trained its models using NVIDIA DGX, leveraging Yotta’s infrastructure. Other notable collaborators include Exotel, Bhashini, AI4Bharat, EkStep Foundation and People+ai.