Meet the Creators of India’s AWAAZ

AWAAZ stands out for its single-shot voice cloning capability, which can replicate a voice from a mere five-second audio clip.

Illustration by Mohit Pandey

Text-to-speech (TTS) models are comparatively easier to build for English than for other languages. To help fill this gap, IIT Guwahati alumni Sudarshan Kamath and Akshat Mandloi, who started smallest.ai, set out to build one for Hindi as well. They call it AWAAZ.

AWAAZ, which the company says achieves state-of-the-art Mean Opinion Scores (MOS) in Hindi and Indian English, can speak fluently in over ten accents, reflecting the diverse linguistic landscape of India.
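In general usage (for example, ITU-T Recommendation P.800), a Mean Opinion Score is the arithmetic mean of listener ratings on a five-point scale, from 1 (bad) to 5 (excellent). The article does not describe how smallest.ai ran its evaluation. The Python sketch below only shows how such a score is computed. The function name and the sample ratings are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings on the 1-5 absolute category rating scale.

    Hypothetical helper for illustration; it is not smallest.ai's evaluation code.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return float(mean(ratings))


# Made-up ratings from five listeners for a single synthesised clip.
sample_ratings = [4, 5, 4, 3, 5]
print(mean_opinion_score(sample_ratings))  # 4.2
```

In practice, MOS is usually averaged over many listeners and many clips, so a single score summarises a whole test set rather than one utterance.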

The inception of AWAAZ was driven by the founders’ recognition of a gap in the market for high-quality, affordable TTS models for Indian languages. “When we started building, we realised that the models required for a voice bot were not mature for Indian languages. Existing models for non-English languages were nowhere close to production,” explained Kamath in an exclusive interaction with AIM.

Contrasting smallest.ai's approach with OpenAI’s GPT-4o, a general-purpose model, Kamath said the company aims to build specialised models that can be tailored for customer support, even for small businesses. AWAAZ is also priced lower than other TTS offerings for Indian languages, such as Veed.io and Murf.ai.

Janta ki AWAAZ (The Voice of the People)

AWAAZ stands out for its single-shot voice cloning capability, which can replicate a voice from a mere five-second audio clip. The model also boasts a low streaming latency of just 200 milliseconds. 

To make this technology accessible, smallest.ai has set an introductory price of INR 999 for 500,000 characters, positioning AWAAZ as a cost-effective option. The company claims it is ten times cheaper than competitors such as ElevenLabs.
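The Python sketch below restates the introductory price in per-unit terms. It assumes that pricing scales linearly with character count, which the article does not state. The helper names are hypothetical, and the competitor figure is only what the "ten times cheaper" claim would imply. It is not a quoted price.

```python
# Introductory AWAAZ pricing as reported: INR 999 for 500,000 characters.
PLAN_PRICE_INR = 999
PLAN_CHARACTERS = 500_000


def price_per_1k_chars(price_inr: float, characters: int) -> float:
    """Cost in INR of synthesising 1,000 characters under a flat plan."""
    return price_inr / characters * 1_000


def script_cost(num_chars: int,
                price_inr: float = PLAN_PRICE_INR,
                characters: int = PLAN_CHARACTERS) -> float:
    """Pro-rata cost of a script (assumes linear pricing, not confirmed by the source)."""
    return num_chars * price_inr / characters


per_1k = price_per_1k_chars(PLAN_PRICE_INR, PLAN_CHARACTERS)
print(f"{per_1k:.3f}")       # 1.998 INR per 1,000 characters
print(f"{per_1k * 10:.2f}")  # 19.98: implied competitor rate if "ten times cheaper" holds
print(f"{script_cost(2_000):.3f}")  # 3.996 INR for a 2,000-character script
```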

Kamath said that the model has about 750 million parameters and was built by leveraging existing open-source models.

Kamath attributes the affordability of AWAAZ to their focus on data quality and model efficiency. “Our model is much smaller than those of competitors like ElevenLabs. Despite this, we achieve high-quality speech because our data is highly refined,” he explained. 

smallest.ai uses AWS for cloud services, although the company remains open to other partnerships in the future.

The Dataset of AWAAZ was the Critical Part

Kamath and Mandloi launched smallest.ai in October 2023. The initial goal was to create a voice bot for India capable of qualifying leads and handling customer support. This led to the development of SAPIEN, a voice bot for sales, marketing, and customer support. 

However, the lack of robust TTS models for Indian languages led them to focus on core model development, resulting in the creation of AWAAZ. “The data quality for TTS models reduced drastically when we moved away from English to other languages. It is worse for South Asian languages,” said Kamath.

Researchers speaking with AIM have repeatedly highlighted the scarcity of high-quality Indic data, for both text and voice models.

“We spent a lot of time perfecting the dataset, using thousands of hours of audio from various people from different states in India. We focused on data quality to ensure a diverse representation, making our model suitable for production-level deployment,” Kamath said.

The team invested significant resources into this effort, spending more than six months purely on developing the dataset and iterating on its quality.

AWAAZ currently supports only Hindi and Indian English, and Kamath stresses that output quality decides which languages the company offers. “The most difficult part is the data. If you tried our model in Tamil, it might respond a little, but we don’t advertise that capability because it’s not up to our standards yet,” he said.

Way Forward

The company’s ambitious roadmap includes expanding the model’s capabilities. “Our next step is moving closer to GPT-4o-like abilities for Indian languages, where the model can generate answers with a voice, enhancing the interactive experience,” Kamath revealed. 

Additionally, smallest.ai is exploring the development of voice-to-voice models, aiming to offer custom solutions for specific business needs such as lead qualification and customer support.

The founders are committed to advancing AI’s understanding of multimodal data.

“We’ve been fascinated by AI’s potential to understand more than just text. Speech is one of those areas where AI can truly start to seem human, much like in the movie ‘Her’,” Kamath said, reflecting on the broader vision that drives their work.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words.