Ever since the Indian open source community got its hands on Meta’s Llama 2, there has been a surge of Indic language AI models. But there was no way to compare the capabilities of these models against each other. Just a few weeks ago, AIM pointed out the dire need for an Indic LLM Leaderboard.
Adithya S Kolavi, the founder and CEO of CognitiveLab and an AI researcher, saw this and took up the task of building an Indic LLM Leaderboard himself.
While studying at PES University and juggling internships, Kolavi founded CognitiveLab around a year ago. “I was covering web development and cloud, but I wanted to focus on generative AI,” Kolavi told AIM. He saw that companies needed AI models fine-tuned for their own tasks. “That is when the idea of CognitiveLab started,” he added.
CognitiveLab, with a lean team of 10, provides fine-tuning as a service for companies globally. It built Cognitune, an enterprise-grade LLMOps platform that reduced the production time for deploying LLMs by around 60% compared to other platforms.
The team also released Ambari, the first bilingual Kannada model built on top of Llama 2.
“After I created Ambari, I asked myself how I even evaluate this model,” Kolavi said, recounting how the idea of an Indic LLM leaderboard came about. “I can use it very well, but how do I compare it with other Indian models? There was no uniformity,” he added.
This led Kolavi to start developing a comprehensive full-stack framework that covers everything from training to evaluation, with seamless integration and accessibility as added features.
A lot of work to be done
The Indic LLM Leaderboard supports seven Indic languages: Hindi, Kannada, Tamil, Telugu, Malayalam, Marathi, and Gujarati. Hosted on Hugging Face, it initially supports four Indic benchmarks, with more planned for the future.
“Presently, I’m heavily focused on refining the training core aspect of this framework, which essentially comprises three main products,” said Kolavi. The first product is ‘Indic LLM,’ an open-source framework designed for fine-tuning models using Mistral and Llama.
The second is indic_eval, a tool that simplifies evaluation by providing ready-made benchmarks that can be easily incorporated into the platform. The third is the Indic LLM Leaderboard itself, released in alpha to encourage usage and gather feedback on the framework.
However, the benchmarks are still a work in progress. Currently, the leaderboard uses benchmarks derived from the ARC, HellaSwag, MMLU, and BoolQ datasets, translated with AI4Bharat’s IndicTrans model. “This approach isn’t entirely satisfactory, as it is still a translation,” said Kolavi. “Now that everything is open source, researchers can focus on generating the benchmark for datasets and not needing to build anything from scratch.”
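As a rough illustration of how a leaderboard can turn per-benchmark results into a ranking, the sketch below computes accuracy on multiple-choice answers and then orders models by their mean accuracy across benchmarks. This is not the Indic LLM Leaderboard's actual scoring code: the function names, the model names, and the score values are all hypothetical placeholders, and simple averaging is an assumption.

```python
from statistics import mean


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one example is required")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def rank_models(scores):
    """Rank models by mean accuracy across benchmarks, best first.

    `scores` maps model name -> {benchmark name -> accuracy}.
    Averaging is an assumption; a real leaderboard may weight benchmarks.
    """
    averaged = {model: mean(per_bench.values()) for model, per_bench in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values, not real results.
example_scores = {
    "model-x": {"arc": 0.5, "boolq": 0.75},
    "model-y": {"arc": 0.25, "boolq": 0.5},
}
ranking = rank_models(example_scores)
```

Keeping the per-benchmark scores separate until the final step makes it straightforward to add a new benchmark later without changing the ranking logic.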
The Need for GPUs and Data
It takes just one NVIDIA A100 GPU to test a model on the Indic LLM Leaderboard, but Kolavi and his team had only three GPUs for building the benchmarks. “The problem with Indic models is that they take more time than English models because the number of tokens is significantly higher,” said Kolavi. Just before the release of the evaluation metric, the company was running 10 GPUs for 10 different evals simultaneously.
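One contributing factor to the higher token counts can be shown without any tokenizer at all. Indic scripts such as Kannada occupy three bytes per character in UTF-8, while basic English letters occupy one, so a tokenizer that falls back to raw bytes for characters outside its vocabulary tends to emit several tokens per Indic character. The sketch below only measures bytes per character; it does not run a real tokenizer, and the helper name `utf8_bytes_per_char` is made up for this illustration.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


english = "hello"
kannada = "ನಮಸ್ಕಾರ"  # a Kannada greeting; every code point is in the Kannada block

english_ratio = utf8_bytes_per_char(english)  # 1 byte per ASCII character
kannada_ratio = utf8_bytes_per_char(kannada)  # 3 bytes per Kannada code point

print(f"English: {english_ratio:.1f} bytes/char")
print(f"Kannada: {kannada_ratio:.1f} bytes/char")
```

Actual token counts depend on the tokenizer's vocabulary, so this ratio is an upper-bound intuition rather than a measurement of any specific model.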
“We wanted to see if other people can use it effectively, and there is no fault in the process,” Kolavi added. Since there is a lack of Indic language datasets, CognitiveLab utilised open-source datasets and data from the open web, and now they have been leveraging AI4Bharat’s dataset for training models.
CognitiveLab is part of several startup programs, including those from AWS and Microsoft Azure, which give the company access to GPUs. However, much of the research that the company does internally is funded by its own resources.
AI Should Become a Second-Nature Internet
Kolavi’s next goal is to build a trustworthy LLM benchmark for Indic models. For this, he has been referencing several Chinese papers that created similar benchmarks for the Chinese language. “The next step will be to build up a benchmark to be added to the leaderboard, to give it more accountability.”
Kolavi loves the idea of open source and admires what Hugging Face and AI4Bharat are doing. In his view, if India wants to focus on something, it should be making AI accessible to everyone. “Even the remotest villages in the country should be able to access it,” he added. “It should become like a second-nature Internet where people can openly use and experience the usefulness of AI.”
“I am looking forward to the day when people integrate AI into common apps such as WhatsApp and have a Kannada interface. The whole experience of Indian languages becomes seamless,” Kolavi concluded, adding that researchers should focus on delivering purpose through the models and solving real problems instead of building foundational models in India.