LLMs are the talk of the town – maybe a little too much. Yann LeCun of Meta AI recently voiced this very sentiment. “If you are a student interested in building the next generation of AI systems, don’t work on LLMs. This is in the hands of large companies, there’s nothing you can bring to the table,” said LeCun at the VivaTech conference in Paris.
“The only way you could possibly contribute is by analysing existing LLMs and showing their power and limitations,” LeCun said of what researchers should focus on instead. In India, the thinking is heading the same way.
Though LeCun praised Kannada Llama, built by Adarsh Shirawalmath, he appears to be advocating the use of Llama in products rather than the creation of yet more LLMs.
Raj Dabre, a researcher at NICT in Kyoto and adjunct faculty at IIT Madras, echoed the sentiment. “If you’re a student or an academic dreaming of making LLMs for Indian languages, stop wasting your time. You’re not going to make it,” he said.
Solving Niche Problems is the Answer
Naveen Rao, the VP of generative AI at Databricks, told AIM that the vast majority of foundation model companies will fail.
“You’ve got to do something better than they [OpenAI] do. And, if you don’t, and it’s cheap enough to move, then why would you use somebody else’s model? So it doesn’t make sense to me just to try to be ahead unless you can beat them,” he added.
Francois Chollet, the creator of Keras, has voiced similar concerns. “OpenAI has set back the progress towards AGI by 5-10 years because frontier research is no longer being published and LLMs are an offramp on the path to AGI,” he said in an interview.
The LLM boom, and the wave of Indic Llama models that followed, have fuelled misdirected effort across the Indian AI landscape. Dabre argues that aspiring to build LLMs for Indian languages is a misguided ambition: without substantial computational resources and exceptional talent, these efforts are doomed to fall short.
Instead, he suggests that researchers concentrate on more niche, yet fundamental, challenges in AI. Dabre pointed out the importance of honing LLMs to perform specific tasks, tackling the data challenges inherent in LLMs, and improving transfer learning from English to other languages.
He also emphasised the critical need for efficiency and the integration of external knowledge into LLMs.
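To make the transfer-learning point concrete, here is a minimal sketch of the English-to-Indic recipe: fine-tune a multilingual encoder on English task data and evaluate it zero-shot on Hindi. The model choice (xlm-roberta-base) and the toy data are illustrative assumptions, not Dabre’s recipe.

```python
# A minimal sketch of English-to-Indic transfer learning. Model and data
# below are illustrative placeholders, not anyone's published setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # multilingual encoder, roughly 270M params
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune on English sentiment examples only (toy placeholder data).
english_batch = tokenizer(["the film was great", "the film was awful"],
                          return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**english_batch, labels=labels).loss
loss.backward()
optimizer.step()

# Zero-shot evaluation on a Hindi sentence: the shared multilingual
# representation is what carries the task across languages.
model.eval()
hindi_batch = tokenizer(["फ़िल्म बहुत अच्छी थी"], return_tensors="pt")
with torch.no_grad():
    prediction = model(**hindi_batch).logits.argmax(-1)
print(prediction)  # ideally 1 (positive), if transfer succeeds
```

Notably, an encoder of this size is in the same class as the 200M-parameter BERT that Dabre invokes later, which is part of his point about efficiency.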
Dabre’s emphasis echoes what Pranav Mistry, the founder of TWO.AI, argued recently: researchers should explore directions such as 1-bit LLMs rather than building on top of existing models and merely expanding their vocabulary.
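For intuition, here is a toy sketch of what 1-bit quantisation does to a weight matrix, in the spirit of BitNet-style binarisation. It is a conceptual illustration only, not Mistry’s or any production method.

```python
# Toy illustration of the "1-bit LLM" idea: each weight is collapsed to
# {-1, +1}, with a single scaling factor per matrix preserving magnitude.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)  # full-precision weights

scale = np.abs(W).mean()   # one scalar keeps the overall weight magnitude
W_1bit = np.sign(W)        # each weight now needs a single bit to store
x = rng.normal(size=4).astype(np.float32)

print(W @ x)                   # full-precision matmul
print(scale * (W_1bit @ x))    # 1-bit approximation: ~32x smaller weights
```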
Dabre’s critique is also rooted in the current shortcomings of Indian language models. He noted that significant resources are being wasted on developing models that offer little improvement over existing ones.
“Wasting all that time and compute to get a 7B LLM to the same few-shot classification performance as a 200M parameter BERT, but with terrible performance for actual generative tasks, only proves that you don’t know what you are doing,” said Dabre.
The trend in India has also been criticised for a lack of originality. Much of the work involves tweaking existing models such as Llama 2 and rebranding them as new products. This approach has led to a proliferation of models like Tamil Llama, Telugu Llama, and Kannada Llama, which are essentially built on top of open-source, English-centric language models.
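For context, the recipe behind most of these Indic Llama variants reduces to a few lines: extend the base tokenizer with language-specific tokens, resize the embedding matrix, and continue pretraining. The sketch below uses placeholder Tamil tokens and the gated Llama 2 checkpoint purely for illustration; the real projects train full sentencepiece vocabularies on large Indic corpora.

```python
# A rough sketch of the typical Indic Llama recipe: extend the tokenizer,
# grow the embeddings, then continue pretraining on Indic text.
from transformers import AutoTokenizer, AutoModelForCausalLM

base = "meta-llama/Llama-2-7b-hf"  # gated checkpoint, shown for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Add a handful of (hypothetical) Tamil tokens to the vocabulary.
new_tokens = ["தமிழ்", "மொழி", "கற்றல்"]
num_added = tokenizer.add_tokens(new_tokens)
print(f"added {num_added} tokens")

# Resize the embedding matrix so the new token ids have rows to train.
model.resize_token_embeddings(len(tokenizer))

# From here, the usual recipe is continued pretraining on Indic corpora,
# often with LoRA adapters to keep the compute bill manageable.
```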
This strategy has been seen as a missed opportunity for genuine innovation. AI experts lament that India’s AI research landscape is overly reliant on Western models, resulting in a dearth of groundbreaking research.
Not Just LLM Alternatives
It might be too soon to write off LLMs completely, but the discussion on alternatives to LLM-based models has been gaining traction. Mufeed VH, the creator of Devika, advocates moving away from Transformer models and exploring different architectures such as RWKV, an RNN-based model.
Mufeed highlighted the potential of such architectures to offer unlimited context windows and improved inference capabilities. He believes that by focusing on these alternatives, researchers can develop models that rival the capabilities of current leading models like GPT-4.
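The core idea is easy to see in miniature: attention keeps a cache that grows with every token processed, while an RNN-style model like RWKV folds the entire history into a fixed-size state. The toy below illustrates only that contrast; it is not RWKV’s actual time-mixing formulation.

```python
# Toy contrast: attention's growing KV cache vs an RNN-style fixed state.
import numpy as np

d = 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, d))   # a long stream of token embeddings

# Attention view: the KV cache grows with every token processed.
kv_cache = []
for t in tokens:
    kv_cache.append(t)                # memory: O(sequence length)

# Recurrent view: one state vector, updated in place.
W = rng.normal(size=(d, d)) * 0.1
state = np.zeros(d)
for t in tokens:
    state = np.tanh(W @ state + t)    # memory: O(1) in sequence length

print(len(kv_cache), state.shape)     # 1000 entries vs a single (8,) state
```

The constant-size state is what motivates claims of effectively unlimited context and cheaper inference, though how much usable history such a state retains is itself an open research question.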
LeCun’s advocacy for new architectures aligns with his broader vision of democratising AI technology. He warns against the concentration of AI development in the hands of a few large entities, as this could stifle diversity of thought and innovation. LeCun emphasised the importance of open-source platforms to ensure that AI development is inclusive and widely accessible.
The consensus among leading AI researchers is clear: Indian researchers need to move away from building redundant LLMs and focus on solving fundamental problems in AI. For now, though these fine-tuned models can hardly be classified as products, they remain ideal research exercises for university students.