
The Future of India’s AI is 1-bit LLMs

Does India need a lot of electricity in the future for AI? No, it just needs 1-bit LLMs.


Illustration by Nikhil Kumar

Generative AI has frequently sparked discussions about electricity and the need for diverse energy sources. Recently, Ola CEO Bhavish Aggarwal made an interesting comparison between training AI models and running an Ola S1 scooter.

Aggarwal said, “1 H100 NVIDIA GPU consumes 30x electricity in a year as an Ola S1 scooter.” He said that an H100 GPU requires around 8.7 MWh of energy per year, whereas an S1 requires 0.25 MWh/year. “Need a lot of electricity in the future!” he added.

Krishan S Iyer, the CEO of NDR InvIT, called it an incorrect comparison. 

Aggarwal’s broader point was that steps are needed to make AI models efficient within the country. “Not sure why the comparison between GPU and S1 scooter, but the problem is real. Grid capacity is becoming challenging for EV adoption, a lot more so in India,” replied Ganesh Raju, the co-founder of RapidEVchargE.

On the other hand, Pranav Mistry, the founder and CEO of TWO, which recently launched its Sutra line of AI models, disagreed with Aggarwal completely. “No, you need optimised AI models like SUTRA Light and innovations like 1-bit LLM.” Though this might simply be a promotion of TWO’s new AI model, the part about the 1-bit LLM does make sense. 

India Needs 1-bit LLMs

The conversation around 1-bit LLMs started around February, when Microsoft released its paper titled “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”.

With the conversation shifting towards 1-bit LLMs, this might also mark a paradigm shift in hardware design. “One-bit LLMs open new doors for designing custom hardware and systems specifically optimised for 1-bit LLMs,” said Furu Wei, one of the researchers behind the 1-bit LLM paper.

Wei explained that these quantised models have multiple advantages: they fit on smaller chips, require less memory, and process faster.
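
To put that in rough numbers, consider the weight memory of a 7-billion-parameter model. The parameter count here is an illustrative assumption, not a figure from the paper; the bit widths follow from the formats themselves:

```python
# Back-of-the-envelope comparison of weight storage:
# FP16 (16 bits per weight) vs. ternary (log2(3) ~ 1.58 bits per weight).

params = 7_000_000_000  # illustrative 7B-parameter model

fp16_gb = params * 16 / 8 / 1e9       # ~14.0 GB
ternary_gb = params * 1.58 / 8 / 1e9  # ~1.4 GB

print(f"FP16 weights:    {fp16_gb:.1f} GB")
print(f"Ternary weights: {ternary_gb:.1f} GB")
print(f"Reduction:       {fp16_gb / ternary_gb:.0f}x")  # ~10x smaller
```

A roughly 10x smaller weight footprint is what lets such models fit on smaller chips and keeps memory traffic, and hence latency, down.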

As for Krutrim, Aggarwal claims that the company is shifting to its own hardware and cloud for AI. “Only if Krutrim has plans for customised GPUs for its native market…,” replied Jainul Thakar.

Mistry further said that Aggarwal had probably overlooked the fact that the H100’s heavy energy consumption occurs while training the model, which is a one-time process; over a model’s lifetime, serving it is the ongoing cost. “Inference is the main energy-hungry task,” he said. And for inference, H100s are not the most efficient GPUs.

If the claim is that we will need more electricity to run AI models in the future, then the solutions Krutrim builds should lean towards efficiency and energy saving, which would also make them well suited for AI inference.

During a discussion with AIM, Adithya S Kolavi, the founder of CognitiveLab, and Adarsh Shirawalmath, the founder of Tensoic, also said that better quantisation and optimisation techniques are needed for LLMs to run efficiently in the Indian market.

Though the performance of these 1-bit LLMs in Indic languages is yet to be measured and evaluated, discussions on Hacker News suggest that existing models could also be converted into 1-bit LLMs in the future, once hardware changes accordingly.

The Era of 1-bit LLMs

The crux of this innovation lies in representing each parameter in the model, commonly known as a weight, using only 1.58 bits. Unlike traditional LLMs, which typically store weights as 16-bit floating-point values (FP16), or in lower-precision formats such as NVIDIA’s FP4, BitNet b1.58 restricts each weight to one of three values: -1, 0, or 1. Three possible values carry log2(3) ≈ 1.58 bits of information per weight, which is where the name comes from.
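
For intuition, here is a minimal NumPy sketch of the absmean quantisation the paper describes: scale the weight matrix by its mean absolute value, then round each entry and clip it to {-1, 0, 1}. The function name and return convention here are our own, not the paper’s:

```python
import numpy as np

def absmean_ternary(W: np.ndarray, eps: float = 1e-8):
    """Quantise weights to {-1, 0, 1} via absmean scaling, as in BitNet b1.58."""
    gamma = np.mean(np.abs(W))                       # per-tensor scale
    W_ternary = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_ternary.astype(np.int8), gamma          # gamma is kept to rescale outputs

# Example: quantise a small random weight matrix
W = np.random.randn(4, 4).astype(np.float32)
Wq, gamma = absmean_ternary(W)
print(Wq)  # every entry is -1, 0, or 1
```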

This substantial reduction in bit usage is the cornerstone of the proposed model, which matches traditional models of the same size and training data in end-task performance.

Most importantly, it is more cost-effective in terms of latency, memory usage, throughput, and energy consumption, which is precisely why it matters for the future of Indian AI research.

This 1.58-bit LLM introduces a new way of scaling and training language models, offering a balance between high performance and cost-effectiveness. Additionally, it opens up possibilities for a new way of computing and suggests the potential for designing specialised hardware optimised for these 1-bit LLMs.
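
That new way of computing can be made concrete: with weights restricted to {-1, 0, 1}, a matrix-vector product needs no multiplications at all, since a +1 adds the corresponding input, a -1 subtracts it, and a 0 skips it. A toy sketch follows; the helper name is our own:

```python
import numpy as np

def ternary_matvec(Wq: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute Wq @ x using only additions and subtractions."""
    out = np.zeros(Wq.shape[0], dtype=x.dtype)
    for i in range(Wq.shape[0]):
        out[i] = x[Wq[i] == 1].sum() - x[Wq[i] == -1].sum()
    return out

Wq = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([0.5, -2.0, 3.0], dtype=np.float32)
print(ternary_matvec(Wq, x))  # [-2.5  1. ]
print(Wq @ x)                 # reference full matmul, same result
```

This is also why custom silicon is attractive here: adders are considerably cheaper than multipliers in both chip area and energy.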

But for now, a model still needs to be trained from scratch for this optimisation, and the current paradigm is mostly built around the NVIDIA H100, though that might change soon as well.

India is mostly experimenting with the inference side of AI. One-bit LLMs, rather than simply increasing electricity capacity, are the way forward for India’s AI models.



Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words.