Generative AI has frequently sparked discussions about electricity and the need for diverse energy sources. Recently, Ola CEO Bhavish Aggarwal made an interesting comparison between training AI models and running an Ola S1 scooter.
Aggarwal said, “1 H100 NVIDIA GPU consumes 30x electricity in a year as an Ola S1 scooter.” He said that an H100 GPU requires around 8.7 MWh of energy per year, whereas an S1 requires 0.25 MWh/year. “Need a lot of electricity in the future!” he added.
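As a rough sanity check on those figures, here is a back-of-the-envelope sketch (the 700 W TDP and round-the-clock operation are assumptions, not numbers from Aggarwal's post; the gap to 8.7 MWh would then be datacentre overhead such as cooling):

```python
# Back-of-the-envelope check of the quoted figures. Assumptions: an H100
# SXM draws ~700 W at peak and runs 24x7 for a full year.
H100_TDP_KW = 0.7          # assumed peak draw of one H100, in kW
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

gpu_mwh = H100_TDP_KW * HOURS_PER_YEAR / 1000   # energy at the chip itself
scooter_mwh = 0.25                              # Ola S1 figure from the quote

print(f"H100 alone: {gpu_mwh:.1f} MWh/year")                    # ~6.1 MWh
print(f"Quoted 8.7 MWh implies overhead x{8.7 / gpu_mwh:.2f}")  # ~1.42 (cooling etc.)
print(f"Ratio vs scooter: {8.7 / scooter_mwh:.0f}x")            # ~35x, close to the 30x claim
```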
Krishan S Iyer, the CEO of NDR InvIT, called it an incorrect comparison.
Aggarwal’s broader point was that steps are needed to make AI models run efficiently within the country. “Not sure why the comparison between GPU and S1 scooter, but the problem is real. Grid capacity is becoming challenging for EV adoption, a lot more so in India,” replied Ganesh Raju, the co-founder of RapidEVchargE.
On the other hand, Pranav Mistry, the founder and CEO of TWO, which recently launched its Sutra line of AI models, disagreed with Aggarwal completely. “No, you need optimised AI models like SUTRA Light and innovations like 1-bit LLM.” Though this might simply be a promotion of TWO’s new AI model, the part about the 1-bit LLM does make sense.
India Needs 1-bit LLMs
The conversation around 1-bit LLMs started in February, when Microsoft released its paper titled ‘The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits’.
As the conversation shifts towards 1-bit LLMs, it may also mark a paradigm shift in hardware design. “One-bit LLMs open new doors for designing custom hardware and systems specifically optimised for 1-bit LLMs,” said Furu Wei, one of the researchers behind the 1-bit LLM paper.
Wei explained that these quantised models have multiple advantages: they fit on smaller chips, require less memory, and process faster.
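To make the memory advantage concrete, here is an illustrative sketch (the 7-billion-parameter model size and the packing scheme are assumptions for the example, not figures from Wei):

```python
# Rough memory footprint of a 7B-parameter model under different weight formats.
PARAMS = 7e9

fp16_gb = PARAMS * 16 / 8 / 1e9        # 16 bits per weight -> ~14 GB
# Ternary weights: five values fit in one byte (3**5 = 243 <= 256),
# so packing costs roughly 1.6 bits per weight.
ternary_gb = PARAMS * 1.6 / 8 / 1e9    # -> ~1.4 GB

print(f"FP16:    {fp16_gb:.1f} GB")    # ~14.0 GB
print(f"Ternary: {ternary_gb:.1f} GB") # ~1.4 GB, roughly a 10x reduction
```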
As for Krutrim, Aggarwal claims that the company is shifting to its own hardware and cloud for AI. “Only if Krutrim has plans for customised GPUs for its native market…,” replied Jainul Thakar.
Mistry further said that Aggarwal probably overlooked the fact that the H100’s heaviest energy draw comes during training, which is a one-time cost; it is the ongoing serving of a deployed model that keeps consuming power. “Inference is the main energy-hungry task,” he said. And for inference, H100s are not the most efficient GPUs.
Since the claim is that running AI models will need ever more electricity in the future, whatever Krutrim builds should lean towards efficiency and energy saving, which would also make it well suited for AI inference.
During a discussion with AIM, Adithya S Kolavi, the founder of CognitiveLab, and Adarsh Shirawalmath, the founder of Tensoic, also said that better quantisation and optimisation techniques are needed for LLMs to run efficiently for the Indian market.
Though the performance of these 1-bit LLMs in Indic languages is yet to be measured and evaluated, discussions on Hacker News suggest that existing models could also be converted into 1-bit LLMs in the future once such purpose-built hardware arrives.
The Era of 1-bit LLMs
The crux of this innovation lies in representing each parameter in the model, commonly known as a weight, using only 1.58 bits. Unlike traditional LLMs, which often employ 16-bit floating-point values (FP16) or NVIDIA’s FP4 format for weights, BitNet b1.58 restricts each weight to one of three values: -1, 0, or 1. A three-way choice carries log2(3) ≈ 1.58 bits of information, which is where the unusual figure comes from.
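A minimal NumPy sketch of this kind of ternary rounding, modelled on the absmean quantisation the paper describes (the function name here is illustrative, not from any released codebase):

```python
import numpy as np

def ternary_quantise(w: np.ndarray, eps: float = 1e-8):
    """Round weights to {-1, 0, 1} with a per-tensor scale.

    Modelled on the absmean scheme in the BitNet b1.58 paper:
    scale by the mean absolute value, then round and clip.
    """
    gamma = np.abs(w).mean()                       # per-tensor scale
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q.astype(np.int8), gamma              # ternary weights + scale

w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = ternary_quantise(w)
print(np.unique(w_q))   # only -1, 0, and 1 appear
# Dequantised approximation for reference: w ≈ gamma * w_q
```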
This substantial reduction in bit usage is the cornerstone of the proposed model. In end-task performance, it matches traditional models of the same size trained on the same data.
Most importantly, it is more cost-effective in terms of latency, memory usage, throughput, and energy consumption, which is precisely why it matters for Indian AI research in the future.
This 1.58-bit LLM introduces a new way of scaling and training language models, offering a balance between high performance and cost-effectiveness. Additionally, it opens up possibilities for a new way of computing and suggests the potential for designing specialised hardware optimised for these 1-bit LLMs.
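To see why specialised hardware becomes attractive, consider this toy sketch (illustrative only): with ternary weights, a matrix-vector product needs no multiplications at all, only additions and subtractions.

```python
import numpy as np

def ternary_matvec(w_q: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with ternary weights, using no multiplications.

    Because each weight is -1, 0, or 1, each output element is just a sum
    of selected inputs minus another sum -- the property that makes
    dedicated 1-bit hardware attractive.
    """
    out = np.empty(w_q.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_q):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

w_q = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([0.5, -2.0, 3.0], dtype=np.float32)
print(ternary_matvec(w_q, x))       # [-2.5  1. ]
print(w_q.astype(np.float32) @ x)   # same result via an ordinary matmul
```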
But for now, a model still has to be trained from scratch to gain this optimisation, and that training largely happens on the NVIDIA H100, which might change soon as well.
India is mostly experimenting with the inference side of AI. One-bit LLMs, rather than simply adding electricity capacity, look like the way forward for India’s AI models.