
Time To Scale Down Large Language Models

Advancements in hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data quality have drastically reduced training costs.


Renowned research scientist Andrej Karpathy recently said that the llm.c project shows GPT-2 can now be trained in just 24 hours on a single 8xH100 GPU node, for just $672, which works out to roughly $28 per node-hour.

Karpathy’s journey began with an interest in reproducing OpenAI’s GPT-2 for educational purposes. He initially encountered obstacles in using PyTorch, a popular deep-learning framework. 

Frustrated by these challenges, Karpathy wrote the entire training pipeline from scratch in C/CUDA, creating the llm.c project, which eventually evolved into a streamlined, efficient system for training language models.

The project, which implements GPT training in C/CUDA, has minimal setup requirements and offers efficient and cost-effective model training.

Scaling down LLMs 

In his post, Karpathy mentioned how advancements in hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data quality have drastically reduced training costs.

Mauro Sicard, the director of BRIX Agency, agreed with Karpathy. “With the improvements in both GPUs and training optimisation, the future may surprise us,” he said.

Scaling down LLMs while maintaining performance is a crucial step in making AI more accessible and affordable.

According to Meta engineer Mahima Chhagani, LLMLingua is a method designed to compress prompts efficiently without sacrificing significant information, cutting the number of tokens a model has to process.
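
A minimal sketch of how this looks in practice, using Microsoft’s open-source llmlingua package; the context, question, and token budget below are illustrative assumptions:

```python
# pip install llmlingua
from llmlingua import PromptCompressor

# Loads a small language model that scores tokens and drops
# low-information ones from the prompt.
compressor = PromptCompressor()

long_context = "...pages of retrieved documents..."  # placeholder context

result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using the context.",  # assumed values
    question="What has reduced LLM training costs?",
    target_token=200,  # illustrative compression budget
)

print(result["compressed_prompt"])  # the shortened prompt to send onward
print(result["origin_tokens"], "->", result["compressed_tokens"])
```

Because API pricing is per token, the compression ratio translates directly into savings on every call.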

Chhagani said using an LLM cascade, starting with affordable models like GPT-2 and escalating to more powerful ones like GPT-3.5 Turbo and GPT-4 Turbo, optimises cost by only using expensive models when necessary.
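
A minimal sketch of such a cascade, shown here with the OpenAI Python client and two API-hosted tiers; the confidence heuristic is a placeholder, and in a real system the cheapest tier could be a locally hosted model such as GPT-2:

```python
# pip install openai
# Hypothetical cascade: try cheap models first, escalate only when
# the answer looks unreliable.
from openai import OpenAI

client = OpenAI()
TIERS = ["gpt-3.5-turbo", "gpt-4-turbo"]  # cheapest to most expensive

def confident(answer: str) -> bool:
    # Placeholder heuristic; real systems use a trained scorer or logprobs.
    return len(answer) > 0 and "i don't know" not in answer.lower()

def cascade(question: str) -> str:
    answer = ""
    for model in TIERS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answer = resp.choices[0].message.content
        if confident(answer):
            return answer  # stop early, saving the cost of larger models
    return answer  # fall back to the strongest model's answer
```

The design bet is that most queries are easy: if the cheap tier handles, say, 80% of the traffic, the expensive model is only billed for the remaining fifth.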

FrugalGPT is another approach that uses multiple APIs to balance cost and performance, reducing costs by up to 98% while maintaining performance comparable to GPT-4.
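
The component that makes this kind of multi-API cascade safe is a scorer that predicts whether a cheap answer can be trusted before returning it. A sketch of that idea with placeholder features and toy training data; FrugalGPT itself reportedly trains a small language model for this, not the hand-rolled classifier shown here:

```python
# Hypothetical reliability scorer in the spirit of FrugalGPT: a small
# classifier predicts whether a cheap model's answer can be trusted, so
# expensive APIs are only called when the score falls below a threshold.
from sklearn.linear_model import LogisticRegression
import numpy as np

def features(question: str, answer: str) -> np.ndarray:
    # Placeholder features; a real scorer would use a small language model.
    return np.array([len(answer), answer.count("?"),
                     float("sorry" in answer.lower())])

# Toy history: features of past answers and whether each was correct.
X = np.array([features("q", a) for a in ["Paris.", "Sorry, unsure?", "42."]])
y = np.array([1, 0, 1])

scorer = LogisticRegression().fit(X, y)

def reliable(question: str, answer: str, threshold: float = 0.8) -> bool:
    return scorer.predict_proba([features(question, answer)])[0, 1] >= threshold
```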

Additionally, a Reddit developer named pmarks98 used a fine-tuning approach with tools like OpenPipe and models like Mistral 7B, cutting costs by up to 88%.
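
As an illustration of why fine-tuning a small open model can undercut API pricing, here is a minimal LoRA setup for Mistral 7B using the Hugging Face peft library; this is a generic sketch of the technique, not pmarks98’s actual OpenPipe workflow:

```python
# pip install transformers peft
# Illustrative LoRA fine-tune setup for Mistral 7B.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train only small low-rank adapter matrices instead of all 7B weights,
# which is what makes fine-tuning cheap enough to replace API calls.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Only the adapter weights are updated during training, so a single GPU can specialise the model for a task that previously required an expensive general-purpose API.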

Is There a Real Need to Reduce Costs?

Cheaper LLMs, especially open-source models, often have limited capabilities compared to the proprietary models from tech giants like OpenAI or Google. 

While the upfront costs may be lower, running a cheap LLM locally can lead to higher long-term costs due to the need for specialised hardware, maintenance overheads, and limited scalability.

Moreover, as Princeton professor Arvind Narayanan pointed out, the focus has shifted from capability improvements to massive cost reductions, a shift many AI researchers find disappointing.

Cost over Capability Improvements

Narayanan, however, argued that cost reductions are more exciting and impactful for several reasons. Because cheaper inference lets developers spend more model calls on the same task, through retries, chaining and agentic workflows, lower costs often translate into improved accuracy. They can also accelerate the pace of research by making experiments more affordable and more functionalities accessible.

In terms of what will make LLMs more useful in people’s lives, he said, cost is hands down more significant at this stage than capability.

In another post, Narayanan said that the cheaper a resource gets, the more demand there is for it. In the future, it may become common to build applications that invoke LLMs millions of times in the course of completing a single task.
This democratisation of AI could accelerate faster than we imagined, possibly leading to personal AGIs for $10 by 2029.



Anshul Vipat

Anshul Vipat is a tech aficionado, enthusiastic about the latest innovations in the digital world. He also holds a keen interest in travelling, exploring and cooking.