Andrej Karpathy Trains GPT-2 in Pure C Without PyTorch

The llm.c project, available on GitHub, implements GPT-2 training on CPU in fp32 in roughly 1,000 lines of code.

Former OpenAI researcher Andrej Karpathy has introduced llm.c, a project aimed at training LLMs in pure C without the hefty dependencies of PyTorch and CPython.

“I chose GPT-2 as the first working example because it is the grand-daddy of LLMs, the first time the modern stack was put together,” wrote Karpathy in his GitHub repository.  

One of the key advantages of llm.c is that it compiles and runs instantly, and its results match the PyTorch reference implementation. It allocates all the memory it needs once, as a single block at the beginning of training, so the memory footprint stays constant while data batches stream through the model.
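The single-block allocation described above can be sketched in C as follows. This is a minimal illustration, not code from llm.c: the `Activations` struct, its three tensors, and the function names are hypothetical, chosen only to show one `calloc` being carved into several tensors by pointer offsets.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_TENSORS 3

/* Hypothetical container: three tensors carved out of one allocation. */
typedef struct {
    float *block;      /* the single allocation that owns all the memory */
    float *hidden;     /* shape (B, T, C) */
    float *logits;     /* shape (B, T, V) */
    float *losses;     /* shape (B, T) */
    size_t num_floats; /* total number of floats in block */
} Activations;

/* Allocate every tensor in one call; returns 0 on success, -1 on failure. */
int activations_alloc(Activations *a, size_t B, size_t T, size_t C, size_t V) {
    size_t sizes[NUM_TENSORS] = { B * T * C, B * T * V, B * T };
    float **slots[NUM_TENSORS] = { &a->hidden, &a->logits, &a->losses };

    size_t total = 0;
    for (int i = 0; i < NUM_TENSORS; i++) total += sizes[i];

    a->block = (float *)calloc(total, sizeof(float));
    if (a->block == NULL) return -1;
    a->num_floats = total;

    /* Walk a cursor through the block, handing each tensor its offset. */
    float *cursor = a->block;
    for (int i = 0; i < NUM_TENSORS; i++) {
        *slots[i] = cursor;
        cursor += sizes[i];
    }
    return 0;
}

/* One free releases everything, since the tensors are views into block. */
void activations_free(Activations *a) {
    free(a->block);
    a->block = NULL;
}
```

After `activations_alloc` succeeds, a training loop written in this style would only read and write through the tensor pointers and would not call `malloc` or `free` again until shutdown, which is what keeps the footprint constant.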

The core of llm.c is a set of hand-written forward and backward passes for the individual layers: layernorm, encoder, matmul, self-attention, GELU, residual, softmax, and cross-entropy loss. Stringing these layers together requires arranging every pointer and tensor offset correctly, which is essential for the model to run at all.
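To give a sense of what a hand-written forward and backward pass looks like, here is a minimal sketch for a matmul layer over flat row-major arrays. It is an illustration under assumptions rather than the llm.c source: the function names, argument order, and the convention that the backward pass accumulates into the gradient buffers are choices made for this example.

```c
#include <stddef.h>

/* Forward: out[n, o] = bias[o] + sum_i inp[n, i] * weight[o, i].
   All tensors are flat row-major arrays: inp is (N, I), weight is (O, I),
   bias is (O) and may be NULL, out is (N, O). */
void matmul_forward(float *out, const float *inp, const float *weight,
                    const float *bias, size_t N, size_t I, size_t O) {
    for (size_t n = 0; n < N; n++) {
        const float *x = inp + n * I;        /* row n of the input */
        for (size_t o = 0; o < O; o++) {
            const float *w = weight + o * I; /* row o of the weights */
            float acc = (bias != NULL) ? bias[o] : 0.0f;
            for (size_t i = 0; i < I; i++) acc += x[i] * w[i];
            out[n * O + o] = acc;
        }
    }
}

/* Backward: given dout (N, O), accumulate (+=) gradients into
   dinp (N, I), dweight (O, I) and dbias (O, may be NULL).
   The caller is expected to zero the gradient buffers beforehand. */
void matmul_backward(float *dinp, float *dweight, float *dbias,
                     const float *dout, const float *inp, const float *weight,
                     size_t N, size_t I, size_t O) {
    for (size_t n = 0; n < N; n++) {
        for (size_t o = 0; o < O; o++) {
            float d = dout[n * O + o];
            if (dbias != NULL) dbias[o] += d;
            for (size_t i = 0; i < I; i++) {
                dinp[n * I + i] += d * weight[o * I + i];
                dweight[o * I + i] += d * inp[n * I + i];
            }
        }
    }
}
```

Every index here is computed by hand from the tensor shapes, so a single wrong stride silently reads the wrong memory. That is the kind of pointer and offset bookkeeping the paragraph above refers to.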

“I am curious to learn more about Rust and totally understand the appeal. But I still find C so nice, simple, clean, portable and beautiful, aesthetically. It’s as close as you want to get to direct communion with the machine,” wrote Karpathy. 

Karpathy’s next step is to port llm.c to CUDA layer by layer, aiming for performance comparable to PyTorch without the heavyweight dependencies. The CUDA port also opens the door to lowering precision from fp32 to fp16 or below and to supporting more modern architectures such as Llama 2, Mistral, and Gemma.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.