
Andrej Karpathy Reproduces GPT-2 in Latest Tutorial

Karpathy says his overnight training run came very close to GPT-3’s 124M model.

In a marathon video on his YouTube channel, Andrej Karpathy reproduced GPT-2 in just over four hours. The OpenAI co-founder, who left the company earlier this year, has spent many hours creating tutorial videos for his viewers, including how-to videos on decoding models.

In his latest lecture, Karpathy recreated the smallest version of GPT-2, which has 124 million parameters. In the four-hour-long video, he broke down the entire process, starting completely from scratch.
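To see where the 124 million figure comes from, the parameters can be tallied by hand. The sketch below is not taken from the video; it assumes the commonly published configuration for the smallest GPT-2 (a 50,257-token vocabulary, a 1,024-token context, 12 layers and a 768-dimensional embedding) and assumes the output layer shares its weights with the token embedding. The function name `count_gpt2_params` is hypothetical.

```python
def count_gpt2_params(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768):
    """Tally parameters for a GPT-2 style model.

    Assumptions (not stated in the article): the default sizes are the
    published GPT-2 small configuration, and the output projection is
    tied to the token embedding, so it adds no extra weights.
    """
    token_emb = vocab_size * n_embd      # token embedding table
    pos_emb = block_size * n_embd        # learned position embeddings
    layer_norm = 2 * n_embd              # scale and bias

    # Attention: fused query/key/value projection plus output projection.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Feed-forward: expand to 4x the width, then project back.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)

    # Each transformer block has two layer norms, attention and the MLP.
    per_block = 2 * layer_norm + attn + mlp

    # A final layer norm follows the last block.
    return token_emb + pos_emb + n_layer * per_block + layer_norm


if __name__ == "__main__":
    print(f"{count_gpt2_params():,}")  # prints 124,439,808
```

Under these assumptions the total is roughly 124.4 million, with the token embedding alone accounting for about 38.6 million of it.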

“First, we build the GPT-2 network, then we optimise its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations,” he wrote.

https://twitter.com/karpathy/status/1799949853289804266

Karpathy said the reproduction also came close to the smallest GPT-3 model. “Our ‘overnight’ run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar,” he said.

Karpathy has long been a proponent of democratising knowledge on AI, specifically the science behind LLMs.

Karpathy, who had been responsible for deep learning and computer vision at OpenAI, left the company for a second time in February this year. However, he has been actively involved in the AI community, creating tutorials and breakdown videos on different models, even more so after his departure.

Most recently, he released a project titled llm.c, which lets users train LLMs in plain C without having to rely on PyTorch or CPython.

Before that, he released an hour-long lecture explaining the intricacies of LLMs and how they function. Shortly after his departure from OpenAI, he also released a tutorial on understanding tokenisation, taking the opportunity to analyse Google’s Gemma tokeniser following its launch.


Donna Eva

Donna is a technology journalist at AIM, hoping to explore AI and its implications in local communities, as well as its intersections with the space, defence, education and civil sectors.