In a marathon video on his YouTube channel, Andrej Karpathy reproduced GPT-2 in just over four hours. The OpenAI co-founder, who left the company earlier this year, has spent numerous hours creating tutorial videos for his viewers, including how-to videos that break down how these models work.
In his latest lecture, Karpathy recreated the smallest version of GPT-2, which has 124 million parameters. Over the four-hour video, he walked through the entire process, starting completely from scratch.
“First, we build the GPT-2 network, then we optimise its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations,” he wrote.
Karpathy says the recreation comes remarkably close to the comparable GPT-3 model. “Our ‘overnight’ run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar,” he said.
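For a sense of where the “124M” figure comes from, here is a back-of-the-envelope sketch (not code from the video) that tallies the parameters of GPT-2 “small” from its published hyperparameters: 12 layers, 12 attention heads, 768-dimensional embeddings, a 1,024-token context and a 50,257-token vocabulary.

```python
# Back-of-the-envelope parameter count for GPT-2 "small",
# using the published hyperparameters (not code from the video).
n_layer, n_head, n_embd = 12, 12, 768     # transformer depth, attention heads, embedding width
block_size, vocab_size = 1024, 50257      # context length, BPE vocabulary size

tok_emb = vocab_size * n_embd             # token embedding table (tied with the output head)
pos_emb = block_size * n_embd             # learned positional embeddings

attn  = n_embd * 3 * n_embd + 3 * n_embd  # fused Q, K, V projection (weights + biases)
attn += n_embd * n_embd + n_embd          # attention output projection
mlp   = n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection to 4x width
mlp  += 4 * n_embd * n_embd + n_embd      # MLP down-projection
ln    = 2 * (2 * n_embd)                  # two LayerNorms per block (scale + shift each)
per_block = attn + mlp + ln

total = tok_emb + pos_emb + n_layer * per_block + 2 * n_embd  # plus the final LayerNorm
print(f"{total:,} parameters")            # -> 124,439,808, i.e. the "124M" model
```

The count assumes the output head shares weights with the token embedding, which is how GPT-2 was trained; counted separately, the figure would be roughly 163 million.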
Karpathy has long been a proponent of democratising knowledge on AI, particularly the science behind LLMs.
Karpathy, who had been responsible for deep learning and computer vision at OpenAI, left the company for a second time in February this year. He has, however, remained actively involved in the AI community, creating tutorials and breakdown videos on different models at an even greater pace since his departure.
Most recently, he released a project titled llm.c, which lets users train LLMs in plain C without having to rely on PyTorch and CPython.
Even before that, he had released an hour-long lecture explaining the intricacies of LLMs and how they function. Shortly after his departure from OpenAI, he also released a tutorial on understanding tokenisation, taking the opportunity to analyse Google’s Gemma tokeniser following its launch.
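To see what tokenisation means in practice, here is a minimal sketch (not taken from Karpathy’s tutorial) that uses OpenAI’s tiktoken library with the GPT-2 encoding to show how text is turned into the integer IDs a model actually consumes.

```python
# A minimal illustration of BPE tokenisation (not code from Karpathy's tutorial),
# using OpenAI's tiktoken library and the GPT-2 encoding.
import tiktoken

enc = tiktoken.get_encoding("gpt2")        # the 50,257-token BPE vocabulary used by GPT-2

text = "Tokenisation splits text into subword units."
ids = enc.encode(text)                     # text -> list of integer token IDs
print(ids)                                 # the model only ever sees these integers
print([enc.decode([i]) for i in ids])      # show which substring each ID stands for
```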