Not everyone has access to the highly specced machines needed to run LLMs locally, which demand substantial computational power and memory.
“GPUs like H100s, which are essential to train and run LLMs efficiently on a large scale, are beyond the budgets of most startups. And running models like Llama 3.1 405B is unthinkable for regular people.
“Renting GPUs and running them on a single cluster or using peer-to-peer connections is one of the easiest ways to do it,” Arjun Reddy, the co-founder of Nidum.AI, told AIM.
P2P technology is already used in blockchains, a testament to how secure such networks can be. The approach first came into the limelight in 1999, when Napster used it to decentralise music, allowing users to download and host music files from their own computers.
Reddy further explained the approach they follow for the P2P technology. It starts with fine-tuning an existing model for specific needs; the model is then divided into hundreds of small parts and distributed across the P2P network.
A layer of encryption is used to safeguard data.
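In rough terms, that pipeline might look like the sketch below: a fine-tuned checkpoint is split into small chunks and each chunk is encrypted before being announced to peers. The chunk size, the Fernet encryption scheme and the file name are illustrative assumptions, not Nidum.AI's actual implementation.

```python
# Hypothetical sketch: shard a fine-tuned model's weights into small
# encrypted chunks before handing them to peers. Chunk size, encryption
# scheme and the peer interface are assumptions made for illustration.
from cryptography.fernet import Fernet  # symmetric encryption layer

CHUNK_BYTES = 4 * 1024 * 1024  # 4 MB chunks (assumed size)

def shard_and_encrypt(weights_path: str, key: bytes) -> list[bytes]:
    """Split the serialised model into fixed-size chunks and encrypt each one."""
    cipher = Fernet(key)
    chunks = []
    with open(weights_path, "rb") as f:
        while chunk := f.read(CHUNK_BYTES):
            chunks.append(cipher.encrypt(chunk))
    return chunks

if __name__ == "__main__":
    key = Fernet.generate_key()                              # kept by the model owner
    parts = shard_and_encrypt("finetuned_model.bin", key)    # hypothetical file
    print(f"Model split into {len(parts)} encrypted chunks for the P2P network")
```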
To showcase the flexibility of P2P technology, Reddy is hosting the largest decentralised AI event later this week, where hundreds of Apple computers will run Llama 3.1 over a P2P network. The idea is to demonstrate that decentralised networks can run LLMs.
The Promise of Peer-to-Peer Networks
P2P networks, popularised by file-sharing systems like BitTorrent, distribute tasks across multiple nodes, each contributing a portion of the overall workload.
Applying this concept to AI, a P2P network could theoretically distribute the training of an LLM across numerous consumer-grade GPUs, making it possible for individuals and smaller organisations to participate in AI development.
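In practice, distributing an LLM over a P2P network usually means assigning slices of the model, for instance contiguous blocks of transformer layers, to different volunteer nodes. The sketch below shows the simplest possible partitioning scheme; the node names and layer count are made up for the example.

```python
# Illustrative sketch: assign contiguous blocks of transformer layers to
# volunteer nodes, the basic idea behind pipeline-style P2P distribution.
def partition_layers(num_layers: int, nodes: list[str]) -> dict[str, range]:
    """Split num_layers as evenly as possible across the given nodes."""
    per_node, extra = divmod(num_layers, len(nodes))
    assignment, start = {}, 0
    for i, node in enumerate(nodes):
        count = per_node + (1 if i < extra else 0)
        assignment[node] = range(start, start + count)
        start += count
    return assignment

print(partition_layers(32, ["peer-a", "peer-b", "peer-c"]))
# {'peer-a': range(0, 11), 'peer-b': range(11, 22), 'peer-c': range(22, 32)}
```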
A research paper titled ‘A Peer-to-Peer Decentralised Large Language Models’ discusses a provably guaranteed federated learning (FL) algorithm designed for training adversarial deep neural networks, highlighting the potential of decentralised approaches for LLMs.
A study by Šajina Robert et al. explored multi-task peer-to-peer learning using an encoder-only Transformer model. This approach demonstrated that collaborative training in a P2P network could effectively handle multiple NLP tasks, highlighting the versatility of such systems.
Another significant contribution comes from Sree Bhargavi Balija and colleagues, who investigated building communication-efficient asynchronous P2P federated LLMs with blockchain technology. Their work emphasises the importance of minimising communication overhead and ensuring data integrity in decentralised networks.
But There Are Challenges…
Despite the promise, significant challenges hinder the practical implementation of P2P networks for LLMs. One major issue is the bandwidth and latency required for efficient training.
Training LLMs involves transferring vast amounts of data between nodes, which can be prohibitively slow on consumer-grade networks. One Reddit user pointed out that even on a 10-gigabit network, the data transfer rates would be insufficient compared to the high-speed interconnects used in dedicated GPU clusters.
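A quick back-of-envelope calculation shows why. The figures below assume an 8B-parameter model with 16-bit gradients and idealised link speeds; real training also uses compression and overlapping of communication with compute, so treat the numbers as orders of magnitude only.

```python
# Back-of-envelope estimate of how long one full gradient exchange takes
# on different links. Model size, precision and link speeds are assumed
# round numbers, not measurements.
PARAMS = 8e9            # e.g. an 8B-parameter model
BYTES_PER_PARAM = 2     # fp16/bf16 gradients
payload_gb = PARAMS * BYTES_PER_PARAM / 1e9   # ~16 GB per exchange

links_gbps = {
    "home broadband (1 Gbit/s)": 1,
    "10-gigabit LAN": 10,
    "datacentre interconnect (~400 Gbit/s)": 400,
}

for name, gbps in links_gbps.items():
    seconds = payload_gb * 8 / gbps           # GB -> gigabits, then divide by rate
    print(f"{name}: ~{seconds:,.1f} s per full gradient sync")
```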
Moreover, the synchronisation required for distributed gradient descent, a common optimisation algorithm in training neural networks, adds another layer of complexity.
Traditional training methods rely on tight synchronisation between nodes, which is difficult to achieve in a decentralised setting.
A research paper reviewing synchronous stochastic gradient descent (Sync-SGD) highlights the impact of stragglers and high latency on the efficiency of distributed training.
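The straggler effect is easy to see in a toy model: a synchronous step cannot finish until the slowest worker's gradients arrive, so a single slow or distant peer gates the entire swarm. The latencies below are invented for illustration.

```python
# Toy illustration of the straggler problem in synchronous SGD: every
# step is gated on the slowest worker, so one slow or distant peer drags
# the whole swarm down. Latencies are invented for illustration.
import random

def sync_step_time(worker_latencies_s: list[float]) -> float:
    """In Sync-SGD, the step cannot complete until all gradients arrive."""
    return max(worker_latencies_s)

random.seed(0)
fast_cluster = [random.uniform(0.10, 0.15) for _ in range(8)]   # homogeneous GPUs
p2p_swarm = fast_cluster[:-1] + [2.5]                           # one straggler peer

print(f"Datacentre step: ~{sync_step_time(fast_cluster):.2f} s")
print(f"P2P step with one straggler: ~{sync_step_time(p2p_swarm):.2f} s")
```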
… And Solutions
Despite these challenges, efforts to make decentralised AI a reality are ongoing. Projects like Petals and Hivemind are exploring ways to enable distributed inference and training of LLMs.
Petals, for example, aims to facilitate the distributed inference of large models by allowing users to contribute their computational resources in exchange for access to the network’s collective AI capabilities.
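Petals' published usage examples follow the familiar Hugging Face pattern, except that the transformer blocks run on volunteer peers rather than locally. The snippet below paraphrases those examples; model names and the availability of public swarms may have changed since.

```python
# Paraphrased from the Petals project's published examples; the model name
# and the availability of a public swarm hosting it are assumptions.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # any model hosted on a Petals swarm
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# The prompt is embedded locally; the transformer blocks run on volunteer peers.
inputs = tokenizer("Decentralised inference means", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```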
Additionally, the concept of federated learning offers a more feasible approach to decentralised AI.
In federated learning, multiple nodes train a model on their local data and periodically share their updates with a central server, which aggregates the updates to improve the global model.
This method preserves data privacy and reduces the need for extensive data transfer between nodes. It could also be a practical solution for decentralised AI, especially in privacy-sensitive applications like medical machine learning.
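A minimal sketch of the standard federated averaging (FedAvg) recipe is shown below: clients send parameter updates rather than raw data, and the server combines them weighted by each client's dataset size. The clients and numbers are hypothetical.

```python
# Minimal sketch of federated averaging (FedAvg): each node trains locally,
# only model updates travel to the server, and the server averages them,
# weighted by local dataset size as in the standard FedAvg recipe.
import numpy as np

def federated_average(updates: list[np.ndarray], num_samples: list[int]) -> np.ndarray:
    """Aggregate client updates weighted by how much data each client holds."""
    weights = np.array(num_samples, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, updates))

# Three hospitals (hypothetical) send parameter updates without sharing raw data.
client_updates = [np.array([0.9, 1.1]), np.array([1.0, 0.8]), np.array([1.2, 1.0])]
client_sizes = [500, 2000, 1500]
print(federated_average(client_updates, client_sizes))
```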