Thinking Beyond Generative AI, One Token At A Time

VCs have been banking on generative AI companies, but what is their real moat?

Every company today claims to be using GPT-3-like models for their generative AI play. This includes Jasper, Notion, Regie, Frase and others. As a result, there are questions about what differentiates these ‘all-in-one’ companies beyond design, marketing, and use cases. As Chris Frantz, co-founder of Loops, puts it, this leads one to believe “there is almost no moat in generative AI”.

To understand this better, let’s look at a recent update from GPT-3 creator OpenAI: the launch of the third version of its Davinci model. The new model requires more computing resources, which translates into a higher cost per API call and lower speed than other models.
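For context, Davinci is consumed through OpenAI’s completions API, and every request is billed per token, which is where the higher per-call cost shows up. A minimal sketch using the openai Python package of that era (the API key and prompt are placeholders; the model name assumes the ‘text-davinci-003’ release):

```python
import openai  # pip install openai (pre-1.0 API shown here)

openai.api_key = "YOUR_API_KEY"  # placeholder

# Each call is billed by tokens consumed (prompt + completion),
# so a heavier model raises the cost of every single request.
response = openai.Completion.create(
    model="text-davinci-003",  # third version of the Davinci model
    prompt="Write a product tagline for an AI writing assistant.",
    max_tokens=64,             # caps the billable completion length
)
print(response["choices"][0]["text"])
```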

All of this means companies need to look beyond simply using generative AI tools and focus on enhancing their computing capabilities, particularly in terms of cost and optimisation. This explains why Jasper—the AI-content platform—recently announced a partnership with the American AI startup Cerebras Systems. The company will use Cerebras’ Andromeda AI supercomputer to train GPT networks, creating outputs tuned to varying levels of end-user complexity. The AI supercomputer is also said to improve the contextual accuracy of the generative model while providing personalised content across different users.

Commenting on the partnership, venture capitalist Nathan Benaich says Jasper may go beyond training GPT-3 on Cerebras systems and reduce its reliance on OpenAI’s API by building its own models and training them on Cerebras hardware.

The two AI platforms—Jasper and Notion—have taken different approaches to AI integration. While Jasper is using the AI-accelerating computing power of Cerebras, Notion is supported by Google Cloud, using Cloud TPUs to train the models behind its API. Although Notion has not confirmed it, the kind of output it generates suggests that it is using GPT-3 through OpenAI’s API.

Therefore, in the era of GPT-3 companies, Jasper looks set to establish a new benchmark for what the moat in generative AI can be. The API a company uses and the means it takes to train its models will be the defining factors separating these companies. This supports the view that the present and future of software lie in cloud and supercomputing services, and it underscores the importance of using hardware effectively in any generative AI play.

Read: India’s Answer to Moore’s Law Death

The following are some of the approaches that can help you understand the hardware side of things when leveraging generative AI tools:

CS-2 versus Cloud versus GPU

The Andromeda AI supercomputer is built by linking 16 Cerebras CS-2 systems powered by the largest AI chip, the Wafer Scale Engine (WSE) 2. Cerebras’ ‘weight streaming’ technology provides immense flexibility, allowing for independent scaling of the model size and training speed. In addition, the cluster of CS-2 machines has training and inference acceleration that can support even trillion-parameter models. Cerebras also claims that its CS-2 machines can form a cluster of up to 192 systems with near-linear performance scaling to speed up training.
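Weight streaming essentially keeps the full set of weights in external memory and streams them through the compute system one layer at a time, which is what decouples model size from on-chip memory. Below is a toy, plain-Python illustration of the idea, not Cerebras’ actual implementation; the layer shapes are arbitrary:

```python
import numpy as np

# Toy stand-in for weight streaming: weights live in (large) host memory,
# and only one layer's weights occupy the accelerator at a time, so model
# size is limited by host storage rather than on-chip memory.
rng = np.random.default_rng(0)
layer_shapes = [(512, 512)] * 8  # illustrative 8-layer MLP
host_weights = [rng.standard_normal(s).astype(np.float32) for s in layer_shapes]

def forward(x: np.ndarray) -> np.ndarray:
    for w in host_weights:                  # stream one layer at a time
        on_chip_w = w                       # "load" this layer's weights
        x = np.maximum(x @ on_chip_w, 0.0)  # compute, then move to the next layer
    return x

print(forward(rng.standard_normal((4, 512)).astype(np.float32)).shape)
```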

Further, a single CS-2 system can match the compute performance of tens to hundreds of graphics processing units (GPUs), delivering in a fraction of the time output that would normally take days or weeks to generate on general-purpose processors.

In contrast, cloud providers use custom silicon chips to accelerate AI workloads. For example, Google Cloud employs its in-house chip, the Tensor Processing Unit (TPU), to train large, complex neural networks using Google’s own TensorFlow software.

Cloud TPUs are accessed through ‘virtual machines’ that offload the heavy computation onto the TPU hardware. The model parameters are kept in on-chip, high-bandwidth memory. The TensorFlow server fetches input training data and pre-processes it before streaming it into an ‘infeed’ queue on the Cloud TPU hardware.
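In practice, that pipeline is typically written with TensorFlow’s TPUStrategy plus a tf.data input pipeline, which handles the host-side pre-processing and infeed. A minimal sketch, assuming it runs on a provisioned TPU VM (the dataset and model here are stand-ins):

```python
import tensorflow as tf

# Connect to the Cloud TPU; "local" works on TPU VMs, otherwise pass
# the TPU name/address provisioned in your project (placeholder here).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# tf.data does the host-side pre-processing and streams batches
# into the TPU's infeed queue.
dataset = (
    tf.data.Dataset.from_tensor_slices((
        tf.random.normal([1024, 128]),
        tf.random.uniform([1024], maxval=10, dtype=tf.int32),
    ))
    .batch(128, drop_remainder=True)  # TPUs want static batch shapes
    .prefetch(tf.data.AUTOTUNE)
)

with strategy.scope():  # parameters are created in TPU high-bandwidth memory
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

model.fit(dataset, epochs=2)
```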

Additionally, cloud providers have been expanding their GPU offerings. For instance, the latest AWS P4d and G4 instances are powered by NVIDIA A100 and T4 Tensor Core GPUs, respectively. Earlier this year, Microsoft Azure also announced the adoption of NVIDIA’s Quantum-2 InfiniBand platform to power next-generation HPC needs. These cloud instances are widely used as they come fully configured for deep learning, with accelerated libraries like CUDA and cuDNN, and well-known deep learning frameworks like TensorFlow, pre-installed.
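Because the CUDA/cuDNN stack comes pre-wired on these images, getting started is often just a matter of confirming the framework can see the GPUs. A quick check, assuming TensorFlow on one of these instances:

```python
import tensorflow as tf

# On a pre-configured deep learning instance (e.g. AWS P4d with A100s),
# the CUDA/cuDNN stack is already set up; this just confirms it.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible: {len(gpus)}")
for gpu in gpus:
    print(gpu)
print("Built with CUDA:", tf.test.is_built_with_cuda())
```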

Andrew Feldman, CEO and co-founder of Cerebras Systems, explained that the variable latency between large numbers of GPUs in traditional cloud providers creates difficult, time-consuming problems when distributing a large AI model among GPUs, and there are “large swings in time to train.”

According to ZDNET, Cerebras’ ‘pay-per-model’ AI cloud services range from $2,500 for training a GPT-3 model with 1.3 billion parameters in 10 hours to $2.5 million for training one with 70 billion parameters in 85 days, which is on average about half of what customers would pay to rent cloud capacity or lease machines for years to do the same task.
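Working out the implied hourly rates from the two ends of that price list makes the scaling concrete (figures as reported by ZDNET; the arithmetic is back-of-the-envelope):

```python
# Back-of-the-envelope rates from the reported Cerebras price points.
small = {"params_b": 1.3, "price_usd": 2_500, "hours": 10}
large = {"params_b": 70.0, "price_usd": 2_500_000, "hours": 85 * 24}

for job in (small, large):
    rate = job["price_usd"] / job["hours"]
    print(f"{job['params_b']:>5.1f}B params: ${rate:,.0f}/hour over {job['hours']} hours")
# ~$250/hour for the 1.3B job vs ~$1,225/hour for the 70B job:
# price scales with both model size and wall-clock training time.
```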

The same CS-2 clusters are also said to train models eight times faster than clusters of NVIDIA A100 machines in the cloud. Meanwhile, according to MLPerf, when similar batches are run on TPUs and GPUs with the same number of chips, the two exhibit almost the same training performance on the SSD and Transformer benchmarks.

But, as Mahmoud Khairy points out in his blog, performance depends on various metrics beyond cost and training speed; hence, the answer to which approach is best also depends on the kind of computation that needs to be done. At the same time, the Cerebras CS-2 system is emerging as one of the most powerful tools for training vast neural networks.

Read: This Large Language Model Predicts COVID Variants

The AI supercomputing provider is also extending into the cloud by partnering with Cirrascale Cloud Services to democratise access, giving users the ability to train GPT models at a much lower cost than existing cloud providers, and with only a few lines of code.
