Every company today claims to be using GPT-3-like models for its generative AI play, including Jasper, Notion, Regie and Frase. As a result, there are questions about what sets these ‘all-in-one’ companies apart beyond design, marketing and use cases. As Chris Frantz, co-founder of Loops, puts it, this leads one to believe “there is almost no moat in generative AI”.
To understand this better, consider the recent updates from GPT-3 creator OpenAI, which launched the third version of its Davinci model. The new model demands more computing resources, which translates into a higher cost per API call and slower responses than other models.
This also means that companies need to look beyond simply using generative AI tools and focus on enhancing their computing capabilities, particularly around cost and optimisation. This explains why Jasper, the AI-content platform, recently announced a partnership with the American AI startup Cerebras Systems. The company will use Cerebras’ Andromeda AI supercomputer to train GPT networks that produce outputs at varying levels of end-user complexity. The AI supercomputer is also said to improve the contextual accuracy of the generative model while delivering personalised content to different users.
Regarding the partnership, venture capitalist Nathan Benaich says it looks like Jasper may go beyond training GPT-3 on Cerebras systems and decrease its reliance on OpenAI’s API by building its own models and training them on Cerebras.
The two AI platforms, Jasper and Notion, have taken different approaches to AI integration. While Jasper is using the AI-accelerating computing power of Cerebras, Notion is supported by Google Cloud, which offers Cloud TPUs for training its models. Although Notion has not confirmed it, the kind of output it generates suggests that it is using GPT-3 through the OpenAI API.
Therefore, in the era of GPT-3 companies, Jasper looks set to establish a new benchmark for what the moat in generative AI can be. The API a company relies on, and the means it uses to train its models, will be the defining factors separating these companies. This also suggests that the present and future of software lie in cloud and supercomputing services, and it underlines the importance of using hardware effectively in any generative AI play.
The following are some of the approaches that can help you understand the hardware side of things when leveraging generative AI tools:
CS-2 versus Cloud versus GPU
The Andromeda AI supercomputer is built by linking 16 Cerebras CS-2 systems, each powered by the largest AI chip, the Wafer-Scale Engine 2 (WSE-2). Cerebras’ ‘weight streaming’ technology provides immense flexibility, allowing model size and training speed to be scaled independently. In addition, a cluster of CS-2 machines offers training and inference acceleration that can support even trillion-parameter models. Cerebras also claims that its CS-2 machines can form clusters of up to 192 systems with near-linear performance scaling to speed up training.
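To make the ‘near-linear scaling’ claim concrete, here is a small back-of-the-envelope sketch in Python; the baseline training time and the 90% scaling efficiency are purely illustrative assumptions, not Cerebras figures.

```python
# Illustrative model of near-linear scaling across CS-2 systems.
# Both the baseline time and the 0.9 efficiency are assumptions for illustration.
def time_to_train(base_hours: float, n_systems: int, efficiency: float = 0.9) -> float:
    """Hours to train when work is spread across n_systems.

    Parallel efficiency is applied to the added systems; one system is the baseline.
    """
    speedup = 1 + efficiency * (n_systems - 1)
    return base_hours / speedup

base = 240.0  # hypothetical single-system training time, in hours
for n in (1, 16, 192):  # 16 = Andromeda's size, 192 = the claimed maximum cluster size
    print(f"{n:>3} CS-2 systems: ~{time_to_train(base, n):6.1f} hours")
```

Under these assumptions, going from one system to Andromeda’s 16 cuts the hypothetical run from 240 hours to roughly 17, which is what near-linear scaling means in practice.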
Further, a single CS-2 system can deliver the compute performance of tens to hundreds of graphics processing units (GPUs), producing in a fraction of the time output that would normally take days or weeks to generate on general-purpose processors.
In contrast, cloud providers use custom silicon chips to accelerate AI workloads. For example, Google Cloud employs its in-house chip, the Tensor Processing Unit (TPU), to train large, complex neural networks using Google’s own TensorFlow software.
Cloud TPUs are accessed as ‘virtual machines’, with the heavy computation offloaded onto the TPU hardware. The model parameters are kept in on-chip, high-bandwidth memory, and the TensorFlow server fetches input training data and pre-processes it before streaming it into an ‘infeed’ queue on the Cloud TPU hardware.
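For context on how that pipeline looks in practice, below is a minimal sketch of training on a Cloud TPU with TensorFlow’s TPUStrategy; the model, dataset and the "local" TPU target are illustrative placeholders, not details of Notion’s or Google’s actual setup.

```python
import tensorflow as tf

# Connect to a Cloud TPU; "local" assumes a TPU VM, while a named TPU node
# would be passed here instead (placeholder, not a real deployment).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Variables created under this strategy live in the TPU's on-chip memory.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# A tf.data pipeline pre-processes on the host and streams batches to the TPU;
# prefetching keeps the infeed queue full.
dataset = (
    tf.data.Dataset.from_tensor_slices(
        (tf.random.uniform([1024, 64]),
         tf.random.uniform([1024], maxval=10, dtype=tf.int32))
    )
    .batch(128, drop_remainder=True)  # TPUs prefer fixed batch shapes
    .prefetch(tf.data.AUTOTUNE)
)

model.fit(dataset, epochs=1)
```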
Additionally, cloud providers have been expanding their GPU offerings. For instance, AWS’s P4d instances are powered by NVIDIA A100 Tensor Core GPUs, while its G4dn instances use NVIDIA T4 GPUs. Earlier this year, Microsoft Azure also announced the adoption of NVIDIA’s Quantum-2 InfiniBand platform to power next-generation HPC needs. These cloud instances are widely used because they come fully configured for deep learning, with accelerated libraries such as CUDA and cuDNN alongside TensorFlow and other well-known deep learning frameworks pre-installed.
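As a quick illustration of what ‘fully configured’ means, a check like the following can be run on such an instance to confirm that TensorFlow was built against CUDA/cuDNN and can see the attached GPUs; it is a generic sketch, not tied to any particular provider’s image.

```python
import tensorflow as tf

# On a pre-configured deep learning instance, TensorFlow should already be
# linked against CUDA/cuDNN and report the attached GPUs.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible: {len(gpus)}")
print("Built with CUDA:", tf.test.is_built_with_cuda())

# Run a small matrix multiply on the first GPU to confirm the stack works end to end.
if gpus:
    with tf.device("/GPU:0"):
        a = tf.random.normal([1024, 1024])
        b = tf.random.normal([1024, 1024])
        print("Result computed on:", (a @ b).device)
```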
Andrew Feldman, CEO and co-founder of Cerebras Systems, explained that the variable latency between large numbers of GPUs in traditional cloud providers creates difficult, time-consuming problems when distributing a large AI model among GPUs, and there are “large swings in time to train.”
According to ZDNET, Cerebras’ ‘pay-per-model’ AI cloud services range from $2,500 for training a GPT-3 model with 1.3 billion parameters in 10 hours to $2.5 million for training one with 70 billion parameters in 85 days, on average costing about half of what customers would pay to rent cloud capacity or lease machines for years to do the same task.
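A quick back-of-the-envelope calculation on those two reported price points shows how the cost scales with model size; the figures below are simply the ones quoted above, and the per-parameter breakdown is only illustrative.

```python
# Back-of-the-envelope comparison of the two price points reported by ZDNET.
small = {"params": 1.3e9, "cost_usd": 2_500, "days": 10 / 24}   # 1.3B params, 10 hours
large = {"params": 70e9, "cost_usd": 2_500_000, "days": 85}     # 70B params, 85 days

for name, job in (("1.3B model", small), ("70B model", large)):
    usd_per_billion = job["cost_usd"] / (job["params"] / 1e9)
    print(f"{name}: ${usd_per_billion:,.0f} per billion parameters, "
          f"{job['days']:.1f} days of training")

# Roughly $1,900 per billion parameters for the small run versus about $36,000
# for the large one: cost grows much faster than parameter count, consistent
# with training compute scaling super-linearly with model size.
```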
The same CS-2 clusters are also said to train models eight times faster than clusters of NVIDIA A100 machines in the cloud. Meanwhile, MLPerf results show that when similar batch sizes are run on TPUs and GPUs with the same number of chips, the two exhibit almost the same training performance on the SSD and Transformer benchmarks.
But, as Mahmoud Khairy points out in his blog, performance depends on various metrics beyond cost and training speed, so which approach is best also depends on the kind of computation that needs to be done. At the same time, the Cerebras CS-2 system is emerging as one of the most powerful tools for training vast neural networks.
The AI supercomputing provider is also extending into the cloud by partnering with Cirrascale Cloud Services, aiming to democratise access to large-scale training and give users the ability to train GPT models at a much lower cost than with existing cloud providers, and with only a few lines of code.