By accelerating AI compute, the Cerebras WSE removes the main impediment to innovation in artificial intelligence, cutting the time it takes to train models from months to minutes and from weeks to seconds.
Deep learning has become one of the most important computational workloads in recent years, but researchers have noted that long training times remain the fundamental impediment to innovation in artificial intelligence.
Training a deep learning model creates a multi-stage feedback loop: inputs flow forward through the network, the error is measured, and the weights are updated. The time required to train a network depends on the rate at which inputs can be processed through this loop. The faster inputs move through the loop, the more inputs can be processed per unit time, and the faster the model trains.
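As a concrete illustration, here is a minimal sketch of that feedback loop in PyTorch (a framework choice assumed for illustration; the article does not name one). The model, data, and hyperparameters are placeholders; the point is that total training time is the number of trips through the loop multiplied by the time per trip.

```python
# Minimal sketch of the training feedback loop, using PyTorch
# (an assumption; the article does not name a framework).
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(64, 8)   # synthetic batch of inputs
targets = torch.randn(64, 1)  # synthetic targets

for step in range(100):          # each iteration is one trip through the loop
    optimizer.zero_grad()
    predictions = model(inputs)  # forward pass
    loss = loss_fn(predictions, targets)
    loss.backward()              # backward pass: the feedback
    optimizer.step()             # weight update
```

Everything a hardware accelerator does for training ultimately reduces to making each trip through this loop, or the data movement between trips, faster.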
The only way to reduce training time, then, is to reduce the time it takes inputs to travel through this feedback loop. Computation is typically accelerated by adding compute cores: more cores perform more calculations in less time.
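A back-of-envelope calculation makes the scaling argument concrete. The figures below are assumptions chosen only for illustration, and they presume ideal scaling, which real clusters of separate chips rarely achieve:

```python
# Hypothetical numbers: work per trip through the loop and per-core speed.
flops_per_step = 1e12   # floating-point operations per training step (assumed)
flops_per_core = 1e9    # operations per second per core (assumed)

for cores in (1_000, 10_000, 400_000):
    step_time_ms = flops_per_step / (flops_per_core * cores) * 1e3
    print(f"{cores:>7} cores -> {step_time_ms:.3f} ms per step")
```

Under these idealized assumptions, 400x more cores means a 400x shorter step; in practice, communication between separate chips erodes that gain, which is the problem wafer-scale integration targets.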
This is precisely the approach California-based startup Cerebras took when it unveiled what is believed to be the largest computer chip ever built, with over 400,000 cores. Known as the Wafer Scale Engine, the chip consists of 1.2 trillion transistors and was created after four years of research and development.
Putting 400,000 cores on a single wafer is astonishing when compared with a very powerful CPU containing only 300 cores. Cerebras Systems says the chip will let companies train models much more quickly, removing the bottleneck of long training times. The company has described it as a wafer-scale implementation of a neural network.
The Chip Can Perform The Same As A Cluster Of Hundreds Of GPUs
For a long time, graphics processing units (GPUs) have been used for training deep learning models; this is why NVIDIA has become such a prominent chip manufacturer over the last five years. Google has also created its own TPUs, which can be customised for specific deep learning models. The usual way to build deep learning systems with GPUs or TPUs is to cluster individual chips together. The Cerebras chip, by contrast, is 56 times larger in area than NVIDIA's largest server GPU (the NVIDIA Volta), which has 21 billion transistors. Cerebras says its processor can do the same work as a cluster of hundreds of GPUs while consuming less power and space.
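To see what clustering individual chips looks like in practice, and where its cost hides, here is a hedged sketch of data-parallel training, the most common way GPUs or TPUs are combined. The two-device list, tiny model, and synthetic data are stand-ins for a real cluster (on a machine without GPUs the replicas simply run on the CPU):

```python
# Illustrative data-parallel training step across multiple "devices".
import copy
import torch
from torch import nn

devices = ["cpu", "cpu"]  # stand-ins for the GPUs in a cluster (assumed)
base_model = nn.Linear(8, 1)
replicas = [copy.deepcopy(base_model).to(d) for d in devices]

batch, targets = torch.randn(64, 8), torch.randn(64, 1)
x_shards = batch.chunk(len(devices))
y_shards = targets.chunk(len(devices))

# Each replica computes gradients on its own shard of the batch...
for replica, x, y in zip(replicas, x_shards, y_shards):
    loss = nn.functional.mse_loss(replica(x), y)
    loss.backward()

# ...then gradients are averaged across devices. On a real cluster this
# averaging is inter-chip communication, and it is where time is lost.
for master, *rest in zip(base_model.parameters(),
                         *[r.parameters() for r in replicas]):
    master.grad = torch.stack([p.grad for p in rest]).mean(dim=0)
```

The compute step parallelizes cleanly; the synchronization step does not, which is why data movement between chips, rather than raw arithmetic, often dominates cluster training time.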
Moreover, the chip allows the training of more complex deep learning models thanks to its huge amount of on-chip memory. The company's CEO has also said that data can move around a single chip roughly 1,000 times faster than it can between two linked chips. As a result, the Wafer Scale Engine can reduce processing times from months to as little as two minutes, according to Cerebras.
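The impact of that claimed 1,000x gap can be sketched with simple arithmetic. The tensor size and link bandwidth below are assumptions chosen only to illustrate the ratio, not Cerebras specifications:

```python
# Back-of-envelope: time to move one tensor on-chip vs. chip-to-chip.
tensor_bytes = 100e6             # a 100 MB activation tensor (assumed)
interchip_bw = 100e9             # 100 GB/s link between chips (assumed)
onchip_bw = interchip_bw * 1000  # per the claimed ~1,000x on-chip speedup

print(f"chip-to-chip: {tensor_bytes / interchip_bw * 1e3:.3f} ms")
print(f"on-chip:      {tensor_bytes / onchip_bw * 1e3:.6f} ms")
```

Multiplied over the millions of transfers in a training run, a three-orders-of-magnitude difference in data movement is where the claimed months-to-minutes reduction would come from.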
How Cerebras Overcame The Challenges in Design
Creating the world's largest chip comes with its own challenges, and cooling is a major one. While smaller chips are relatively easy to cool, cooling the Wafer Scale Engine is far more complex. To overcome this, a cold plate with vertically mounted water pipes is attached above the silicon. Because the chip also needed a special package, the company designed its own, combining the printed circuit board (PCB), the wafer, and the cold plate using a custom connector.
In such a large chip with 400,000 cores, some cores will inevitably be defective. Anticipating that defective cores could render the whole chip useless, the company designed it with redundant processing cores, and the I/O fabric connecting one core to the next can route around any cores that turn out to be defective.
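The redundancy idea can be illustrated with a small sketch: expose only as many logical cores as there are working physical cores, skipping the defective ones. This is a toy model of the general spare-and-remap technique, not Cerebras's actual fabric design:

```python
# Toy illustration of core redundancy: map logical cores onto working
# physical cores, skipping any marked defective at fabrication time.
def map_logical_to_physical(physical_cores, defective, needed):
    """Return the first `needed` working cores, skipping defective ones."""
    working = [c for c in physical_cores if c not in defective]
    if len(working) < needed:
        raise RuntimeError("not enough working cores, even with spares")
    return working[:needed]

# 12 physical cores with 2 spares still yield 10 logical cores
# when two cores come back defective from fabrication.
physical = list(range(12))
defective = {3, 7}
print(map_logical_to_physical(physical, defective, needed=10))
# -> [0, 1, 2, 4, 5, 6, 8, 9, 10, 11]
```

On the real chip the remapping happens in the interconnect fabric rather than in software, but the principle is the same: build in spares, then route around failures.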
Overview
By accelerating AI compute and cutting training times from months to minutes and from weeks to seconds, the Cerebras WSE removes the main impediment to innovation in artificial intelligence. This enables data scientists to test hypotheses more quickly and to explore ideas that today are untestable with legacy architectures or too risky to try.