So China can do AI too?
Date: 01/30/2025
Tag: #psd #ai
There’s no doubt about the biggest story of the last week in the tech industry. Chinese company DeepSeek launched an AI app that, on the face of it at least, did a very similar job to Silicon Valley’s best efforts at a fraction of the cost and power. It certainly put the cat among the pigeons, as rivals scrambled to verify the claimed performance and to work out, from the White Paper DeepSeek published, how the Chinese company had accomplished it. US companies such as Google, Microsoft, Meta and OpenAI have all launched similar AI models, and were presumably counting on sales revenue to partly offset the cost of upgrading their data centres with the latest and greatest Nvidia GPUs.
For the past few years the US has embargoed sales of the latest AI chips, mainly from Nvidia, to China. One of the main handicaps built into the chips Nvidia develops for the Chinese market is the speed of communications. AI models train on vast amounts of data, and moving that data around the system to be processed has historically caused the industry plenty of problems. The US administration reckoned that crippling the communication circuits in the GPUs would limit China’s AI progress. However, DeepSeek took an innovative approach, using a parallel pipelining algorithm called DualPipe to make the transport of data much more efficient.
The White Paper explained DualPipe by saying “For the training framework, we design the DualPipe algorithm for efficient pipeline parallelism, which has fewer pipeline bubbles and hides most of the communication during training through computation-communication overlap. This overlap ensures that, as the model further scales up, as long as we maintain a constant computation-to-communication ratio, we can still employ fine-grained experts across nodes while achieving a near-zero all-to-all communication overhead.”
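DeepSeek’s actual DualPipe implementation is far more involved than this, but the core idea in the quote, hiding communication behind computation, can be illustrated in a toy Python sketch. The `compute` and `send` functions here are hypothetical stand-ins for a GPU kernel and a network transfer, not anything from the White Paper:

```python
import threading

def overlapped_pipeline(microbatches, compute, send):
    """Process micro-batches, overlapping each send with the next compute.

    While the current micro-batch is being computed (the "GPU" work),
    the previous result is transmitted on a background thread, so
    communication time is hidden behind computation time.
    """
    results = []
    in_flight = None  # thread transmitting the previous result
    for mb in microbatches:
        out = compute(mb)            # compute overlaps the previous send
        if in_flight is not None:
            in_flight.join()         # previous transmission must finish first
        in_flight = threading.Thread(target=send, args=(out,))
        in_flight.start()            # start sending; loop moves on to compute
        results.append(out)
    if in_flight is not None:
        in_flight.join()             # drain the final transmission
    return results

# Toy run: 'compute' doubles a value, 'send' just records what was sent.
sent = []
results = overlapped_pipeline([1, 2, 3], lambda x: x * 2, sent.append)
```

As long as each send takes no longer than the next compute, the communication cost effectively disappears, which is the “constant computation-to-communication ratio” condition the quote describes.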
The company also used clever techniques to reduce the amount of data that needs to be processed during each training run, meaning there was less data to be transported around the system in the first place. That allowed the export-legal Nvidia GPUs to overcome the deliberate handicaps in their design.
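One widely reported family of such techniques is low-precision arithmetic: storing values in fewer bits means fewer bytes on the wire. Python’s standard library cannot represent the 8-bit formats used in modern training, so this sketch uses 16-bit half floats purely as an illustration of the principle, not as DeepSeek’s actual format:

```python
import struct

def packed_size(values, typecode):
    """Bytes needed to transmit `values` at the given precision.

    'f' = 32-bit IEEE 754 float, 'e' = 16-bit half float.
    """
    return len(struct.pack(f"{len(values)}{typecode}", *values))

# Hypothetical stand-in for an activation tensor to be shipped between GPUs.
activations = [0.125 * i for i in range(1024)]
full = packed_size(activations, "f")  # full precision: 4 bytes per value
half = packed_size(activations, "e")  # half precision: 2 bytes per value
```

Halving the bits per value halves the traffic the crippled communication links have to carry, on top of whatever DualPipe hides behind computation.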
The combination of lower power and fewer calculations will allow DeepSeek to charge much cheaper rates, cutting the profits that had been expected by its more expensive and power-hungry rivals, such as OpenAI. Ironically, OpenAI had only just announced it was helping to lead a new $500bn data centre initiative, along with Oracle, SoftBank and others, presumably using Nvidia’s finest and hugely expensive GPUs.
No doubt, for our industry at least, as well as for the world’s power generators, the news was pretty welcome. The industry faces a challenge in powering the latest GPUs when they become widely available, especially in data centre settings where tens of thousands of them will all need powering. So this news may give power engineers a small respite. But as the quote above mentions, the techniques should scale easily enough, allowing DeepSeek implementations to tackle much larger problems… and of course, nobody in the history of computing has ever uttered the words, “no thanks, we can do without the extra power”.