Deepseek’s Rise: Innovation Amidst US Tech Sanctions in China’s AI Landscape


The recent buzz around the Chinese large language model Deepseek has surprised many in the tech world. Deepseek has achieved more than its US competitors expected, and its success could amount to a kind of leveling of the playing field, one especially beneficial for researchers and developers with limited resources, particularly in the Global South.

The success of Deepseek, which caused shares of Nvidia and other American AI specialists to plummet, is notable given the restrictions Chinese startups face due to increasing US export controls on advanced chips. These measures were intended to weaken China’s AI capabilities, but instead, they seem to have driven startups like Deepseek to innovate, focusing on efficiency, resource pooling, and collaboration with local firms and researchers.

To develop R1, Deepseek’s reasoning model, the company had to redesign its training process to reduce the load on its existing GPUs. According to the company, training relied on a chip variant that Nvidia released specifically for the Chinese market, whose performance is roughly half that of Nvidia’s top US products. Despite this, Deepseek R1 is praised for its ability to handle complex reasoning tasks, especially in mathematics and programming. Like ChatGPT, the model uses a “Chain of Thought” approach, solving problems by processing a query step by step.
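The chain-of-thought idea can be illustrated with a short sketch. The helper functions below are hypothetical and are not part of Deepseek’s or any vendor’s API; the point is simply that the model is instructed to write out intermediate reasoning steps before committing to a final answer, rather than being asked for the answer directly:

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought style instruction.

    Hypothetical helper for illustration only -- not Deepseek's API.
    """
    return (
        "Answer the following question. Think step by step, writing out "
        "each intermediate step before giving the final answer.\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )


def build_direct_prompt(question: str) -> str:
    """Plain prompt that asks only for the final answer, with no
    intermediate reasoning."""
    return f"Question: {question}\nAnswer:"


# The chain-of-thought variant trades extra generated tokens (the
# reasoning steps) for better accuracy on math and programming tasks.
prompt = build_cot_prompt(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
print(prompt)
```

In practice, either prompt string would be sent to the model’s completion endpoint; only the instruction wrapping differs.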

Dimitris Papailiopoulos, a lead scientist at Microsoft’s AI Frontiers research lab, said that what surprised him about R1 was the apparent simplicity of its technique: Deepseek aims for accurate answers rather than detailing every logical step, significantly reducing computation time while maintaining high effectiveness. Deepseek has also released six smaller variants of R1, light enough to run locally on laptops, with one reportedly outperforming OpenAI’s o1-mini in certain benchmarks.

Deepseek, founded in July 2023 in Hangzhou by Liang Wenfeng, is relatively unknown and mysterious. Liang, a graduate of Zhejiang University with a background in information and electrical engineering, had previously founded a hedge fund named High-Flyer in 2015. Like Sam Altman of OpenAI, Liang aims to build an Artificial General Intelligence (AGI), a form of AI that can match or surpass humans in various tasks.

Training large language models (LLMs) requires a team of highly skilled scientists and significant computing power. Only top players usually engage in developing foundational models like ChatGPT, which are resource-intensive. The US export controls on necessary high-end chips complicate this further. High-Flyer’s decision to venture into AI appears directly related to these restrictions.

Before the anticipated sanctions took effect, Liang acquired a significant stockpile of Nvidia A100 chips, a type now banned from export to China. The Chinese media company 36Kr estimates that the company has over 10,000 of these GPUs in stock. Deepseek was created in part to put this existing stockpile to use, even as the chips become outdated.

Technology giants like Alibaba and ByteDance, along with a few startups backed by wealthy investors, currently dominate the Chinese AI sector, making it difficult for small and medium-sized enterprises to compete. A company like Deepseek, which appears to have no plans to raise investor funds, is rare. Wang, a former Deepseek employee, said that during his time at the company he had access to ample computing resources and the freedom to experiment.

In an interview with Chinese media company 36Kr, Liang stated that besides chip sanctions, Chinese companies face the challenge of less efficient AI development processes. Most Chinese companies require double the computing power to achieve the same results, and data efficiency gaps can mean needing up to four times more computing power. Deepseek has apparently found ways to reduce memory consumption and accelerate computation without significantly compromising accuracy.
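Deepseek has not published every detail of these optimizations, but one generic way to reduce memory consumption, lowering numerical precision, can be sketched. The example below is illustrative only and is not Deepseek’s actual method; it shows the standard trade-off of halving memory by storing weights in 16-bit rather than 32-bit floats:

```python
import numpy as np

# A toy "weight matrix" stored at full 32-bit precision.
rng = np.random.default_rng(0)
weights_fp32 = rng.random((1024, 1024)).astype(np.float32)

# Casting to 16-bit floats halves the memory footprint at the cost of
# some precision -- a common trade-off in LLM training and inference.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # → 4194304 (4 bytes per value)
print(weights_fp16.nbytes)  # → 2097152 (2 bytes per value)

# The worst-case rounding error introduced by the cast stays small for
# values in [0, 1), since float16 resolves roughly 3 decimal digits.
max_error = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(max_error)
```

Real systems combine many such techniques (quantization, optimized attention, caching), but the basic arithmetic of trading precision for memory is the same.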

Chinese companies not only focus on efficiency but also increasingly embrace open-source principles. Alibaba Cloud has released over 100 new open-source AI models supporting 29 languages for various applications, including programming and mathematics. Startups like Minimax and 01.AI have also released their models as open source. According to a white paper published last year by the China Academy of Information and Communications Technology, 36% of large AI language models worldwide come from China, making it the second-largest provider after the US.

The US export controls have essentially forced Chinese companies to manage their limited computing resources more efficiently. That scarcity of computing power may also drive consolidation in the sector, and despite Deepseek’s successes, the consolidation appears to have already begun. Two weeks ago, Alibaba Cloud announced a partnership with the Beijing-based startup 01.AI, founded by Kai-Fu Lee, to merge research teams and establish an “industrial large model laboratory.”
