The release of the Chinese AI model R1 has caused a stir in the US, reminiscent of the shock that followed the Soviet Union’s launch of the first satellite in 1957. R1 is not only as effective as OpenAI’s model but is also cheaper and more efficient to run. Additionally, it is entirely open source.
Initial reactions to R1 range from panic, reflected in a drop in Nvidia’s share price, to fatalism: the view that China has won the AI race and that US export restrictions were pointless. Some even claim the release is a psychological maneuver by China to destabilize the US economy. However, a closer look at the technology behind R1 shows that the model alone will not end Silicon Valley’s dominance, for several reasons.
The technical paper on R1 has been praised by experts, but it doesn’t introduce groundbreaking innovations; the techniques it uses were already known. R1 is a reasoning model with a large language model at its core. It breaks a task down into smaller parts and processes them step by step in a “Chain of Thought.” The software then selects the best chain and presents its result as the answer. During training, the software was shown many examples of “good” thought chains so that it could learn to tell which chain is best.
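The selection step described above can be illustrated with a minimal sketch. It is not DeepSeek’s method: the `Chain` type, the `toy_score` function and the sample chains are hypothetical stand-ins, and the scorer simply checks whether each arithmetic step is correct, where a real system would rely on a learned notion of a “good” chain.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List

# Supported operators for the toy arithmetic steps (assumption for this sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Chain:
    """A hypothetical chain of thought: intermediate steps plus a final answer."""
    steps: List[str]
    answer: str


def step_is_correct(step: str) -> bool:
    """Check a step of the form 'a op b = c'; anything else counts as incorrect."""
    try:
        lhs, rhs = step.split("=")
        a, op, b = lhs.split()
        return OPS[op](int(a), int(b)) == int(rhs)
    except (ValueError, KeyError):
        return False


def toy_score(chain: Chain) -> float:
    """Toy stand-in for a learned scorer: the fraction of steps that check out."""
    if not chain.steps:
        return 0.0
    return sum(step_is_correct(s) for s in chain.steps) / len(chain.steps)


def select_best_chain(chains: List[Chain], score: Callable[[Chain], float]) -> Chain:
    """Return the highest-scoring chain (the first one wins on ties)."""
    if not chains:
        raise ValueError("at least one chain is required")
    return max(chains, key=score)


# Two candidate chains for "(17 + 25) * 2": one contains an arithmetic slip.
candidates = [
    Chain(steps=["17 + 25 = 43", "43 * 2 = 86"], answer="86"),
    Chain(steps=["17 + 25 = 42", "42 * 2 = 84"], answer="84"),
]
best = select_best_chain(candidates, toy_score)
print(best.answer)
```

The point of the sketch is only the division of labour: several chains are produced, a scoring function ranks them, and the answer of the top-ranked chain is what the user sees.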
The efficiency in training and running the model likely results from prior work in which larger, less efficient models generated heuristics and training data for smaller ones, an approach similar to the one used for Meta’s Llama models. That prior work was possible because DeepSeek began training when export restrictions on AI chips were still lenient. During this period many Chinese companies, including DeepSeek, stockpiled Nvidia’s powerful A100 chips.
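The idea of a large model producing training data for a small one can be sketched in a few lines. This is a toy illustration under loud assumptions, not a description of how R1 was built: `teacher` is a hypothetical stand-in for an expensive model, and the “student” is just a straight line fitted to the teacher’s outputs.

```python
from typing import Callable, List, Tuple


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large, expensive model."""
    return 3.0 * x + 2.0


def generate_training_data(inputs: List[float]) -> List[Tuple[float, float]]:
    """Let the teacher label the inputs; the pairs become the student's training set."""
    return [(x, teacher(x)) for x in inputs]


def fit_student(data: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Fit a small student (a line, via ordinary least squares) to the teacher's data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    if var_x == 0:
        raise ValueError("inputs must not all be identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# The student never sees the teacher's internals, only its answers.
data = generate_training_data([0.0, 1.0, 2.0, 3.0, 4.0])
student = fit_student(data)
print(student(10.0), teacher(10.0))
```

Once trained, the student answers new questions on its own, which is where the savings in running costs would come from in a real setting.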
The release of R1 doesn’t mean US export restrictions were ineffective. Such measures take time to show effects. However, the model does indicate that sanctions can also accelerate technological development: hardware scarcity creates a strong incentive to find alternative solutions.
DeepSeek’s decision to release the model openly might be politically motivated, aimed at weakening OpenAI. The Chinese government, for its part, hopes that rapid AI development will boost productivity. However, Chinese policy tends to move in a cycle of openness and control known as Fang-Shou. Initially suspicious of generative AI, the government now embraces it as a source of dynamism. Yet experts such as former OpenAI researcher Miles Brundage argue that China won’t allow a “Wild West” of AI for long, so the current openness may not last.
Although reasoning models require more computing power than regular language models, DeepSeek’s current supply of AI chips seems sufficient to keep R1 online. This is a significant achievement for China’s AI sector, with consequences for US companies aiming to profit from AI. But the supply may not remain sufficient. If the model proves useful, demand will rise both domestically and internationally, and DeepSeek will have to decide whether to use its computing resources for training new models or for enhancing existing ones.
The lack of powerful AI chips will continue to slow down China’s AI sector. If sanctions persist or intensify, the Chinese AI industry will need a significant technical breakthrough to replicate R1’s success. Whether or when this will happen remains uncertain.