Meta has reportedly set up a crisis team. Nvidia's stock price is falling. OpenAI is under pressure. Deepseek from China offers AI models and a chatbot that can compete with popular models from Silicon Valley. Training was faster and cheaper, and access to the models is also more affordable for customers.
Although the models were released weeks ago, Deepseek has only now become the most downloaded app in the App Store and is attracting attention. One likely reason: prominent Silicon Valley investor Marc Andreessen recently called it one of the most impressive breakthroughs he has ever seen.
Deepseek has released the R1 and V3 models. V3 reportedly surpasses GPT-4 and Anthropic's Claude 3.5 on some benchmarks, even though development cost only a fraction as much: according to the company's own figures, training took 2.78 million GPU hours and cost 5.6 million USD. Meta's Llama, with about 400 billion parameters, requires roughly eleven times as many GPU hours. Deepseek R1 is a reasoning model that can compete with OpenAI's o1. Both models are available under an MIT license.
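The 5.6 million USD figure appears to follow directly from the GPU-hour count: the V3 technical report reportedly assumes a rental price of about 2 USD per H800 GPU hour, and 2.78 million hours times 2 USD comes to roughly 5.6 million USD. That number covers only the final training run; hardware purchases, staff, and prior research and ablation experiments are not included.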
As cheap as the Deepseek models are to develop and run, it remains unclear exactly how the provider managed to train them so efficiently. One sticking point: because of US trade restrictions, Deepseek should not have access to the most powerful chips for AI training. However, there are reports that the founder bought enough Nvidia A100 GPUs for his hobby years ago, before the export controls took effect, and can now put them to use.
The Financial Times has published a brief profile of Liang Wenfeng, a former hedge fund manager with a passion for AI. He reportedly founded Deepseek in May 2023. According to the article, the company is entirely cross-subsidized. Wenfeng is quoted as saying he has no economic interest in his AI models, since basic research yields only low returns; instead, he wants to benefit the Chinese economy.
Deepseek is causing a stir, especially in the US. The share prices of companies linked to AI are fluctuating. If the models really are this powerful while requiring far less computing power, there may be no need for 500-billion-USD data-center projects like Stargate. The open-source strategy also allows the models to be replicated.
Several AI experts have already commented. Yann LeCun from Meta wrote that Deepseek V3 is “excellent.” Microsoft CEO Satya Nadella warned at the World Economic Forum in Davos that the Chinese development must be taken “very, very seriously.”
Last year, when Deepseek released the first versions of the models, Jim Fan from Nvidia wrote that open-source models could put enormous pressure on commercial companies. He stated, “Resource constraints are beautiful. The survival instinct in a ruthless AI competition environment is a top driver for breakthroughs.”
Perplexity CEO Aravind Srinivas speculated, “Necessity is the mother of invention. Because they had to find workarounds, they ended up building something much more efficient.”
However, there are also suspicions that Deepseek is not entirely truthful about how the models were developed. CNBC reported that Chetan Puttagunta of the venture capital firm Benchmark said Deepseek might have used model distillation. In this process, the knowledge of a large model is transferred to a smaller one, a technique that other AI companies such as Meta are also working on.
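For readers unfamiliar with the term: in its textbook form, distillation trains a small "student" model to imitate the output distribution of a large "teacher" model. The following PyTorch snippet is a minimal sketch of that generic technique, not a description of how Deepseek actually trained its models; the teacher, student, temperature, and loss weighting are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: a weighted mix of
    (a) KL divergence between the softened teacher and student
        output distributions, and
    (b) ordinary cross-entropy against the ground-truth labels."""
    # Soften both distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL term; the T^2 factor keeps gradient magnitudes comparable
    # across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Standard supervised term on the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce

# Usage sketch: run the same batch through the frozen teacher and the
# small student, then optimize the student on the combined loss.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# student_logits = student(batch)
# loss = distillation_loss(student_logits, teacher_logits, labels)
```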
The Deepseek chatbot sometimes claims to be ChatGPT, which suggests it was trained on output from that chatbot. Another problem: the Deepseek models answer some questions in line with the positions of the Chinese government. For example, the events on Tiananmen Square, where the democracy movement's protests were violently crushed in 1989, are simply left out. With the usual prompting tricks, the model can still be made to write about the massacre. But anyone who does not know about the omissions will hardly apply such tricks, because they have no idea that something is missing.