The Chinese AI model Deepseek V3 is said to rival the best models from OpenAI and Google in numerous benchmarks. The model is open source, and its developer claims it outperforms other freely available models. According to the provider, developing and training Deepseek V3 cost significantly less than other large models. However, like other models from China, Deepseek V3 censors content and responses. The censorship can be bypassed with tricks, but only if the user recognizes that a response has been censored in the first place.
At its core, Deepseek V3 is a mixture-of-experts model. It comes with a chatbot that can already be tried out and can also be integrated via an API. Mixture-of-experts means that only the experts suited to a given task are activated when the model generates a response. The model has 671 billion parameters in total, of which about 37 billion are active for each token. According to the provider on GitHub, training required only 2.788 million GPU hours on Nvidia’s H800 GPUs. This figure includes fine-tuning as well as additional rounds of reinforcement learning, in which a model learns from reward signals. The predecessor, Deepseek R1, specializes in reasoning: like OpenAI’s reasoning models, it responds in loops, reconsidering its own answers.
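To make the mixture-of-experts idea more concrete, here is a minimal Python sketch of generic top-k gating. A gate scores every expert for a given input, only the k best-scoring experts are actually run, and their outputs are blended using the gate's weights. The function name `moe_forward`, the toy experts, and the gate vectors are all hypothetical. This is the textbook routing scheme, not Deepseek V3's actual router, whose details the article does not describe.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Hypothetical top-k mixture-of-experts layer (generic, not Deepseek's)."""
    # Gate: one score per expert, the dot product of the input with its gate vector.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are evaluated at all
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out


# Usage: four toy experts that scale their input and record when they are called.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Callable[[Vector], Vector]:
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [scale * val for val in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
y = moe_forward([1.0, 2.0], gate, experts, k=2)
print("experts called:", sorted(calls))
print("output:", y)
```

For the input `[1.0, 2.0]`, only experts 1 and 2 are called; experts 0 and 3 cost nothing. The same principle lets a very large model keep only a fraction of its parameters active per token.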
According to AI expert Andrej Karpathy, Meta’s free model Llama 3 required about 30.8 million GPU hours for its 405-billion-parameter version. He calls the budget for Deepseek V3, around six million US dollars, a “joke.” Deepseek has not confirmed this amount. It is also unclear where the training data comes from. There is speculation that it includes responses from ChatGPT, since Deepseek V3 occasionally claims to be ChatGPT itself.
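The gap between the two training budgets can be checked with simple arithmetic. The following Python sketch multiplies the reported GPU hours by a price per GPU hour and compares the two models. The price of 2 US dollars per GPU hour is an assumption chosen for illustration, not a figure from the article. The sketch also ignores the difference between Nvidia's H800 and the hardware Meta used, so the ratio is only a rough indication.

```python
# Reported GPU hours, taken from the text above.
DEEPSEEK_V3_GPU_HOURS = 2.788e6   # Nvidia H800, according to Deepseek on GitHub
LLAMA3_405B_GPU_HOURS = 30.8e6    # as cited by Andrej Karpathy

# Assumed rental price per GPU hour (hypothetical, for illustration only).
ASSUMED_USD_PER_GPU_HOUR = 2.0


def training_cost(gpu_hours: float, usd_per_hour: float) -> float:
    """Naive compute cost: GPU hours multiplied by an hourly price."""
    return gpu_hours * usd_per_hour


cost_usd = training_cost(DEEPSEEK_V3_GPU_HOURS, ASSUMED_USD_PER_GPU_HOUR)
ratio = LLAMA3_405B_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS

print(f"Estimated Deepseek V3 compute cost: ${cost_usd / 1e6:.2f} million")
print(f"Llama 3 405B used {ratio:.1f}x as many GPU hours")
```

Under this assumption the estimate comes to about 5.58 million dollars, in line with the "around six million" figure, and Llama 3 405B used roughly eleven times as many GPU hours.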
In numerous benchmarks, Deepseek V3 performs nearly as well as or even better than other free models. On the MATH-500 math test, Deepseek V3 scores 90.2 percent, compared with only 74.6 percent for GPT-4o. Unsurprisingly, the Chinese model is particularly good at understanding and producing Chinese. Its abilities in other languages, such as German, are unknown. The choice of training material is crucial for language skills: a model trained primarily on English texts, as the major providers’ models initially were, struggles with other languages and possibly with other cultures.
Deepseek published the benchmark results on its own website, so the figures have not been independently verified. Moreover, such tests say little about how useful and helpful models are in everyday use.
Another problem is that Deepseek V3 answers some questions in line with the Chinese government’s views. For instance, it offers no criticism of the government, and it omits events such as those on Tiananmen Square, where protests by a democracy movement were violently suppressed in 1989. Common AI tricks can get the model to write about the massacre. However, a user who does not know that something has been omitted has no reason to apply such tricks in the first place.
Nonetheless, Yann LeCun, AI expert and chief AI scientist at Meta, also calls Deepseek V3 “excellent.”