DeepSeek: A New Efficient AI Challenger in the Language Model Arena

DeepSeek is causing a stir in the AI world. The new language model from the Chinese company is said to be competitive with OpenAI’s reasoning model o1 and was apparently developed for a fraction of the cost. While US companies rely on immense computing power, DeepSeek relies on a more efficient architecture. However, the open-source approach has its limits, and political censorship is a concern.

DeepSeek is an open-source chatbot that is currently making waves. It is said to be better than the current market leader, ChatGPT from OpenAI, and it is open, meaning you can download it and run it on your own hardware. It was also trained at a much lower cost than the large US language models. As a result, the stock prices of US tech companies have fallen significantly, because the previously accepted assumption that better AI always requires more computing power no longer seems to hold.

DeepSeek is a Chinese AI startup from Hangzhou, often referred to as the Silicon Valley of China. The company is funded by the Chinese hedge fund High-Flyer. DeepSeek develops open AI models and also sells access to them. Unlike OpenAI, DeepSeek makes its models available for download.

DeepSeek’s first AI model, also simply called DeepSeek, was released in November 2023, but the hype started with the DeepSeek R1 model. There are two important types of large language models: normal models, which answer questions quickly and directly, and reasoning models, which work through a problem internally before answering and are therefore suited to complex tasks such as math.

DeepSeek’s normal model is called DeepSeek V3, and the reasoning model is DeepSeek R1. You can use both for free on chat.deepseek.com after creating an account. Unlike ChatGPT, you don’t need to provide a phone number; an email address is sufficient. You can also download the models and run them locally under the MIT license.

However, to achieve the quality available on DeepSeek.com, you need a lot of hardware, for example 16 Nvidia H100 GPUs with 80 GB of memory each. Both DeepSeek V3 and R1 have over 600 billion parameters, which requires a correspondingly large amount of memory. There are distilled versions for consumer hardware, but they don’t match the quality of the large models.
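The hardware figure above can be checked with some rough arithmetic. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes 16-bit weights (2 bytes per parameter), a round figure of 600 billion parameters, and no memory for anything other than the weights themselves.

```python
import math

# Assumptions (not from the article): 2 bytes per parameter (16-bit weights),
# 1 GB = 10**9 bytes, and no allowance for activations or caches.

def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return num_params * bytes_per_param / 1e9

def gpus_needed(num_params: float, gpu_memory_gb: float,
                bytes_per_param: float = 2.0) -> int:
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)

params = 600e9  # "over 600 billion parameters", rounded down for the estimate

print(weight_memory_gb(params))  # 1200.0
print(gpus_needed(params, 80))   # 15
```

Under these assumptions the weights alone take about 1,200 GB, which would fill 15 GPUs with 80 GB each. A setup with 16 such GPUs (1,280 GB in total) leaves only a little headroom for everything else the model needs while running.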

DeepSeek is very good as a pure text chatbot. It reliably does what you want, unlike ChatGPT or Claude, which sometimes require persuasion. For example, when asked to correct a transcript, DeepSeek provided the full text on the first try, while ChatGPT didn’t, even after multiple attempts.

DeepSeek’s responses are comparable to the competition’s, though it hallucinates in some cases. Its speed is acceptable and similar to that of its competitors. The V3 model was sometimes unavailable; DeepSeek attributes these outages to attacks or server overload.

DeepSeek performs well in benchmarks, often better than GPT-4o and Claude 3.5. However, it handles questions about China and its political system poorly, because it follows the government line. DeepSeek V3 struggles with logical tasks, while the reasoning model R1 should handle them correctly but sometimes gets stuck.

R1 is also available through Perplexity. The biggest advantage of that integration is that the R1 model works reliably, without server overloads.

DeepSeek is efficient and uses less hardware than its US competitors, thanks to its Mixture-of-Experts architecture. Instead of running the entire model for every input, it activates only the parameters needed for that input, which saves resources.
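The idea behind a Mixture-of-Experts layer can be illustrated with a toy example. This is a generic sketch of top-k expert routing, not DeepSeek's actual implementation: the experts are simple stand-in functions, the gate scores are set by hand (in a real model a learned router computes them for each token), and all names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts; return their weighted sum and their indices."""
    weights = route_top_k(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in weights.items())
    return output, sorted(weights)

# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hand-set gate scores (an assumption for illustration).
gate_scores = [0.1, 2.0, 0.3, 1.5]

output, active = moe_forward(1.0, experts, gate_scores, k=2)
print(active)  # [1, 3]
```

Only two of the four experts are evaluated for this input; the other two cost nothing. In a large model the same principle means that only a fraction of all parameters is used for any given token.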

DeepSeek shows that AI can improve through clever optimizations, not just brute force. This has impacted the stock prices of US tech companies. DeepSeek offers models that can match or surpass commercial systems from OpenAI, Google, or Anthropic, making access to powerful AI more affordable.

However, discussing politics, especially about China, with DeepSeek is not advisable. Data from DeepSeek.com or the app is stored on Chinese servers. But as DeepSeek is open-source, European companies might offer hosting on European servers soon.

In conclusion, DeepSeek provides competitive AI models that can be run locally, offering an alternative to commercial systems. It challenges the notion that advanced AI requires immense computing resources, showing that efficiency and clever design can achieve similar results.
