On January 20, 2025, the Chinese company DeepSeek released its new large language model, DeepSeek-R1, which has caused excitement in scientific circles as an affordable, open alternative to advanced models such as OpenAI's o1. These new "reasoning" models generate answers step by step, in a process resembling human thinking, which makes them better at solving scientific problems, Nature.com reports.
DeepSeek-R1's achievements
Initial tests show that the model performs comparably to o1, especially in chemistry, mathematics, and programming. For example, DeepSeek-R1 scored an impressive 97.3% on the MATH-500 set of mathematics problems created by researchers at the University of California, Berkeley, and outperformed 96.3% of human participants in the Codeforces programming competition.
"This is incredible and completely unexpected," said Elvis Saravia, an artificial intelligence researcher and co-founder of DAIR.AI, based in the UK.
One of DeepSeek-R1's key features is its openness. The model's weights are published under an MIT license, which allows anyone to use and build on the algorithm for free, although the training data has not been released. This sets it apart from competitors such as OpenAI's o1 and o3, which are "black boxes," says Dr. Mario Krenn, head of the Artificial Scientist Lab at the Max Planck Institute in Germany.
DeepSeek-R1 is also significantly more affordable: the company's interface for using the model costs about one-thirtieth as much as o1's. In addition, DeepSeek has released "distilled" versions of R1 that require less computing power, allowing scientists with limited resources to work with the model.
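The "distilled" models mentioned above are produced by knowledge distillation, in which a small "student" model is trained to imitate the large "teacher" model's output distribution, so the distilled model needs far less compute at inference time. The sketch below illustrates the idea only; the temperature value and KL objective are standard textbook choices, not details of DeepSeek's actual distillation recipe.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    z = z / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Minimizing this pushes the small student model to reproduce the
    large teacher model's behavior.
    """
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
print(distillation_loss(teacher, teacher))           # 0.0: perfect imitation
print(distillation_loss(teacher, np.zeros(3)) > 0)   # True: mismatch is penalized
```

In practice the student is trained on many such examples, gradually compressing the teacher's capabilities into a model small enough to run on modest hardware.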
According to Mario Krenn's calculations, an experiment that would cost over £300 with o1 costs less than $10 with R1. "That's a dramatic difference that will certainly affect its future adoption," he adds.
A smart approach to limited resources
DeepSeek-R1 was developed with limited access to the best AI processing chips, a constraint imposed by US export restrictions. DeepSeek, however, managed to compensate with an innovative algorithmic approach.
One of the main techniques used in training the model is so-called "chain of thought" reasoning, which helps it solve more complex tasks by working step by step, sometimes backtracking and reevaluating its approach. To instill this behavior, the company used reinforcement learning, in which the model is rewarded for correct answers and for clearly laying out its reasoning steps.
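The reward scheme described above can be sketched as a simple rule-based scoring function: the model earns partial reward for exposing its reasoning in a recognizable format and further reward for a correct final answer. The function name and the `<think>...</think>` tag convention here are illustrative assumptions, not DeepSeek's published code.

```python
import re

def reward(response: str, expected_answer: str) -> float:
    """Score one model response during reinforcement learning."""
    score = 0.0
    # Format reward: the response must contain explicit reasoning steps.
    if re.search(r"<think>.+?</think>", response, re.DOTALL):
        score += 0.5
    # Accuracy reward: the final answer (after the reasoning) must match.
    final = response.split("</think>")[-1].strip()
    if final == expected_answer.strip():
        score += 1.0
    return score

good = "<think>2 + 2 is 4.</think>4"
bad = "5"
print(reward(good, "4"))  # 1.5: rewarded for both format and correctness
print(reward(bad, "4"))   # 0.0: no reasoning shown, wrong answer
```

During training, responses that score higher become more likely, which is how step-by-step "thinking" emerges without hand-labeled reasoning traces.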
In addition, the team used a mixture-of-experts architecture, which activates only the parts of the model relevant to each task and thereby significantly reduces training costs.
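A toy illustration of the mixture-of-experts idea: a gating network scores all experts, but only the top-k are evaluated for a given input, so most of the model's parameters stay idle on any single token. The sizes, the top-2 routing, and the softmax gate below are generic assumptions for illustration, not R1's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out, top_k = 8, 16, 16, 2

# One weight matrix per "expert" (real experts are full MLP sub-networks).
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_in, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                  # score every expert (cheap)
    top = np.argsort(logits)[-top_k:]    # keep only the best k
    weights = np.exp(logits[top])
    weights /= weights.sum()             # renormalize over the chosen experts
    # Only top_k of n_experts matrices are used: that is the compute saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(d_in))
print(y.shape)  # (16,)
```

With 8 experts and top-2 routing, each input touches only a quarter of the expert parameters, which is why such models are much cheaper to train and run than dense models of the same total size.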
DeepSeek-R1 poses not only a technological but also a strategic challenge to other players in the industry, including Nvidia, the main chip supplier to American competitors such as OpenAI and Meta. DeepSeek's success shows that high-performance AI models can be built even with limited hardware and without the most advanced chips, which calls the need for expensive infrastructure into question. It reduces Chinese companies' dependence on Nvidia and undermines Nvidia's dominant position in the global AI hardware market. While Nvidia continues to supply the American giants with powerful GPUs, DeepSeek's success demonstrates that smart algorithms and resource efficiency can shift the focus from raw computing power to software innovation.
The Story of DeepSeek
DeepSeek is a subsidiary of High-Flyer, a highly successful quantitative trading firm. According to AI researcher Han Xiao, the company was founded by extremely capable professionals with deep mathematical expertise and has used powerful GPUs for trading and cryptocurrency mining for years. "DeepSeek is their side project, with which they are trying to optimize the use of these GPUs," Han Xiao wrote on X.
Scientific and Practical Applications
Although R1 lags slightly behind o1 at evaluating research ideas, it has shown better results in quantum-optics calculations, says Krenn. "That's pretty impressive," he adds.
Furthermore, the model's openness allows scientists to study its "logic," which improves understanding and interpretation of its reasoning processes.
DeepSeek-R1 is part of a rapidly growing wave of Chinese language models that are closing the gap with leading US developments. The model's success underscores the importance of using resources efficiently, while also pointing to the need for international cooperation in the field of artificial intelligence.
Disadvantages
Despite its innovative features and impressive results, DeepSeek-R1 also has built-in limitations that reflect the context in which it was created. The model avoids answering questions that China considers sensitive, including topics such as Taiwan, the situation of the Uyghurs, the Tiananmen Square events, and criticism of President Xi Jinping or other leaders of the Chinese Communist Party. In such cases, R1 either declares the topic outside its scope or deletes the answer after it has begun generating it. This raises doubts about the model's ability to serve as a truly open and universal tool, and highlights the influence of national policies on the development of artificial intelligence. Such restrictions could breed distrust outside China, especially in academic and research circles that demand transparency, neutrality, and freedom of expression when working on sensitive historical and political topics.
Source: money.bg