Can DeepSeek continue to be popular?

PANews
01-27

Author: Yu Yan, reporter at The Paper

·A headhunter who recruits senior technical talent in the large-model field told The Paper that DeepSeek's hiring logic differs little from that of other large-model companies. The core label it looks for is "young and high-potential": born around 1998, with preferably no more than five years of work experience, "smart, with a science and engineering background, young, and not much experience."

·In the view of industry insiders, DeepSeek is fortunate compared with other domestic large-model startups: it faces no financing pressure and does not need to prove itself to investors or balance model iteration against product development. As a commercial company, however, after such heavy investment it will sooner or later face the pressures and challenges that other model companies already confront.

Which company was the hottest in China's large-model circle in 2024? Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. (hereinafter DeepSeek) is surely a strong contender. DeepSeek first entered the public eye in mid-2024 as the initiator of the large-model price war; by the end of the year and the start of this one, with the successive releases of the open-source model DeepSeek-V3 and the reasoning model DeepSeek-R1, it had set the large-model world's public discourse alight. People are struck by its remarkably low training cost (DeepSeek-V3 is reported to have cost only $5.576 million to train) and applaud its open-sourced models and public technical reports. The release of DeepSeek-R1 has excited many scientists, developers, and users, some of whom see it as a serious rival to OpenAI's o1 reasoning model.

How did this low-key company achieve high-performing large models at such a low training cost? What did it do right to become so popular? And what challenges must it overcome to keep riding the wind and waves of the "model circle"?

Algorithm innovation leads to a significant reduction in computing power cost

"DeepSeek invested early and accumulated a great deal, and its algorithms have a character of their own," said a senior executive at a star large-model startup in China, who attributes DeepSeek's breakout to algorithmic innovation. "Because computing power is scarce, Chinese companies focus more on saving on compute costs."

According to DeepSeek's own disclosures about DeepSeek-R1, the model makes large-scale use of reinforcement learning in the post-training stage, which greatly improves its reasoning capability even with very little annotated data. On tasks such as mathematics, coding, and natural-language reasoning, its performance is on par with the official release of OpenAI's o1.
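The general idea behind post-training with little annotated data can be sketched in miniature: instead of human labels, a rule-based verifier scores sampled answers, and only high-reward samples feed the update step. This is a hypothetical illustration of the technique's flavor (rejection sampling against a verifiable reward), not DeepSeek's actual training code; all names here are invented.

```python
def verifier_reward(question: str, answer: str, expected: str) -> float:
    """Rule-based reward: 1.0 if the final answer matches, else 0.0.
    No human annotation is needed, only a checkable ground truth."""
    return 1.0 if answer.strip() == expected else 0.0

def select_for_update(samples, question, expected):
    """Keep only the sampled completions the verifier accepts;
    these would steer the next policy update."""
    return [s for s in samples if verifier_reward(question, s, expected) > 0]

# Four sampled completions for one math question:
samples = ["42", "41", " 42 ", "I don't know"]
accepted = select_for_update(samples, "What is 6*7?", "42")
```

Because the reward is computed mechanically, the loop can scale to millions of problems without annotators, which is why verifiable domains like math and code suit this approach.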


DeepSeek-R1 API Pricing

DeepSeek founder Liang Wenfeng has repeatedly emphasized that DeepSeek is committed to opening up a differentiated technical route, rather than copying OpenAI's model, and that DeepSeek must come up with more effective ways to train its models.

"They used a series of engineering tricks to optimize the model architecture, such as an innovative use of model-mixing methods, with the fundamental aim of reducing costs through engineering so that the business can be profitable," a technology-industry veteran told The Paper.

According to DeepSeek's disclosures, it has made significant progress with MLA (Multi-head Latent Attention) and with its self-developed DeepSeekMoE (Mixture-of-Experts) architecture, both of which cut the computing resources needed for training, improve training efficiency, and make DeepSeek's models more cost-effective. Data from the research institute Epoch AI likewise indicate that DeepSeek's latest models are highly efficient.
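The efficiency logic of Mixture-of-Experts designs can be shown with a toy sketch: a gate scores the experts for each input and only the top-k experts actually run, so compute per token stays small even when total parameters are large. This is a generic, hypothetical illustration of MoE routing, not the DeepSeekMoE implementation; the functions and numbers are invented.

```python
def top_k_route(gate_scores, k=2):
    """Return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])
    return ranked[:k]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the routed experts and average their outputs (simplified:
    a real MoE layer would weight each output by its gate score)."""
    chosen = top_k_route(gate_scores, k)
    outputs = [experts[i](x) for i in chosen]
    return sum(outputs) / len(outputs)

# Three tiny "experts"; only two run per input:
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
y = moe_forward(10, experts, gate_scores=[0.1, 0.7, 0.2], k=2)
```

With 3 experts and k=2, one third of the expert compute is skipped; at production scale (hundreds of experts, small k) the savings dominate, which is the cost argument the article describes.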

On the data side, unlike OpenAI's approach of feeding in massive volumes of data, DeepSeek uses algorithms to summarize and classify data, feeding it to the large model only after selective processing, which improves training efficiency and reduces cost. DeepSeek-V3 strikes a balance between high performance and low cost, opening new possibilities for large-model development.
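A minimal sketch of what algorithmic data curation means in practice, under the assumption (not stated in the article) that it includes steps like deduplication and quality filtering before training; this is an invented illustration, not DeepSeek's pipeline:

```python
import hashlib

def dedup_and_filter(docs, min_len=10):
    """Drop exact duplicates (by content hash) and very short documents,
    so fewer but higher-quality tokens reach the model."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest in seen or len(doc) < min_len:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept

corpus = [
    "a short",                              # filtered: too short
    "this document is long enough",
    "this document is long enough",         # filtered: duplicate
    "another sufficiently long document",
]
clean = dedup_and_filter(corpus)
```

Every document removed here is a document the model never has to train on, which is how selective processing translates directly into lower compute cost.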

"In the future, perhaps we won't need super-large GPU clusters anymore," remarked Andrej Karpathy, a founding member of OpenAI, after DeepSeek released its highly cost-effective models.

Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University, told The Paper that DeepSeek's breakout proves that China's competitive advantage lies in making the most efficient possible use of limited resources, achieving more with less. The release of R1 shows that the gap in AI strength between China and the United States has narrowed significantly. The Economist likewise reported in its latest issue that "DeepSeek is simultaneously changing the technology industry with its low-cost training and innovative model design."

Demis Hassabis, CEO and co-founder of Google DeepMind, acknowledged that the team's achievements are genuinely impressive, while noting that it is not yet fully clear to what extent DeepSeek depends on Western systems for its training data and open-source models. He recognizes China's very strong engineering capability and ability to scale, but also argues that the West still leads and must consider how to keep its frontier models ahead.

Years of focus and accumulation

DeepSeek's achievements did not come overnight; they are the product of years of incubation and long-term planning. Liang Wenfeng is also the founder of High-Flyer Quant, a top quantitative private fund, and DeepSeek is believed to have drawn fully on the capital, data, and computing resources High-Flyer accumulated.

Liang Wenfeng earned his bachelor's and master's degrees in Information and Electronic Engineering at Zhejiang University. From 2008 he led a team exploring fully automated quantitative trading with machine learning and related techniques. High-Flyer Quant was founded in 2015, and its first AI model launched the following year, when its first trading position generated by deep learning was executed. In 2018 the firm made AI its main development direction. In 2020 High-Flyer invested more than 100 million yuan to build the "Fire-Flyer I" AI supercomputer, said to match the computing power of 40,000 personal computers. In 2021 it invested 1 billion yuan to build "Fire-Flyer II", equipped with 10,000 A100 GPUs. At the time, no more than five companies in China owned over 10,000 GPUs, and apart from High-Flyer the other four were internet giants.

DeepSeek was officially founded in July 2023 to pursue general artificial intelligence, and to date it has never raised external funding.

"With relatively abundant computing resources and no financing pressure, DeepSeek was able to focus on model development rather than product development in the first few years, which made it appear more pure and focused compared to other domestic large model companies, allowing it to make breakthroughs in engineering technology and algorithms," said the executive of a domestic large model company.

Furthermore, as the large-model industry grows increasingly closed (OpenAI is jokingly called "CloseAI"), DeepSeek's open-sourced models and public technical reports have won broad praise from developers, letting its technical brand stand out quickly in both the domestic and international large-model markets.

Success proves the power of young people

"DeepSeek's success has also shown everyone the power of young people. Fundamentally, this generation of artificial intelligence development needs young minds more," said an industry insider.

Earlier, Jack Clark, a former policy director at OpenAI and co-founder of Anthropic, suggested that DeepSeek had hired "a batch of inscrutable geniuses". Liang Wenfeng responded in an interview with an independent media outlet that there were no inscrutable geniuses, only graduates of top domestic universities, PhD students, and young people just a few years out of school.

Judging from available media reports, the most striking feature of the DeepSeek team is that its members are young graduates of elite universities, with most team leads under 35. On a team of fewer than 140 people, the engineers and researchers come almost entirely from top universities such as Tsinghua University, Peking University, Sun Yat-sen University, and Beijing University of Posts and Telecommunications, and few have worked for long.


A headhunter who recruits top technical talent in the large-model field told The Paper that DeepSeek's hiring logic differs little from that of other large-model companies; the core label for candidates is "young and high-potential": born around 1998, with preferably no more than five years of work experience, "smart, with a science and engineering background, young, and not much experience."

However, the same headhunter noted that large-model startups are still, at bottom, startups: it is not that they do not want to recruit top overseas AI talent, but in reality few such people are willing to return.

An anonymous DeepSeek employee told The Paper that the company's management is very flat and the atmosphere of free exchange is good. Liang Wenfeng has no fixed whereabouts, and most of the time people communicate with him online.

The employee previously worked on large-model R&D at a domestic tech giant but felt like a cog in the machine, unable to create value, and ultimately chose to join DeepSeek. In his view, DeepSeek is currently more focused on foundational model technology.

The work atmosphere at DeepSeek is entirely bottom-up, with a natural division of labor and no cap on mobilizing GPUs or people. "Everyone brings their own ideas, and no one needs to be pushed. When someone hits a problem during exploration, they pull others in to discuss it," Liang Wenfeng said in an earlier interview.

"It is still too early to believe that China's AI has surpassed the US"

The US business outlet Business Insider drew two conclusions: first, the newly released R1 shows that China can match some of the industry's top AI models and keep pace with the cutting edge in Silicon Valley; second, open-sourcing such advanced AI poses a challenge to companies hoping to profit handsomely by selling the technology.

However, it may still be too early to proclaim that "China's AI has already surpassed the US". Liu Zhiyuan has publicly warned against public opinion swinging from extreme pessimism to extreme optimism, against believing that China has comprehensively surpassed the US and is far ahead: "not at all". In his view, new AGI technologies are still accelerating and the path ahead remains unclear; China is still catching up, no longer out of reach but at best following closely. "It is relatively easy to follow quickly down a path others have already explored; the greater challenge is how to blaze a new trail through the mist."

"Competition is too fierce now and everyone is too anxious; no one expected DeepSeek to be the one to break out," a person close to DeepSeek told The Paper with a sigh. The industry is changing too fast to predict what can be done next; all they can do is watch how the next quarter unfolds.


Although Liang Wenfeng has said publicly that DeepSeek only builds models, not products, as a commercial company it can hardly do only models forever. On January 15, the official DeepSeek app launched. A person close to DeepSeek told The Paper that commercialization is already on DeepSeek's agenda.

In the view of industry insiders, DeepSeek is lucky compared with other large-model startups in China: no financing pressure, no need to prove itself to investors, no need to balance model iteration against product development. But as a commercial company that has invested heavily, it will sooner or later face the pressures and challenges other model companies already confront. "This breakthrough amounts to a successful piece of marketing on the eve of commercialization, but once DeepSeek truly commercializes it must withstand the test of the market, and whether it can keep forging ahead remains uncertain," said the aforementioned model-company insider.

What is certain is that DeepSeek will face more pressure and challenges ahead. The race toward general-purpose models has only just begun, and who wins will depend on sustained funding and continued technical iteration. Still, industry insiders believe that "for the domestic model industry, the entry of a company like DeepSeek, with real technical strength, is a good thing."
