DeepSeek: Why does DeepSeek prefer young people with no work experience?


Author: Sam Gao, author of ElizaOS

0. Foreword

Recently, the back-to-back releases of DeepSeek V3 and R1 have given American AI researchers, entrepreneurs, and investors a serious case of FOMO. The shock rivals the emergence of ChatGPT at the end of 2022.

Leveraging DeepSeek R1's fully open-source release (the weights can be downloaded for free from HuggingFace for local inference) and its extremely low price (about 1/100 that of OpenAI o1), DeepSeek reached the top of the US Apple App Store within just 5 days.
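
For readers who want to try the open-weights claim themselves, here is a minimal local-inference sketch using the HuggingFace transformers library. It assumes the distilled deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint (the full 671B-parameter R1 is far too large for a single consumer GPU) and that transformers, torch, and accelerate are installed.

```python
# Minimal sketch: run a distilled DeepSeek-R1 checkpoint locally.
# Assumes the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model ID on HuggingFace.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread layers across available GPUs/CPU
)

messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```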

So where did this mysterious new AI force, incubated by a Chinese quantitative trading firm, come from?

1. The Origin of DeepSeek

My first brush with the DeepSeek story came in 2021, when Luo Fuli, a prodigy who had published 8 papers at ACL (the top conference in natural language processing) in a single year while at Alibaba's DAMO Academy, quit and joined High-Flyer Quant. At the time, everyone wondered why a highly profitable quantitative firm would recruit AI talent: did High-Flyer need to publish papers too?

As far as I know, the AI researchers High-Flyer recruited mostly worked independently, exploring cutting-edge directions, with the core focus on large language models (LLMs) and text-to-image models (OpenAI's DALL-E was the reference point at the time).

By the end of 2022, High-Flyer was gradually absorbing more and more top AI talent, mostly current students from Tsinghua and Peking University. Inspired by ChatGPT, High-Flyer CEO Liang Wenfeng decided to venture into artificial general intelligence: "We have built a new company, starting from large language models; later we will also cover vision and other areas."

Yes, that company is DeepSeek. In early 2023, companies like Zhipu AI, Moonshot AI, and Baichuan Intelligence gradually took center stage, and in the bustling Zhongguancun and Wudaokou area, DeepSeek's presence was largely overshadowed by these capital-chasing darlings.

So in 2023, as a pure research outfit without a celebrity founder (unlike Kai-Fu Lee's 01.AI, Yang Zhilin's Moonshot AI, or Wang Xiaochuan's Baichuan Intelligence), DeepSeek found it difficult to raise money from the market on its own, and High-Flyer decided to spin off DeepSeek and fund its development entirely. In the 2023 climate, no venture capital firm was willing to back DeepSeek: the team was mostly freshly graduated PhDs with no big-name researchers, and any capital exit looked distant.

In this noisy, frenetic environment, DeepSeek quietly began writing its own story of AI exploration:

  • November 2023: DeepSeek released DeepSeek LLM, with 67 billion parameters and performance surpassing Llama 2 70B and approaching GPT-3.5.

  • May 2024: DeepSeek-V2 was officially launched.

  • December 2024: DeepSeek-V3 was released, with benchmarks showing it outperforming Llama 3.1 and Qwen 2.5 and matching GPT-4o and Claude 3.5 Sonnet, igniting industry attention.

  • January 2025: DeepSeek-R1, the company's first-generation reasoning model, was released, matching OpenAI o1's performance at less than 1/100 of the price and sending a shock through the entire tech world: the world truly realized that China's strength had arrived... Open source always wins!

2. Talent Strategy

I got to know some DeepSeek researchers early on, mainly those working on AIGC, such as the authors of Janus (released in November 2024) and DreamCraft3D, including @xingchaoliu, who helped me polish my latest paper.

From what I have seen, the researchers I know are mostly very young: current doctoral students, or within 3 years of graduation.

Most are master's or doctoral students from Beijing-area universities with strong academic records; many have published 3-5 top-conference papers.

I asked my friends at DeepSeek why Liang Wenfeng recruits only young people.

They relayed the following (drawing on High-Flyer CEO Liang Wenfeng's own public remarks):

The veil of mystery around the DeepSeek team has aroused curiosity: what is its secret weapon? Foreign media say the secret weapon is "young geniuses" capable of going head-to-head with deep-pocketed American giants.

In the AI industry, hiring experienced veterans is the norm, and many Chinese AI startups prefer senior researchers or holders of overseas doctorates. DeepSeek goes against the grain and prefers young people without work experience.

A headhunter who has worked with DeepSeek revealed that the company does not recruit seasoned technical staff: "3-5 years of work experience is already the ceiling, and anyone with more than 8 years is basically passed over." In a May 2023 interview with 36Kr, Liang Wenfeng likewise said that most of DeepSeek's developers are fresh graduates or people just starting their AI careers, emphasizing: "Most of our core technical positions are held by fresh graduates or people with one or two years of work experience."

With no work experience to go on, how does DeepSeek select people? The answer: by looking at potential.

Liang Wenfeng has said that for a long-term endeavor, experience matters less than basic ability, creativity, and passion. The world's top 50 AI talents may not be in China yet, he believes, "but we can cultivate such people ourselves."

This strategy recalls OpenAI's early days. When OpenAI was founded at the end of 2015, Sam Altman's core idea was to find young, ambitious researchers. Apart from President Greg Brockman and Chief Scientist Ilya Sutskever, the remaining four core founding technical members (Andrej Karpathy, Durk Kingma, John Schulman, and Wojciech Zaremba) were all freshly graduated doctoral students, from Stanford University, the University of Amsterdam, UC Berkeley, and New York University, respectively.

From left to right: Ilya Sutskever (former Chief Scientist), Greg Brockman (former President), Andrej Karpathy (former Technical Lead), Durk Kingma (former Researcher), John Schulman (former Reinforcement Learning Team Lead), and Wojciech Zaremba (current Technical Lead)

This "young wolf strategy" has already allowed OpenAI to taste the sweetness, incubating talents such as Alec Radford, the father of GPT (equivalent to a private third-tier university graduate), Aditya Ramesh, the father of the text-to-image model DALL-E (NYU undergraduate), and Prafulla Dhariwal, the three-time gold medalist of the International Mathematical Olympiad, who is responsible for the multimodal aspect of GPT-4o. This has allowed the initially unclear "save the world" plan of OpenAI to forge a path forward through the headlong rush of young people, transforming it from a nameless underdog next to DeepMind into a giant.

Liang Wenfeng saw exactly how well Sam Altman's strategy worked and committed firmly to the same path. Unlike OpenAI, however, which waited 7 years for ChatGPT, Liang Wenfeng's bet paid off in just over 2 years, a genuine case of "China speed".

3. Speaking Up for DeepSeek

In the DeepSeek R1 technical report, metric after metric is surprisingly strong. But that strength has also raised doubts, on two points:

  • ① The Mixture of Experts (MoE) architecture it uses is demanding to train and hungry for data, which is why some consider it reasonable to ask whether DeepSeek trained on OpenAI outputs (a sketch of the MoE idea follows this list).

  • ② DeepSeek uses reinforcement learning (RL), which is hardware-hungry; yet where Meta and OpenAI train on clusters of tens of thousands of GPUs, DeepSeek's training reportedly used only 2,048 H800 GPUs (its RL method is also sketched below).
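
To make the MoE point in ① concrete, here is a toy top-k Mixture-of-Experts layer in PyTorch. It is an illustrative sketch, not DeepSeek's implementation; the dimensions and expert counts here are made up, and DeepSeek-V3's report describes a far larger design with shared plus routed experts.

```python
# Toy top-k Mixture-of-Experts layer (illustrative only, not DeepSeek's code).
# A router picks k experts per token, so only a fraction of the total
# parameters are active on any forward pass; that sparsity is the whole point.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep k best experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize kept weights
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (top_idx == i).any(dim=-1)              # tokens routed to expert i
            if mask.any():
                w = top_w[mask][top_idx[mask] == i].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

print(ToyMoE()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

On point ②, part of the answer to the hardware puzzle is algorithmic: DeepSeek's own papers describe GRPO (Group Relative Policy Optimization), which drops PPO's separate value network and instead normalizes each sampled answer's reward within its group. Below is a minimal sketch of that advantage computation, again illustrative rather than DeepSeek's actual training code.

```python
# Group-relative advantages, the core trick of GRPO: score a group of sampled
# answers to the same prompt, then normalize each reward within the group.
# No critic network is needed, which cuts memory and compute for RL training.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (n_prompts, group_size) scalar rewards for the sampled answers."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 8 answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]])
print(group_relative_advantages(rewards))
```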

Given the limits on computing power and the complexity of MoE, the claim that DeepSeek R1 succeeded on roughly $5 million does look a bit suspicious. But whether you worship its "low-cost miracle" or question it as "empty talk", you cannot ignore the dazzling innovation.

BitMEX co-founder Arthur Hayes wrote: "Will the rise of DeepSeek make global investors question American exceptionalism? Is the value of American assets severely overestimated?"

Stanford University professor Andrew Ng said publicly at this year's Davos Forum: "I am impressed by the progress of DeepSeek. I think they are able to train models in a very economical way. Their latest inference model is excellent... Keep it up!"

Marc Andreessen, co-founder of a16z, stated: "DeepSeek R1 is one of the most astonishing and impressive breakthroughs I've ever seen - and as open source, it's a profound gift to the world."

DeepSeek, which stood at the edge of the stage in 2023, finally stood at the summit of the AI world in 2025, just before the Lunar New Year.

4. Argo and DeepSeek

As a technical developer of Argo and an AIGC researcher, I have put DeepSeek behind several of Argo's important functions. Argo is a workflow system, and its rough first-draft workflow generation is done with DeepSeek R1. Argo has also made DeepSeek R1 its built-in default LLM and has chosen to abandon the closed-source, expensive OpenAI models. A workflow run typically carries a large amount of token and context overhead (on average >= 10k tokens), so executing workflows on high-priced OpenAI or Claude 3.5 would be very expensive, and that kind of spending before web3 users actually capture value harms the product.
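
As a sketch of how such a workflow step can talk to R1, the snippet below uses DeepSeek's OpenAI-compatible API. Argo's real internals are not public, so the generate_workflow_draft helper and its prompt are hypothetical; the base_url and the deepseek-reasoner model name come from DeepSeek's public API documentation.

```python
# Hypothetical Argo-style workflow step calling DeepSeek-R1 through
# DeepSeek's OpenAI-compatible API. Only the endpoint and model name are
# DeepSeek's; the helper and prompt are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
)

def generate_workflow_draft(task_description: str) -> str:
    """Ask deepseek-reasoner (R1) for a rough, numbered workflow outline."""
    response = client.chat.completions.create(
        model="deepseek-reasoner",  # DeepSeek's API name for the R1 model
        messages=[
            {"role": "system", "content": "Draft a step-by-step workflow as a numbered list."},
            {"role": "user", "content": task_description},
        ],
    )
    return response.choices[0].message.content

print(generate_workflow_draft("Swap token A for token B, then stake the output."))
```

At contexts averaging 10k+ tokens per run, the per-token price difference between R1 and the closed models compounds on every execution, which is the cost argument above.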

As DeepSeek keeps improving, Argo will work more closely with the Chinese AI forces it represents, including but not limited to moving the Text2Image/Video interfaces and the built-in LLM onto Chinese-developed models.

On the partnership side, Argo will invite DeepSeek researchers to share their technical work and will provide grants for top AI researchers, helping web3 investors and users understand the progress of AI.
