Why Anthropic CEO is hostile to DeepSeek and Chinese AI

Source: Silicon Stance

A few months ago, a scientist wrote an essay proclaiming his and his company's good intentions: powerful artificial intelligence, he argued, could address human physical and mental health, mental illness, poverty, peace, and the meaning of work and life. A few months later, the same scientist abruptly published another essay, forcefully demanding that no American chips be allowed into China, in order to limit the development of Chinese artificial intelligence and preserve a "unipolar world" of AI (I was shocked he would use the term so bluntly). However one looks at it, the two read as disjointed and somewhat hypocritical.

This person is Dario Amodei, founder and CEO of the artificial intelligence company Anthropic: an Italian-American, a Ph.D. scientist with a background in neuroscience, a veteran AI researcher, a former member of OpenAI's research team, an early employee of Baidu's deep learning lab, a self-described idealist building the most powerful and safest AI, the founder of OpenAI's most important competitor, and now the most ardent advocate of a comprehensive AI embargo on China.

Although Anthropic and its Claude series of models remain relatively unknown to the Chinese public, it is currently the provider of the large language models most favored by AI application developers worldwide, and it has a considerable following among China's AI researchers and developers as well. Yet almost overnight, many Chinese AI practitioners openly declared that they had lost even their most basic respect for Anthropic and for Amodei himself.

This is the effect of a "manifesto".

In the article, titled "On DeepSeek and Export Controls", Dario Amodei casually asserts that DeepSeek's achievements have been exaggerated. While affirming the innovation of the DeepSeek-V3 model, he flatly refuses to acknowledge the breakthrough of the DeepSeek-R1 reasoning model, which had the greater impact (this is the focus of the discussion later in this article). He is even less willing to credit DeepSeek's achievements in compute cost and algorithmic efficiency, invoking an unverified rumor that DeepSeek possesses 50,000 smuggled Nvidia A100, H100 and H800 GPUs to argue that DeepSeek-V3 could not have been trained for only $6 million. Evidently, Amodei cannot accept the increasingly recognized view that DeepSeek has replaced the brute accumulation of computing power with innovation in algorithmic efficiency, and so he does not hesitate to rest his argument on the unverified premise of large-scale GPU smuggling. Yet he also declares that US export controls on computing power to China have not failed; he has conveniently forgotten that his earlier argument depended on the smuggling hypothesis.

Source: https://darioamodei.com/on-deepseek-and-export-controls

Let's reconstruct his line of argument: DeepSeek's influence has been exaggerated; V3 is indeed an innovation, but it could not have cost so little; I heard they smuggled chips, so they must have spent more on training; DeepSeek has no originality, it builds on our research, so naturally its costs are lower; the R1 reasoning model is absolutely not innovative, it merely reproduces o1 (pretending not to see that OpenAI has already acknowledged DeepSeek's independent discoveries in reasoning); export controls have not failed and are correct (forgetting that his earlier argument rested on the premise that DeepSeek obtained smuggled GPUs); we must build a unipolar AI world, and China absolutely must not produce models on par with ours (forgetting that he said at the outset that DeepSeek is nothing to fear); therefore not only the H100 and H800 but even the lowest-end H20 must be kept out of China, so that China cannot win.

You see, when a scientist who always preaches logic and reasoning strains through a ten-thousand-word essay to prove a conclusion he is obliged to defend but cannot actually establish, the result is exactly this clumsy and hypocritical.

This is not the first time Dario Amodei has called for tightening controls on computing-power exports to China, and no one should expect an American AI scientist to harbor innate goodwill toward China. But against the backdrop of the attention, the recognition, and the measure of panic that DeepSeek has stirred in Silicon Valley, his specific push for further restrictions on China's access to computing power, and his strenuous denial of DeepSeek's innovations in computational efficiency and model reasoning methods, deserve careful attention and analysis. No one expects goodwill from him; but malice and resentment toward China and toward the Chinese AI company DeepSeek this intense are worth pondering.

Why does Dario Amodei "look down on" DeepSeek-R1?

Although he goes to great lengths to speculate that DeepSeek-V3's training cost must have been far more than $6 million, Dario Amodei does acknowledge that V3 is a genuine innovation. But he insists it is not a breakthrough, merely "an expected point on an ongoing curve of cost reduction". The difference, in his words, is that "the first company to demonstrate the expected cost reductions was Chinese, which has never happened before and carries geopolitical significance". This kind of praise, delivered by someone unwilling to praise sincerely, is tiring to watch. I would rather Amodei simply said: "American companies have all been innovating on cost reduction; DeepSeek just happened to get there first." But being straightforward is not among his qualities.

When it comes to DeepSeek-R1, Amodei does become straightforward. He flatly refuses to acknowledge R1 as a groundbreaking achievement, and on this point he leaves no room for compromise. He ignores the fact that even OpenAI, which trained the reinforcement-learning-based o1 and o3 models, has acknowledged R1's original breakthroughs in reinforcement learning methods, and he ignores the research arguing that DeepSeek's reinforcement learning has broken free of human feedback, an "AlphaGo moment" for large language models. He insists that R1 merely runs reinforcement learning on top of V3, that everything it does simply reproduces o1, that every American AI company is attempting this kind of reasoning, that it is a technical trend with nothing to do with open source, and that DeepSeek just happened to be first.

We need not be indignant at Amodei's stubbornness. As a recognized and accomplished researcher in the AI field, his views on key questions can strongly shape how the AI industry, the venture capital world, Wall Street, and even Washington, D.C. perceive the DeepSeek phenomenon. That is exactly why he had to step forward. He is not defending OpenAI (the grudges between him and OpenAI run deep); rather, at this moment he must come out to lay the groundwork for the next move of Anthropic, the company he founded.

One very obvious fact: Anthropic has yet to officially release any reasoning model, even though Dario Amodei has said publicly in interviews that he is disdainful of standalone reasoning models; at the time, of course, his target was OpenAI.

Amodei's view is that reasoning is not that hard, and that the base model matters more. Much as he subtly praised DeepSeek-V3's innovation while noting that it still trails his own Claude 3.5 Sonnet on benchmarks, he has publicly acknowledged o1's breakthrough while denying that reinforcement learning is the best way to strengthen a model's reasoning. He has said that in certain scenarios and practices, the reasoning ability of the pretrained Claude 3.5 Sonnet is no weaker than o1's. He therefore does not believe reasoning models and general models should be separated: the pretrained base model remains what matters most, and reasoning capability can be folded into it.

It is therefore very likely that Anthropic plans to achieve its leap in reasoning capability by a route different from OpenAI's and DeepSeek's. It will most likely surface in the next-generation flagship Claude base model, and will still rely mainly on reinforcement learning from human feedback (RLHF), supplemented by other reinforcement learning methods (as Amodei himself has said). This differs markedly from the chain-of-thought (CoT) approach of OpenAI's o1 and from R1's breakthrough in autonomous reinforcement learning.
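To make the contrast concrete, here is a minimal sketch in Python of the two reward signals at issue. Everything in it is illustrative: the function names and the toy answer format are my assumptions, not Anthropic's or DeepSeek's actual code. The point is only that RLHF needs a learned scorer trained on human preference data, whereas R1-style reinforcement learning can reward the model with a mechanical check of a verifiable answer, removing humans from the loop.

```python
# Illustrative sketch only: contrasts an RLHF-style learned reward with an
# R1-style rule-based ("verifiable") reward. All names are hypothetical.

def extract_final_answer(response: str) -> str:
    # Toy convention: the model ends its output with a line "Answer: <value>".
    for line in reversed(response.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""

def rlhf_reward(prompt: str, response: str, reward_model) -> float:
    # reward_model stands in for a neural network trained on pairs of
    # responses that human annotators ranked against each other.
    return reward_model(prompt, response)

def verifiable_reward(response: str, ground_truth: str) -> float:
    # Rule-based check: feasible for math or code tasks with a known answer,
    # which is what lets the RL loop run without human feedback.
    return 1.0 if extract_final_answer(response) == ground_truth else 0.0

# Quick demonstration with a placeholder "reward model":
toy_reward_model = lambda p, r: 1.0 if r else 0.0
print(rlhf_reward("What is 2 + 2?", "Answer: 4", toy_reward_model))  # 1.0
print(verifiable_reward("Answer: 4", "4"))                            # 1.0
print(verifiable_reward("Answer: 5", "4"))                            # 0.0
```

The design difference also bears on cost: a rule-based verifier is nearly free to run at scale, while an RLHF pipeline must keep collecting human preference data and retraining a scorer, which is part of why the autonomous path connects to the cost questions discussed below.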

Anthropic, which grew entirely out of OpenAI and regards OpenAI as its most direct (almost its only) competitor, is in a sense the most orthodox believer in OpenAI's pre-GPT-4 line of thinking about large language models. Amodei has repeatedly come out to deny that models are "hitting a wall" or that returns to scale are diminishing as training data dries up, and has repeatedly stressed the classic "Scaling Law" (i.e., that continuously expanding model scale is the only road to better performance). AI researchers and developers are eagerly waiting for Anthropic to break through the Scaling Law bottleneck of pretrained models and launch a new generation of flagship pretrained models with stronger reasoning capabilities.
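A note on the formula behind this debate: the article itself names no specific equation, so purely for illustration, one widely cited formalization from the "Chinchilla" scaling-law paper (Hoffmann et al., 2022) writes pretraining loss as a function of parameter count $N$ and training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $E$ is an irreducible loss floor and $A$, $B$, $\alpha$, $\beta$ are fitted constants. This makes the "data wall" argument concrete: if $D$ stops growing because training data has dried up, the $B/D^{\beta}$ term stops shrinking, and expanding $N$ alone yields diminishing returns, which is precisely the phenomenon Amodei denies.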

But so far, Anthropic has not released any such thing. Given its excellent record in model training, and its record of never hyping unshipped "futures", there is reason to believe Anthropic is hard at work on this pretrained model with stronger reasoning capabilities, intending to prove that OpenAI's o1 is not the best path to better reasoning. With DeepSeek's back-to-back releases of V3 and R1, however, the list of things it must prove has suddenly grown.

First, R1, arriving on the heels of V3, further proves that the path of standalone reasoning models trained with reinforcement learning is viable, perhaps even the best; second, R1 demonstrates that reinforcement learning can lead AI into deep reasoning without human feedback (Dario Amodei is one of the principal creators of reinforcement learning from human feedback); third, V3 proves that the training cost of achieving all this can be cut dramatically.

This means that when Anthropic does launch a new pretrained model with stronger reasoning capabilities, it will face thornier questions than before: Why not make reinforcement learning the main training mode? What advantages does reinforcement learning from human feedback have over the autonomous reinforcement learning represented by R1? And what did your training cost? Is there a cheaper, more efficient way? Can the API price come down? (The Claude API is the most expensive in the world, while DeepSeek's is nearly the cheapest.)

And these thorny issues and troubles are all brought about by DeepSeek.

Therefore, before it launches its own new model with stronger reasoning capabilities, Anthropic's "spiritual leader" Dario Amodei has little choice but to step out and try to dilute people's favorable first impression of DeepSeek-R1: admitting that it is an innovation and a breakthrough is out of the question, and accepting that its costs have genuinely fallen is almost as hard.

Dario Amodei (Source: Wikipedia)

This is a contest between two paths, with a bit of a "you die, I live" flavor. And the two paths are, to some extent, the classic Silicon Valley style of model training versus the Chinese style in the "post-pre-training era" of large language models: the former leans on superior computing resources, improving model performance through the crude, brute-force aesthetics of piling on compute; the latter focuses on algorithmic efficiency, cutting training costs through architectural and engineering innovation while still improving model performance.

Anthropic is an even stronger proponent of compute scale, model scale, and brute-force aesthetics than OpenAI, which is why Dario Amodei's new article not only subtly radiates malice toward DeepSeek but also openly projects that malice onto China's entire AI field.

Why is Dario Amodei so obsessed with export controls on computing power?

This is not the first time Dario Amodei has publicly called for stronger export controls on China. He has said in previous interviews that controls on computing-power exports to China are necessary and should be tightened. American friends need not feel sorry about this, and Chinese friends need not be angry; it has always been his position.

But seizing on the "DeepSeek effect", Amodei dashed off several thousand words, invoking the prospect that Chinese artificial intelligence, with DeepSeek in the lead, might pull even with the United States, in order to call for still tighter controls on computing power. That is very interesting. Believe me: when an American scientist or entrepreneur publicly strikes a posture toward China that is either unusually warm or openly hostile, their personal agenda always comes first.

Let's first re-examine what Anthropic is.

Without a doubt, it is among the most outstanding artificial intelligence companies in the United States and the world today, at times without peer, and Dario Amodei is its technical soul. Set aside his denigration of DeepSeek and his contradictory, coy stance on computing-power export controls: when he discusses the vision and limits of artificial intelligence, or explains specific AI terms and theories, he displays a persuasive rationality, restraint, clarity, and precision, far more convincing than his former colleague Sam Altman, OpenAI's decidedly non-technical CEO.

Of course, as OpenAI's main competitor, Anthropic's most striking public label is "safety", the very area where OpenAI draws the most criticism. And it has indeed done a great deal for safety, such as weaving its "Constitutional AI" principles, under which models are trained and corrected against a written set of rules, through the entire training process. "Safety" is Anthropic's selling point, and at times also its burden.

In 2024, Anthropic snatched 15% of OpenAI's share of the enterprise market, partly because Claude 3.5 Sonnet really is powerful, and partly thanks to the "safety" talisman. But think about it: beyond the enterprise customers drawn by "safety", who else would be the main buyer?

The answer is obvious: the government. To be precise, the U.S. government.

But when it comes to projects of the federal government and its agencies, Anthropic, as a latecomer, clearly lacks OpenAI's pull. The first major AI project of the Trump 2.0 era, "Stargate", was unveiled by the White House with OpenAI and SoftBank as the main participants; Anthropic was not involved.

Although Dario Amodei promptly mocked the Trump administration's "Stargate" as "a mess" at the Davos Forum, it is clear that no AI company wants into U.S. government-led projects more than Anthropic does. To that end, he has done a series of self-contradictory things:

On the one hand, on January 6, just before Trump's formal inauguration, Dario Amodei published a bylined op-ed in The Wall Street Journal, "Trump Can Ensure America's AI Leadership", plainly a proactive overture to test the waters.

On the other hand, there was the controversial frontier-AI safety bill introduced at the tail end of the previous Democratic administration, California's SB 1047, which sought tighter regulation and would have required AI companies to proactively share model research with the government. It was opposed almost unanimously across Silicon Valley, by progressives and conservatives alike, and was ultimately vetoed by California Governor Newsom. Yet our Dario Amodei was nearly the only Silicon Valley AI company founder to support it.

In the past, I naively thought Anthropic carried the shadow of early Google, because the company lays transparency, interpretability, and ethics beneath its technology and products, with an idealistic sheen. But early Google embedded those principles in the values of its founders and team, and never advocated achieving them through regulation and administrative will. Google's two founders never tried to groom themselves as the White House's compradors. Our Dario Amodei is not like that.

Unfortunately, the Trump cabinet, stacked with Silicon Valley's new loyalists, thinks very differently about the development and regulation of artificial intelligence than the Biden cabinet did. For now at least, this group does not seem to buy what Dario Amodei is selling. After Amodei published the article calling for stronger export controls on China, Marc Andreessen, founder of the Trump-supporting venture firm Andreessen Horowitz, came out to slap him down: "Closed-source, opaque, nitpicking, seeking political manipulation versus open-source and free is not the way America needs to win."

In a sense, Dario Amodei, eager to win a major federal contract and to take part in a national-level artificial intelligence "mega-project", unconditionally supported AI regulation under the Biden administration and then, after the election, hailed Trump as the savior who could secure America's AI leadership. He now finds himself in a state of ecological isolation: outside the core circle of US AI policymaking, yet desperate to get in, and therefore forced into an ever more radical and resolute posture to earn his ticket.

At this moment, DeepSeek appeared. It put him somewhat on the back foot over the reinforcement learning path, but it also handed him a radical opening: a chance to curb the development of Chinese artificial intelligence. Conveniently, Anthropic's training path depends on scaling up accumulated computing power, which leaves him unwilling to believe that algorithmic efficiency and engineering optimization can genuinely cut compute costs, and inclined instead to believe that choking off compute can sever China's AI path. And that proposition happens to be the one the White House is most able to understand and accept. It is not hard, then, to see why Amodei is so obsessed with calling for harsher export controls on computing power.

I cannot help but sigh: the core figures of America's new generation of artificial intelligence companies, whether Sam Altman of OpenAI, Dario Amodei of Anthropic, or even Mark Zuckerberg of Meta and Alexandr Wang of Scale.ai, have accepted America's "nationalist" training so naturally and so quickly, while most of China's AI entrepreneurs, the latest representatives being DeepSeek and its founder Liang Wenfeng, have accepted the "training" of internationalism and globalization. It is a genuinely interesting phenomenon.
