Four Facts That Debunk the U.S. Smears Against DeepSeek

DeepSeek has achieved remarkable results with minimal investment. In just over six months, it has not only iterated its general-purpose LLM from V2 to V3 but also launched R1, a model focused on reasoning capabilities. With groundbreaking advances in training cost, architectural design, and its open-source model, DeepSeek has drawn a wave of praise worldwide. During the Spring Festival, dramatic swings in capital markets and the surge of domestic "DeepSeek concept stocks" have kept this phenomenon at the center of public discussion.
The success of DeepSeek aligns with AI large models' inevitable evolution from pre-training to inference. In late February of last year, NVIDIA CEO Jensen Huang told the U.S. tech outlet Wired: "Today, NVIDIA's business is about 40% inference and 60% training. This is a good thing because it shows AI has finally succeeded. If NVIDIA's business were 90% training and 10% inference, you could argue that AI is still in research." In December last year, OpenAI CFO Sarah Friar told The Information that the ChatGPT Pro subscription, priced at $200 per month for end users, is far too cheap, and that a reasonable price would be $2,000 per month. Judging from the context of her interview, she was suggesting that OpenAI had kept the price low out of a sense of fairness, driven by a moral obligation to democratize AI for the masses. That notion has now been thoroughly debunked by DeepSeek's open-source R1 model.
These two statements are highly representative—one points to the evolution of AI technology applications, while the other addresses the commercialization of AI pre-training models. These two issues are intertwined and complementary.
At the same time that OpenAI led the creation of the "Stargate" project, extending the Scaling Law of computing power into private capital markets and national investment, DeepSeek fundamentally deconstructed that narrative.
Amid the clamor, the skepticism and even malicious defamation deserve attention. Analyzing the panicked comments of key figures in the U.S. AI large-model industry can deepen our understanding of where DeepSeek truly struck a nerve. The detailed analyses and doubts from across the ocean, particularly from Dylan Patel, president of the well-known semiconductor consultancy Semianalysis, and Dario Amodei, CEO of Anthropic, have circulated widely on Chinese internet platforms.
They primarily attempt to convince the public that DeepSeek’s breakthroughs are not as “hardcore” as they seem, by focusing on four aspects: GPU hoarding, cost estimates, non-technical marketing, and non-compliant model data distillation.
DeepSeek Hoarding “Sensitive” High-end GPUs?
According to Semianalysis’s estimates, DeepSeek likely owns about 10,000 H800 GPUs, 10,000 H100 GPUs, and a large number of H20 GPUs.
In a lengthy article, Dario Amodei cited Semianalysis's estimates, suggesting that DeepSeek holds about 50,000 NVIDIA Hopper-architecture GPUs (both cut-down and full versions) for training and inference, a fleet within a factor of two to three of those held by major U.S. AI model developers such as OpenAI and DeepMind. Coupled with post-training methods that use synthetic data generation and reinforcement learning to strengthen reasoning, he argues that DeepSeek stands on the shoulders of giants and relied on a massive number of GPUs to achieve its results.
Why did Dario Amodei cite Semianalysis's data to support his narrative? Because Amodei believes in a so-called "Moore's Law" for AI training costs: costs tend to fall three- to four-fold each year, and reinforcement-learning-based adjustments to inference architectures can push that to six- to eight-fold, but, according to him, no further. Working backward from these cost assumptions, Semianalysis speculates that DeepSeek must possess 50,000 Hopper GPUs.
So how did Semianalysis conclude that DeepSeek holds such a vast number of high-end GPUs? It ran a reductio ad absurdum: since Anthropic spent tens of millions of dollars training a single Claude 3.5 Sonnet model, if DeepSeek really could cut costs so sharply, why would Anthropic go to the trouble of securing billions in financing from Amazon?
How Anthropic spends its investors' money might be a more interesting question for Elon Musk's DOGE to answer. Unlike Microsoft and Google, Amazon has staked its cloud-services AI position on Anthropic, and Anthropic grew especially vocal about training costs precisely because it realized its Claude series is the most expensive, least cost-effective offering in the AI training market.
DeepSeek legally owns H800s, which differ from the H100 mainly in reduced NVLink communication bandwidth. The H20 is likewise a cut-down part whose single-card compute is only about 20% of an H100's, yet its HBM capacity (96 GB) exceeds the A100/H100's (80 GB), and multi-card stacking extends that advantage further. In other words, the H20's memory bandwidth lets DeepSeek's decode phase generate each token in less time than an A100 or H100 can.
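Why would memory bandwidth, rather than raw compute, decide decode speed? During autoregressive decoding, each new token requires streaming the active weights and KV cache out of HBM, so tokens per second is roughly bounded by bandwidth divided by bytes read per token. The sketch below illustrates this with public datasheet bandwidth figures; the per-token byte count is a made-up round number, not DeepSeek's actual footprint.

```python
# Rough, illustrative estimate of decode-phase speed per GPU, assuming
# token generation is memory-bandwidth bound: each token requires
# streaming the active model weights (plus KV cache) from HBM.
# Bandwidth figures are public datasheet values; the per-token byte
# count is a hypothetical placeholder for illustration only.

GPU_BANDWIDTH_TBPS = {
    "A100 (80GB)": 2.0,   # HBM2e, ~2.0 TB/s
    "H100 (80GB)": 3.35,  # HBM3, ~3.35 TB/s
    "H20 (96GB)": 4.0,    # HBM3, ~4.0 TB/s
}

def decode_tokens_per_second(bandwidth_tbps: float, bytes_per_token_gb: float) -> float:
    """Upper bound on tokens/s when decoding is limited by memory bandwidth."""
    return bandwidth_tbps * 1000.0 / bytes_per_token_gb

# Suppose each decoded token must read ~40 GB of weights + KV cache
# (an invented figure for a large sharded model).
BYTES_PER_TOKEN_GB = 40.0

for gpu, bw in GPU_BANDWIDTH_TBPS.items():
    bound = decode_tokens_per_second(bw, BYTES_PER_TOKEN_GB)
    print(f"{gpu}: ~{bound:.0f} tokens/s upper bound")
```

Under this simple model, the H20's higher HBM bandwidth yields a higher per-token ceiling than the H100 despite its much weaker compute, which is the point the article is making about the decode phase.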
DeepSeek has managed to extract value from these downgraded parts that the embargo was designed to deny, prompting Dario Amodei's malicious calls for tighter GPU export controls against China. This may be the real reason behind his criticism of DeepSeek.
From a discourse perspective, Semianalysis starts from Anthropic's uncompetitive training costs to infer that DeepSeek might be circumventing regulations and illegally holding high-end GPUs. Anthropic, in turn, uses Semianalysis's baseless conclusions to argue that DeepSeek has no significant cost advantage. This is essentially a collusive circular argument.
Has DeepSeek Hidden Its Total Cost of Ownership (TCO)?
Semianalysis's and Anthropic's cost analyses of DeepSeek also cover factors beyond GPU procurement, such as architecture optimization, data processing, and employee salaries. These, however, hardly merit a lengthy rebuttal.
Typically, quoted cloud rental prices for an H100 do not itemize electricity, and the real cost of IT equipment in a data center is closely tied to land, environment, and policy support.
Semianalysis has never conducted on-the-ground research in China, yet it uses U.S. market conditions to estimate the cost of DeepSeek's API service, an approach that is hardly appropriate.
In the U.S., partnerships between cloud services and large-model deployments are quite complex. Many customers use Microsoft for both public and private inference instances, and Microsoft cleverly swapped cloud-service credits for an "angel round" investment in OpenAI. Amazon, meanwhile, promotes its SageMaker platform for building, training, and deploying models, yet uses NVIDIA's NeMo framework to develop its own.
Compared with Semianalysis's technical analysis of how DeepSeek's R1 optimizes its KV cache with Multi-head Latent Attention (MLA), its analysis of DeepSeek's hosting, operations, and employee salaries is far more speculative.
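To see why the MLA point is the substantive one, a back-of-envelope comparison helps: standard multi-head attention caches full keys and values for every head in every layer, while MLA caches one small compressed latent vector (plus a small positional key) per token per layer. The dimensions below are illustrative placeholders, not DeepSeek's exact configuration.

```python
# Back-of-envelope KV-cache size per token: standard multi-head
# attention vs. Multi-head Latent Attention (MLA), which caches a
# compressed latent vector instead of full keys/values.
# All dimensions are hypothetical round numbers for illustration.

N_LAYERS = 60
N_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512   # compressed KV latent per token (MLA)
ROPE_DIM = 64      # small decoupled positional key kept alongside it
BYTES = 2          # fp16/bf16 element size

def mha_cache_bytes_per_token():
    # Standard attention stores full K and V for every head, every layer.
    return N_LAYERS * 2 * N_HEADS * HEAD_DIM * BYTES

def mla_cache_bytes_per_token():
    # MLA stores one latent vector plus a positional key per layer.
    return N_LAYERS * (LATENT_DIM + ROPE_DIM) * BYTES

mha = mha_cache_bytes_per_token()
mla = mla_cache_bytes_per_token()
print(f"standard: {mha / 1024:.0f} KiB/token")
print(f"MLA:      {mla / 1024:.1f} KiB/token")
print(f"reduction: ~{mha / mla:.0f}x")
```

Even with these invented numbers, the cache shrinks by more than an order of magnitude, which is exactly the kind of saving that lets longer contexts and larger batches fit into the H20's HBM.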
Did DeepSeek Win Through Marketing?
More bewildering than cost estimates lacking solid research and logical backing is Semianalysis and Dario Amodei's argument about DeepSeek's marketing tactics. These include, but are not limited to, the R1 model's display of its reasoning process and timing the release to coincide with Trump's inauguration.
In a recent video, Semianalysis president Dylan Patel even claimed that DeepSeek's marketing success lies in speed, for instance rushing out the less mature V2 model over six months ago for hype. Major overseas companies, however, have already pushed back against this "marketing" claim. From January 25 to February 1, AMD (with its MI300X GPUs), NVIDIA (with its NIM microservices), and Intel (with its Gaudi 2 AI accelerators) all announced support for and integration with DeepSeek's V3/R1/Janus models. If DeepSeek hadn't demonstrated real technical strength, why would these major firms go along with its "marketing"?
Semianalysis may have overlooked a fact: at the end of 2022, OpenAI’s rush to release ChatGPT followed a strategy of marketing first, debugging later. Google’s Bard (now renamed Gemini) was late to the game, beaten to the punch by OpenAI. This delay was due to the founding team’s concern that such a chatbot might disrupt the search engine market, which is a major source of Google’s revenue from ads driven by search traffic.
This time, under pressure, OpenAI launched the brand-new free o3-mini (which, interestingly, also mimics R1's display of the reasoning chain). This shows that the "innovator's dilemma" has little to do with marketing; it is wave-after-wave competition driven by continuous innovation. Accusing DeepSeek of winning merely through speed is completely unfounded.
On another level, why don’t OpenAI and Anthropic’s reasoning models show specific reasoning pathways? Is showing the reasoning chain really just a marketing move?
OpenAI and Anthropic justify this by claiming it optimizes the user interface and avoids information overload. But the issue touches deeper concerns within these companies. On one hand, exposing the internal workings of their models (such as fine-tuning strategies and task-specific optimizations) could let competitors reverse-engineer them. On the other, keeping the reasoning process in a "black box" shields it from external scrutiny of these tools' troubled provenance. From the very beginning, ChatGPT has drawn controversy for scraping media outlets such as The New York Times and The Wall Street Journal to build its training corpus, with persistent doubts about compliance that have escalated into litigation.
Therefore, it’s clear that AI model companies like OpenAI, Google, and Anthropic, which originally grew through marketing, cannot replicate DeepSeek’s so-called “marketing strategy”—it’s not that they don’t want to, but that they simply cannot.
Model Distillation: DeepSeek’s Gift to Mankind
Semianalysis President Dylan Patel and Anthropic CEO Dario Amodei share a common viewpoint on DeepSeek: they both believe that R1 is far less interesting than V3, primarily because R1 likely used model distillation.
While ensuring model performance and efficiency, pushing AI technology towards democratization—making it as ubiquitous as water and electricity—through model data distillation and user knowledge distillation is an inevitable path. It not only optimizes resource utilization but also accelerates the migration of models towards local deployment and edge-side inference, which is crucial for building a sustainable and efficient AI ecosystem.
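At its core, the distillation the article defends is a simple mechanism: a small student model is trained to match a large teacher's softened output distribution, transferring capability at a fraction of the cost. The sketch below shows the classic temperature-softened KL loss on a single example; the logits are made-up numbers, and a real run would apply this over large batches with gradient descent.

```python
import math

# Minimal sketch of knowledge distillation on one example: the student
# is penalized by the KL divergence between the teacher's softened
# output distribution and its own. Logits are invented for illustration.

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)   # soft teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]        # hypothetical teacher logits over 3 tokens
good_student = [2.8, 1.1, 0.3]   # close to the teacher
bad_student = [0.1, 2.5, 1.0]    # far from the teacher

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

A student that already matches the teacher incurs near-zero loss, so training pushes the student toward the teacher's behavior, which is why distillation makes frontier capability cheap to reproduce.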
The founding of OpenAI was essentially a reaction against Google's AI commercialization strategy. At the time, Sam Altman and Elon Musk adhered to a vision of pursuing AGI for the benefit of all humanity, which is why they named the organization "OpenAI." Today, however, OpenAI has turned into "CloseAI," a departure from its original intent.
Dario Amodei criticized DeepSeek for engaging in distillation, claiming it could threaten intellectual property. But as noted earlier, these major U.S. tech companies are themselves the beneficiaries of the data era's dividends: before The New York Times realized what was happening and pursued legal action, they had already taken its data for training. Having swallowed it, how could they possibly spit it back out?
Once upon a time, AI technology was esoteric, opaque, and monopolized by academia. NVIDIA's CUDA software platform first gave pioneers a chance to try their hand in the commercial market. Soon the center of gravity in AI shifted from prestigious universities such as Stanford, the University of Toronto, and Caltech to startups.
Geoffrey Hinton and Fei-Fei Li joined Google, Andrew Ng went to Baidu, and Altman, along with his former collaborator Ilya Sutskever, co-founded OpenAI, bringing AI into the public eye.
The movement of AI production factors—talent, software and hardware technology, and capital markets—is essentially a form of distillation. Anthropic, which originated from OpenAI, is one of the biggest beneficiaries of user knowledge distillation.
Not long ago, Fei-Fei Li’s team attempted to replicate DeepSeek-R1 for just $50, which precisely embodies the vision of people like Liang Wenfeng—promoting knowledge and information equity. AI should become a public good that benefits all of humanity.
Editor: Li Jingyi