DeepSeek Core Member: Once Turned Down NVIDIA to Return to China

【This article was originally published on China’s largest political website, Guancha.cn, and translated by AI.】
Chinese AI startup DeepSeek’s “meteoric rise” has not only disrupted the technical playbook in Silicon Valley but has also shaken the capital markets on Wall Street and led to American introspection about AI talent migration.
Yu Zhidong, a senior research scientist at NVIDIA, recently revealed on social media that a key DeepSeek engineer had interned at the American AI chip giant and was close to being hired full-time, but ultimately chose to return to China and join DeepSeek, which was relatively unknown at the time. The story drew a response from Graham Allison, a prominent U.S. scholar of international politics and former Assistant Secretary of Defense, who said that the failure to attract and retain talent should serve as a wake-up call for Washington.
An “Impressive” Decision
The DeepSeek researcher who drew Allison’s attention is Pan Zizheng. According to the resume he has posted publicly on GitHub, Pan earned a bachelor’s degree from Harbin Institute of Technology and a master’s degree from the University of Adelaide in Australia. Between 2021 and 2024, he pursued a Ph.D. in Computer Science at Monash University, and he interned at NVIDIA during the summer of 2023.
Pan Zizheng (right) and his credit on the DeepSeek-VL2 multimodal model paper. (Image source: GitHub)
On January 27, DeepSeek topped the free app download chart on the U.S. App Store, and Pan Zizheng celebrated with a post on the social platform X (formerly Twitter). Yu Zhidong, who was Pan’s mentor during his NVIDIA internship, retweeted the post and shared the story of how Pan came to join DeepSeek.
Yu Zhidong recalled that NVIDIA was considering offering Pan Zizheng a full-time position, but he “unswervingly” chose to return to China and join DeepSeek, which then had only three people in its multimodal team. Yu noted that Pan has played a “key role” in many significant DeepSeek projects, including DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1.
“I remain impressed by Zizheng’s decision at that time… I am personally delighted for his decision and the tremendous achievements he has made,” Yu Zhidong wrote. “Zizheng’s case is a very typical example that I’ve observed in recent years. Many of our best talents hail from China, and these talents do not necessarily have to succeed only in American companies. On the contrary, we have learned a great deal from them.”
Pan Zizheng’s post celebrating DeepSeek’s “phenomenal moment” as it surpassed ChatGPT in downloads, retweeted by Yu Zhidong. (Image source: X screenshot)
“Not the First, Nor the Last”
In less than two years, DeepSeek rose to fame at home and abroad with its open-source model V3 and its reasoning model R1.
The officially stated training cost of V3 was only $5.576 million, about 1/20th of the budget for OpenAI’s GPT-4. R1 performs on par with OpenAI’s o1 while cutting the cost per million tokens (the basic units of text that language models process) from o1’s $60 to $2.19, nearly a 30-fold price difference.
This prompted Graham Allison, the founding dean of the Harvard Kennedy School of Government, to ask: “Who missed out on DeepSeek?”
Graham Allison. (Image source: New Statesman website)
“R1 has demonstrated that groundbreaking AI advancements do not necessarily rely on larger computing clusters and massive datasets,” commented the MIT Technology Review. “These discoveries are challenging the traditional belief of ‘bigger is better’, offering new possibilities to institutions and companies with limited computational resources.” Following the release of DeepSeek’s latest model, NVIDIA’s market value plummeted by nearly $600 billion in a single day, and the combined market value of U.S.-listed technology companies shrank by about $1 trillion.
On February 1, Allison wrote that DeepSeek’s disruption to U.S. AI companies is comparable to David slaying Goliath in the Bible. “It also vividly reminds us that the U.S. must take attracting and retaining talent seriously, especially talent from China.”
“Why did Pan Zizheng, the engineer who played a leading role in developing DeepSeek’s R1 model, choose to invest his talents in China rather than the U.S. to create this extraordinary technological breakthrough?” Allison asked. “The answer: because the Silicon Valley company for which he developed algorithms did not offer him the opportunity to continue this work in the U.S.”
In this context, he cited Qian Xuesen, a pioneer of China’s “Two Bombs, One Satellite” program who also returned from the U.S. In the 1950s, during the McCarthy era, Qian was accused of sympathizing with communism and “expelled back” to China. Allison asserted that Pan Zizheng “is not the first top talent the U.S. has lost, nor will he be the last”.
Allison quoted a column from the Wall Street Journal emphasizing that China has 9 times more engineers than the U.S., and graduates 15 times as many STEM (Science, Technology, Engineering, Mathematics) students. “In today’s world, super geniuses like Qian Xuesen, (NVIDIA CEO) Jensen Huang, or (Tesla CEO) Elon Musk can vote with their feet and apply their talents wherever they choose.”
Returning to China Becomes a Trend
In fact, according to recent interviews that the U.S.-based tech outlet Rest of World (RoW) conducted with several professionals in China’s tech industry, leaving Silicon Valley to pursue a career at home has become a trend among China’s top AI talent.
A Chinese AI researcher working at a well-known U.S. tech company told RoW that American companies often hire Chinese interns with strong engineering or data processing skills to work on AI projects, either remotely or from their Silicon Valley offices, and that the work of these Chinese students is often “very solid”.
However, the researcher noted that even when these Chinese students are offered full-time positions, many still choose to return to China. “What surprises me is that many Chinese students are not that interested in pursuing full-time work in the U.S.”
As for the reasons, industry insiders told RoW that for outstanding graduates, taking a job in China means not only lower living costs and being closer to family, but also better opportunities for personal development and the chance to handle key tasks early in their careers.
The report also attributed the trend in part to recent U.S. immigration policies that are unfriendly to Chinese nationals. Meanwhile, as China’s domestic AI industry flourishes, graduates in the field have more employment options, ranging from tech giants like Alibaba to startups like Juexing Technology, MiniMax, and Zero One Infinity.
China’s AI Talent Pool Is Abundant
According to a study published in March last year by Macro Polo, a think tank under the U.S. Paulson Institute, Chinese universities trained nearly half of the world’s top AI researchers as measured by where they earned their undergraduate degrees, compared to only about 18% for U.S. universities.
The study also indicates that despite the U.S. making pioneering advances in generative AI, much of this work was completed by researchers educated in China. Among the top AI researchers in the U.S., 38% are from China, while 37% are Americans.
Zhang Huyue, a law professor at the University of Southern California who studies China’s tech regulations, remarked that DeepSeek’s success highlights “the strength of China’s AI talent pool.” “A large number of capable and skilled software engineers have supported DeepSeek,” Zhang said. “I believe this talent advantage lays a solid foundation for the next stage of AI development in China.”
RoW stated that China has cultivated a vast number of homegrown AI researchers through domestic universities, laboratories, and the research institutions of U.S. tech giants in China (such as Microsoft Research Asia headquartered in Beijing), with DeepSeek selecting the best among them.
For example, Song Junxiao, a core contributor to the DeepSeek-R1 model, distinguished himself among his peers during his student years. His Ph.D. advisor at the Hong Kong University of Science and Technology, Daniel Palomar, said that Song was a diligent student. “Somehow, (DeepSeek) managed to find the cream of the crop,” he said.
Editor: Zhongxiaowen