US AI Firewall to China: The Fig Leaf of Its Failed Containment Policy

How Washington's AI protectionism could backfire as Beijing charges ahead with transparent Large Language Model development.
May 27, 2024
Editor-in-Chief, The China Academy

As the Sino-US tech war unfolds, the US is considering measures to regulate the export of open-source Large Language Models (LLMs) as part of a broader effort to safeguard American AI technology from being accessed by countries like China and Russia. The proposed bill, known as the Enhancing National Frameworks for Overseas Restriction of Critical Exports (ENFORCE) Act, aims to grant the White House the authority to restrict the export of AI models deemed a national security threat to China.

How would such a ban on open-source LLM exports affect China's AI development? Or is it just the all-too-familiar much-ado-about-nothing drama from US politicians?

To answer this, we first need to take a closer look at Chinese LLMs. For that, we don't even have to sift through the myriad of AI-related content on the Chinese internet.

AI experts in Silicon Valley are well aware (far better informed than those on Capitol Hill, certainly) of what their Chinese colleagues have been working on. In a post on X, Tanishq Mathew Abraham, head of research at Stability AI, observes that many of the most competitive open-source large models, including Qwen, Yi, InternLM, DeepSeek, BGE, and CogVLM, actually come from China. He goes further, concluding that the narrative that China is behind on AI is simply not true.

Mr. Abraham is hardly alone in his enthusiasm for the remarkable performance of Chinese LLMs. In another X post, Maxime Labonne, Senior Scientist for Machine Learning at Liquid AI, weighs in, sharing his excitement over the high benchmark scores of Alibaba's Qwen 1.5-110B.

In April of this year, the Chinese tech giant released Qwen 1.5-110B, the first hundred-billion-parameter model in the series. The model can process context lengths of up to 32K tokens and supports multiple languages, including English, Chinese, French, Spanish, and German. Built on the Transformer architecture, it incorporates an efficient grouped-query attention mechanism. Its foundational capabilities are on a par with those of top-tier Western LLMs such as Meta-Llama3-70B and Mixtral-8x22B, with outstanding performance in chat-based evaluations such as MT-Bench and AlpacaEval 2.0.

No wonder Qwen 1.5-110B once claimed the top spot on the open-source LLM leaderboard of Hugging Face, the world's largest open-source LLM community. On the model's high potential, Bindu Reddy, founder and CEO of Abacus.AI, issued a rallying call for it to join the open-source revolution that would break the monopoly of big capital. Her remarks may well point to a motivation behind the proposed LLM export ban beyond geopolitical considerations: talk of such a revolution would keep the capitalists and their proxies in Washington awake at night.

The rise of Chinese LLMs doesn't stop at Qwen. DeepSeek V2, nicknamed the "Temu of LLMs" for its surprisingly low cost, has surpassed GPT-4 to claim a top-3 spot in the AlignBench ranking, a benchmark project that evaluates the alignment of LLMs in Chinese.

DeepSeek V2, while closely rivaling the capabilities of leading closed-source models, has set its API pricing at just 1 RMB (about 0.14 USD) per million input tokens and 2 RMB (about 0.28 USD) per million output tokens (32K context). This is roughly one-seventh the price of Llama3 70B and approximately one-hundredth that of GPT-4 Turbo, making it an exceptional value proposition. Despite its affordability, DeepSeek still manages to turn a profit, achieving a peak throughput of 50,000 tokens per second on machines equipped with 8 x H800 GPUs. At the output API price, each node generates about 50.4 USD of revenue per hour. With an 8 x H800 node in China costing around 15 USD per hour, DeepSeek's servers can yield a remarkable profit of 35.4 USD per hour, a gross margin exceeding 70%. This combination of efficiency, usability, and groundbreaking pricing addresses the urgent needs of the open-source community.
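The cited 50.4 USD/hour figure only works out if the API prices are read in RMB (DeepSeek's list prices were 1 RMB and 2 RMB per million tokens). A back-of-the-envelope sketch of the arithmetic, assuming an exchange rate of about 7.14 RMB per USD:

```python
# Back-of-the-envelope check of the DeepSeek V2 serving economics cited above.
# Throughput, price, and node cost come from the article; the RMB/USD rate is an assumption.

TOKENS_PER_SECOND = 50_000            # peak throughput on one 8 x H800 node
OUTPUT_PRICE_RMB_PER_MILLION = 2      # API price per million output tokens
RMB_PER_USD = 7.14                    # assumed exchange rate
NODE_COST_USD_PER_HOUR = 15           # cited hourly cost of an 8 x H800 node in China

tokens_per_hour = TOKENS_PER_SECOND * 3600                          # 180 million tokens
revenue_rmb = tokens_per_hour / 1e6 * OUTPUT_PRICE_RMB_PER_MILLION  # 360 RMB
revenue_usd = revenue_rmb / RMB_PER_USD                             # ~50.4 USD
profit_usd = revenue_usd - NODE_COST_USD_PER_HOUR                   # ~35.4 USD
margin = profit_usd / revenue_usd                                   # ~70%

print(f"revenue: {revenue_usd:.1f} USD/h, profit: {profit_usd:.1f} USD/h, margin: {margin:.0%}")
```

This assumes every generated token is billed at the output rate and the node runs at peak throughput around the clock, so it is a best-case ceiling rather than a realized margin.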

These advancements have drawn significant attention from SemiAnalysis, a reputable semiconductor research and consulting firm. In an extensive article published on May 7th, SemiAnalysis singled out DeepSeek V2 as a "Phoenix from the East," praising its exceptional cost-effectiveness and its economic dominance over other models. The article also suggested that the industry challenges facing OpenAI and Microsoft may extend beyond the United States:

"It's not just the big tech firms that have rapidly caught up. Yesterday, China's DeepSeek open-sourced a new model that is both cheaper to run than Meta's Llama 3 70B and better. While the model is more tuned for Chinese language queries (tokenizer / training data set) and government censorship of certain ideas, it also happens to win in the universal languages of code (HumanEval) and math (GSM 8k)." (SemiAnalysis)

Philipp Schmid, a technical lead at Hugging Face, highly recommended DeepSeek V2's wide-ranging skill set to the community in a post on X. Within just four days of its launch, DeepSeek V2 had already received 3,522 downloads on Hugging Face and garnered an impressive 1,200 stars on GitHub.

Some Chinese models, like DeepSeek, prioritize economic efficiency in their use of computational power, while others, like Qwen, take a comprehensive approach, expanding their range of model sizes. The majority of companies, however, adhere to the Scaling Law, fervently pursuing ever-larger parameter counts.

In contrast, Modelbest Inc., a Chinese startup, takes a divergent route by striving to minimize parameters. Their aim is to maximize model efficiency with lower deployment thresholds and reduced usage costs, accomplishing more with less.

On February 1st of this year, Modelbest unveiled the MiniCPM-2B model, consisting of a mere 2.4 billion parameters. Not only does it outperform models in its class, such as Google Gemma 2B, but it also surpasses benchmark models like Mistral-7B in performance, and even outshines larger-parameter models like Llama2-13B and Llama2-70B-Chat in certain aspects.

Only 70 days later, the company was on the move again, releasing the MiniCPM-Llama3-V 2.5 multimodal model, which represents a significant leap in performance:

Elevated multimodal capabilities: Despite its modest parameter size of just 8 billion, it surpasses the performance of Google’s colossal multimodal model, Gemini Pro, and OpenAI’s GPT-4V.

Cutting-edge OCR prowess: The model demonstrates exceptional precision in recognizing long images, challenging visuals, and extensive texts, reportedly handling images with up to nine times the pixel count of comparable models. It also combines recognition with inference capabilities.

Mobile breakthrough: In a groundbreaking achievement, MiniCPM-Llama3-V 2.5 integrates NPU (Neural Processing Unit) and CPU acceleration frameworks, enabling comprehensive system-level acceleration for multimodal large models on mobile devices. This enhancement results in an astounding 150-fold increase in speed.

Multilingual proficiency: With support for over 30 languages, including not only Chinese and English but also popular languages like French, German, Spanish, and more, the model covers a wide linguistic spectrum, encompassing countries along the Belt and Road initiative. Notably, on the esteemed evaluation platform OpenCompass, Modelbest’s MiniCPM-Llama3-V2.5 outperforms the multimodal powerhouses GPT-4V and Gemini Pro in overall performance, solidifying its position as the most formidable on-device model available today.

Following its open-source release in the international community, Thomas Wolf, the co-founder of Hugging Face, promptly acknowledged this development in a post, remarking, “China has presented a series of astounding technical reports and open-source models, such as DeepSeek, MiniCPM, UltraFeedback… Their data and experimental results are openly shared, a level of transparent knowledge-sharing that has been missing in recent Western tech model releases.”

The smaller parameter size of models like MiniCPM brings numerous advantages. First, it allows China to work around its relative shortage of advanced AI chips. Second, these compact models can be deployed in devices and robots operating in remote areas with limited or no internet access. Presently, most LLM-powered devices require a constant connection to models deployed on the cloud through APIs. With models like MiniCPM, however, robots operating in jungles or underwater environments can rely on their built-in LLMs, independent of external connectivity. This opens up new possibilities for deploying AI capabilities in challenging and resource-constrained settings.

Furthermore, the case of MiniCPM-Llama3-V 2.5 offers a valuable lesson on the significance of international cooperation. As its name implies, the model is built upon Meta's open-source Llama3 as its foundation. While the open-source movement strives to narrow the technology gap between nations and social classes, some seek to widen it. Vinod Khosla, an early investor in OpenAI, has raised concerns about sharing LLMs with China. But no one has answered that concern better than AI luminary Yann LeCun: AI is not a weapon.
