China’s AI Revolution: Leapfrogging the West With Pragmatic Models
There is a stereotype in the West, especially in the US, that Chinese people are good at math. Capitalizing on this stereotype, 2020 presidential candidate Andrew Yang even went so far as to make MATH his campaign slogan.
Although the perceived Chinese advantage in math may be a myth, these days a new breed of Chinese AI apps is indeed helping American students with their math homework. When students encounter a difficult integral, they simply photograph it and open apps such as Answer AI, Question AI, or Gauth on their phones. Within seconds, they see a step-by-step solution. Every one of these products comes from China.
According to data from Data.ai dated May 21st this year, five out of the top 20 education apps in the US App Store are designed to help students complete schoolwork – including Answer AI. The founding team behind this app previously worked at Xiaomi and ByteDance, the parent company of TikTok. The other two top homework-helping apps in the download rankings, Question AI and Gauth, were developed by Zuoyebang and ByteDance, respectively.
Market research firm Sensor Tower reports that, since launching in 2019, Question AI has been downloaded over 600,000 times across the US Apple App Store and Google Play Store. Gauth, meanwhile, has achieved roughly double that install volume since its own debut.
The success of these apps reveals an approach very different from that of high-profile US counterparts such as OpenAI. Chinese AI companies are more specialized and lower-profile, almost invisible in conversations about the megatrends of AI and AGI. But they could be everywhere.
Compared to American AI firms, Chinese firms have had to deal with a shortage of venture capital funding and computing power from day one. Without the likes of Microsoft, which can shower OpenAI with cash from its deep pockets, and operating under the chip embargo imposed by the US government, these Chinese players have had to chart their own path to survival: they need to turn a profit from AI products as quickly as possible, before their cash flow dries up.
As a result, they have developed diverse strategies to adapt to this grim reality.
MOE model + app/agent ecosystem
“Many people are curious about the release timing of GPT-5, but I’m more interested in identifying applications that can truly unlock the full potential of large language models,” Robin Li, co-founder and CEO of Baidu, said in an interview at VivaTech in Paris. He noted that in China, both startups and internet giants are focused on achieving strong “product-market fit” by exploring forms of AI that can be used beneficially by billions of people worldwide.
Li believes the Chinese approach is more application-driven. “Technological advances are shaped by usable scenarios,” he said. While the US and Europe emphasize cutting-edge foundation models, China, too, has hundreds of models, but the conversation there increasingly centers on what “killer applications” could define the era. The focus, Li implied, is on harnessing AI through meaningful products people want to use every day, not just advancing the technology itself.
At the recent Baidu Create 2024 AI developer conference, Baidu unveiled the AgentBuilder platform, which aims to radically simplify the development of intelligent agents. Through the platform, users can quickly build their own AI agents without advanced programming skills. Individual accounts can create up to 50 agents, and each agent can draw on 10 separate datasets.
In an on-stage demonstration, Li showed how a fully functional intelligent agent could be built from scratch in just five days using a no-code approach. This opens the door for people without extensive programming backgrounds to join the ongoing wave of AI innovation, Li implied. By lowering the barrier to entry, AgentBuilder promises to give many more individuals the chance to shape emerging technologies through hands-on experimentation and design.
This ecosystem of AI agents is powered by Baidu’s language model, ERNIE Bot. Baidu has been rebuilding its search engine using this model, and now, an increasing number of search results are composites formed by ERNIE Bot in different formats, such as text, images, and third-party links.
However, for more complex tasks, Baidu combines multiple AI systems rather than relying on a single language model. Lighter models handle general queries, while specialized models focus on niche topics; an expert router then allocates work between these and larger models, balancing performance needs against computational constraints. The key advantage of this MOE (Mixture of Experts) approach is that it saves computing power and electricity while boosting overall performance, letting the system solve more complex tasks than a single model could by distributing the workload intelligently.
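As a rough sketch of how this kind of routing works (a toy illustration, not Baidu’s actual system): a gate scores each expert for a given input, only the top-scoring experts run at all, and their outputs are blended by the gate’s weights. Everything here, from the linear gate to the scalar “experts,” is a hypothetical stand-in.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts chosen by the gate,
    then combine their outputs weighted by the gate scores."""
    # Gate scores: one per expert (a simple linear gate for illustration).
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    probs = softmax(scores)
    # Activate only the top_k experts; the rest do no work at all,
    # which is where the compute and electricity savings come from.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Toy setup: four "experts", each just a scalar function of the input.
random.seed(0)
experts = [lambda x, k=k: k * sum(x) for k in range(1, 5)]
gate_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
output, chosen = moe_forward([0.5, -0.2, 0.1], experts, gate_weights)
```

In a real MOE transformer the experts are feed-forward sub-networks inside each layer and the gate is learned jointly with them, but the core idea is the same: per input, only a small subset of the total parameters is exercised.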
The results have benefited users across Baidu’s offerings. According to internal metrics, 11% of core search content now comes from AI. Not only has this improved relevance, it has also driven advertising growth.
Elsewhere, Baidu’s library service saw an 18% increase in paid users after adding AI-powered features for summarizing, creating, and customizing knowledge, and Baidu Maps launched an AI Guide to provide tailored route assistance.
And the cloud platform debuted an intelligent ERNIE-based assistant called “YunYiDuo” that lets people find files, translate documents, and get insights with simple voice commands. Its popularity has soared, amassing 200 million users in just over a year since its launch.
Baidu is far from alone in China in leveraging the MOE architectural approach to achieve more with less. Elon Musk once complained that a chip shortage would delay xAI’s training of its model Grok 2. He estimated that training Grok 3 and beyond may require 100,000 top-tier Nvidia H100 chips; with each costing $30,000, the chip expenditure alone could reach a staggering $3 billion.
Chinese AI companies do not have the luxury of even dreaming about procuring anything close to such vast allotments of top-tier chips. Instead, their MOE models work wonders with much less. Inspur Information, a lesser-known Chinese player in AI, recently unveiled its open-source MOE model Yuan2.0-M32. The model uses a hybrid architecture of 32 experts totalling 40 billion parameters, yet activates only 3.7 billion of them at a time through strategic task allocation across the network. That lets it match the performance of the state-of-the-art 70-billion-parameter Llama 3-70B with roughly 1/19th of the computational overhead and active parameters.
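The 1/19th figure is easy to sanity-check from the numbers quoted above:

```python
# Back-of-envelope check of the figures quoted in the article:
# Yuan2.0-M32 has 40B parameters in total but activates only 3.7B
# per inference step; the dense comparison model (Llama 3-70B) uses all 70B.
total_params = 40e9
active_params = 3.7e9
dense_params = 70e9

active_fraction = active_params / total_params  # share of the MOE that fires
compute_ratio = dense_params / active_params    # dense vs. active parameters
print(f"active fraction: {active_fraction:.1%}")          # ~9.3%
print(f"dense model uses ~{compute_ratio:.0f}x more parameters per token")
```

70B / 3.7B ≈ 18.9, which rounds to the 19x advantage the company claims; less than a tenth of the MOE’s own parameters are active on any given step.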
Moreover, building on this MOE model, Inspur launched its Enterprise Platform of AI (EPAI), providing more efficient, user-friendly, and secure end-to-end tools for corporate LLM development. EPAI offers a full suite covering data preparation, model training, knowledge retrieval, application frameworks, and more, and it supports multiple compute backends and algorithms. Even for corporations without in-house technical expertise, EPAI can streamline the deployment and development of AI applications, unlocking significant business value.
Specialist models
While general-purpose models like GPT-3 have captured the public imagination, some companies are developing AI tailored specifically to industrial and scientific needs. This strategy both sidesteps the shortage of computing power and achieves better performance on highly specialized tasks. Huawei, in particular, has invested heavily in its Pangu series of large models, designed to optimize real-world tasks.
The Pangu series departs from one-size-fits-all general models through industry-specific pre-training, with models fine-tuned on data from sectors as varied as manufacturing, mining, finance, government, automotive, and pharmaceuticals.
At the heart of Pangu’s prowess is a meticulously engineered architecture with a hierarchical design: a foundation of 5 core models supplemented by N industry-specific variants and X task-oriented applications. This adaptable structure enables swift optimization and deployment across diverse real-world scenarios.
The results speak for themselves. In mining, the Pangu Mine Model is boosting cleaned-coal yields by 0.2% and driving $20 million in additional revenue across 8 sites in China. For autonomous driving, the Pangu Vehicle Model slashes the time to adapt to new truck designs from 4 months to just 4 person-weeks. And in drug discovery, the Pangu Drug Molecule Model is accelerating the identification of lead compounds by an astounding 70%.
But perhaps Pangu’s most jaw-dropping feat lies in weather forecasting. By replacing conventional numerical fluid-dynamics simulation with AI inference, Huawei has achieved a roughly 10,000-fold increase in processing speed, a leap that promises to transform the precision and reliability of weather predictions worldwide.
In developing AI, just as in many other areas, Huawei increasingly plays the role of an innovation hub, supporting research institutions and industrial partners with its world-class technology infrastructure and financing.
As general-purpose models such as ChatGPT captivate consumers, Huawei is quietly empowering enterprises and researchers to push the boundaries of what’s possible. From the factory floor to the laboratory, the future-shaping potential of Pangu is undeniable.
Edge-side Small Models
While many Chinese companies are experimenting with MOEs or specialized models, some others have decided to shrink the parameter size of AI models even further to fit them into smart devices and robots.
The recent ICRA 2024 robotics conference was abuzz with discussions around embodied intelligence, raising a common question: when applying large AI models to consumer robots, should the priority be adapting the model to the device or vice versa?
Over the past year, breakthroughs with compact 6B and 7B models, as well as advancements in MOE training, have significantly expanded the potential for running sophisticated AI on edge devices. Smartphones, tablets, robots, and even automobiles are now viable targets, with both algorithmic and hardware teams eager to push the boundaries.
From the algorithmic standpoint, model compression is crucial. But hardware manufacturers are more concerned with whether the model can seamlessly fit their existing product constraints. There are two key challenges:
Firstly, consumer robots operate on fixed product lifecycles, often taking 6 months to 1.5 years from R&D to market launch. So, despite ChatGPT’s breakthrough over a year ago, today’s commercially available robot vacuums have yet to integrate such large-scale language models.
Secondly, the underlying hardware chips have inherent performance limits – parameters like bandwidth and memory capacity are already baked into the “physical” chip design. This directly determines the AI model size and speed that can be practically deployed.
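As a rough illustration of that second constraint, weight memory alone often decides which models can ship on a device. The sketch below uses a hypothetical 8 GB RAM budget and the common rule of thumb that weight footprint is parameters times bits per weight; the numbers are illustrative, not a claim about any specific product.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Rough weight-only footprint in GB; ignores activations and KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Hypothetical budget: a device with 8 GB of RAM shared with the OS and apps.
BUDGET_GB = 8
for bits in (16, 8, 4):
    gb = weight_memory_gb(8, bits)  # an 8B-parameter model, quantized
    verdict = "fits" if gb < BUDGET_GB else "does not fit"
    print(f"{bits}-bit weights: ~{gb:.0f} GB, {verdict} in {BUDGET_GB} GB RAM")
```

At 16 bits per weight an 8B model needs about 16 GB just for weights, while 4-bit quantization brings it down to roughly 4 GB, which is why aggressive quantization is central to edge deployment.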
Consequently, robot makers are grappling with how to fit capable models within these lifecycle and hardware constraints.
That’s where MiniCPM-Llama3-V 2.5 comes in. Developed by Chinese startup ModelBest with a parameter count of just 8 billion, the model’s overall performance surpasses that of industry heavyweights like Google’s multimodal Gemini Pro and OpenAI’s GPT-4V.
Boasting state-of-the-art OCR capabilities, the model can accurately recognize long, complex images and texts, handling a 9x increase in pixel resolution. Crucially, it combines these robust recognition abilities with powerful reasoning skills.
The model has also achieved a breakthrough on mobile devices. By seamlessly integrating NPU and CPU acceleration frameworks, it delivers a remarkable 150x speed increase for large multimodal models running on smartphones.
Furthermore, the model’s linguistic range is expansive. It supports over 30 languages, including not just Chinese and English but also major European languages such as French, German, and Spanish, covering most countries along the Belt and Road Initiative.
The combined advantage
As China continues to cement its global manufacturing dominance, the development of compact, high-performance AI models like MiniCPM-Llama3-V 2.5 is set to be a game-changer. These small-footprint models can be seamlessly integrated into a wide range of smart devices – from wearables and drones to electric vehicles and industrial robots.
China’s manufacturing ecosystem, with its robust and comprehensive supply chain, provides an ideal testbed for deploying these advanced AI capabilities. By embedding cutting-edge models directly into the electronic products rolling off Chinese assembly lines, manufacturers can unlock unprecedented levels of functionality and intelligence.
This integration of small, powerful AI models will inevitably lead to the reinvention of familiar product categories. Imagine mixed-reality glasses that fully replace smartphones and tablets, revolutionizing how we interact with digital technologies. Or drones and robots operating in remote areas without internet access, using local AI processing to navigate autonomously and execute complex tasks.
The military domain also stands to benefit tremendously from China’s advances in small AI models. With the ability to run sophisticated algorithms on low-power, size-constrained hardware, a new generation of intelligent defense systems and unmanned platforms can be developed.
As China continues to strengthen its position as the world’s factory, the rise of compact, capable AI models stands to supercharge its manufacturing might. From consumer electronics to military equipment, this technological fusion is poised to redefine entire industries in the years ahead.
Faced with significant challenges like limited venture capital funding and computing power, the Chinese AI industry has demonstrated remarkable resilience. Strategies like Mixture-of-Experts (MOEs), specialist models, and compact edge-side models have enabled Chinese companies to chart a distinct path of development, dramatically different from the trajectory of American AI.
These innovative approaches, born out of necessity, have allowed Chinese firms to adapt and thrive amidst resource constraints. MOEs, for instance, leverage a modular architecture to maximize the capabilities of available hardware. Specialist models, finely tuned for specific tasks, can deliver high performance without the bloat of large generalist networks. And the rise of edge-side small models has empowered a new generation of intelligent products, overcoming the limitations of cloud-dependent AI. Instead of replicating Silicon Valley, Chinese AI is forging its own identity, one that is pragmatic, efficient and deeply integrated with its manufacturing prowess.