Chinese Researchers Develop a Generalist AI Agent for Minecraft
In current AI research, there is growing interest in building AI agents with generalist capabilities. Such agents are expected to master a wide range of skills, adapt to changing environments, and come closer to human-level performance on complex problems. Research on AGI (Artificial General Intelligence) could lead to significant breakthroughs in industries such as robotics and autonomous driving, ultimately accelerating the industrial adoption of AI technologies.
Do you enjoy venturing through Minecraft’s vast, varied landscapes, mining resources, crafting tools, and constructing structures? Have you ever longed for a companion to join you in your escapades? A team of Chinese researchers from Tsinghua University and the Chinese Academy of Sciences has developed a new AI agent capable of accompanying you on your quests and more. They call it Ghost in the Minecraft (GITM), and it’s not your average bot.
Minecraft presents AI agents with a fascinating conundrum known as Moravec’s Paradox:
Tasks that are challenging for humans, such as chess, are relatively simple for AI, whereas tasks that are typically easy for humans, like interacting with and making decisions in an open-world environment like Minecraft, prove to be monumental challenges for AI.
GITM makes significant headway against this paradox, achieving a breakthrough in a complex, lifelike setting. That is an encouraging step toward more general AI agents.
GITM is a Generally Capable Agent (GCA), meaning it is designed to handle a wide range of tasks without task-specific training. It achieves this using a Large Language Model (LLM), which gives it a broad grasp of language and common sense. GITM communicates with you in natural language and works out what outcome you want. Moreover, it uses text-based knowledge and memory to store and retrieve useful information such as recipes, locations, and goals.
GITM achieved 100% task coverage on the technical challenges of the Minecraft Overworld (successfully unlocking the full tech tree), whereas all previous agents combined could only cover 30%.
The researchers tested GITM on a popular Minecraft task called “ObtainDiamond,” where the agent has to find and mine a diamond. This is not an easy task: diamonds are rare and lie deep underground, and the agent has to survive various dangers along the way, such as lava, monsters, and hunger. Previous methods using reinforcement learning (RL) could only achieve a success rate of around 20%, which means they failed about 4 out of 5 times. GITM, on the other hand, achieved a 67.5% success rate, which means it succeeded roughly 2 out of 3 times. That’s a huge improvement!
But GITM is not satisfied with just one diamond. It wants them all. And not just diamonds, but everything else in the Minecraft Overworld. That’s right, GITM can procure all items in the game, from wood to nephrite. It can craft tools, weapons, and armor, build shelters and farms, and even tame animals. It can do anything you can do, and probably better.
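How does GITM achieve all this? Previous AI agents struggled to connect intricate goals in Minecraft with the precise mouse and keyboard actions required to complete them. GITM instead uses the LLM to generate action plans based on the current situation and the desired objective. It is built around three LLM-based components: an LLM Decomposer, which breaks a goal into sub-goals; an LLM Planner, which plans a sequence of structured actions for each sub-goal; and an LLM Interface, which carries out those actions through keyboard and mouse operations.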
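For example, if GITM wants to make a nether portal, it can ask the LLM how to do it, and the LLM will reply with an ordered list of steps, such as obtaining obsidian, building the portal frame, and lighting it with flint and steel.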
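To make the idea of turning a goal into ordered steps more concrete, here is a minimal sketch in Python. It is not GITM's code: the hard-coded PREREQUISITES table is a hypothetical, heavily simplified stand-in for the answers an LLM would give, and the decompose function is an illustrative name. It only shows how a goal can be expanded so that every prerequisite comes before the item that needs it.

```python
from typing import Dict, List, Set

# Hypothetical, simplified prerequisite table standing in for LLM answers.
# The entries are illustrative only; they are not taken from GITM and they
# skip many real in-game steps (mining ore, smelting, crafting tables, ...).
PREREQUISITES: Dict[str, List[str]] = {
    "nether_portal": ["obsidian", "flint_and_steel"],
    "obsidian": ["diamond_pickaxe"],
    "flint_and_steel": ["iron_ingot", "flint"],
    "diamond_pickaxe": ["diamond", "stick"],
    "diamond": ["iron_pickaxe"],
    "iron_pickaxe": ["iron_ingot", "stick"],
    "iron_ingot": [],
    "flint": [],
    "stick": [],
}


def decompose(goal: str, prerequisites: Dict[str, List[str]]) -> List[str]:
    """Return sub-goals ordered so that each prerequisite precedes its user."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen:
            return  # already scheduled, do not plan it twice
        seen.add(item)
        for dependency in prerequisites.get(item, []):
            visit(dependency)
        ordered.append(item)  # appended only after all of its prerequisites

    visit(goal)
    return ordered


plan = decompose("nether_portal", PREREQUISITES)
for number, step in enumerate(plan, start=1):
    print(f"{number}. obtain or build {step}")
```

In a real agent the table lookup would be replaced by a query to the language model, and each sub-goal would still have to be translated into concrete in-game actions.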
GITM can then execute these steps one by one, using its text-based memory to keep track of what it has done and what it needs to do next. It can also use its text-based knowledge to look up any information it needs, such as recipes, locations, and properties of items. For example, if GITM wants to know where to find obsidian, it can ask the LLM, and the LLM will reply with a short text answer, for instance that obsidian forms where flowing water meets a lava source.
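The role of text-based knowledge and memory can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the TextMemory class, its method names, and the sample plan are all hypothetical. The sketch stores facts as plain text keyed by topic and tracks which plan steps are already finished.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TextMemory:
    """Hypothetical plain-text store for facts and completed plan steps."""

    knowledge: Dict[str, str] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)

    def remember_fact(self, topic: str, fact: str) -> None:
        # Facts are kept as plain text, keyed by a lower-cased topic.
        self.knowledge[topic.lower()] = fact

    def lookup(self, topic: str) -> Optional[str]:
        # Returns None when nothing is known about the topic.
        return self.knowledge.get(topic.lower())

    def mark_done(self, step: str) -> None:
        if step not in self.completed:
            self.completed.append(step)

    def next_step(self, plan: List[str]) -> Optional[str]:
        # The first step in the plan that has not been completed yet.
        for step in plan:
            if step not in self.completed:
                return step
        return None


memory = TextMemory()
memory.remember_fact("Obsidian", "Forms where flowing water meets a lava source.")

# Illustrative plan; the step names are made up for this example.
steps = ["get obsidian", "build portal frame", "light portal"]
memory.mark_done("get obsidian")

print(memory.lookup("obsidian"))
print(memory.next_step(steps))
```

Keeping both the facts and the progress log as text means they can be pasted straight into a prompt for the language model, which is the main appeal of this kind of design.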
GITM can then use this information to guide its exploration and mining activities. Remarkably, GITM does not require a GPU for training; a single CPU node with 32 CPU cores is sufficient. That puts it within reach of a single commodity server rather than a GPU cluster, with a reported efficiency improvement of at least 10,000 times compared with OpenAI’s VPT and DeepMind’s DreamerV3. You can even download the code from their GitHub page and try it out for yourself.
The significance of GITM goes beyond computer games. It represents the latest generation of Generally Capable Agents that can explore and interact with an environment (albeit a virtual one in this case), formulate strategies, and then execute them entirely on their own. Imagine that one day such GCAs could enter our physical world through sensors and smart devices, solving real-world problems for us autonomously. Let’s stay tuned and look forward to more to come.