The experiment took place on the open-world gaming platform Minecraft, where up to 1,000 software agents simultaneously used large language models (LLMs) to interact with one another. Given only an initial text prompt, and with no further input from their human creators, they developed a remarkable range of personality traits, preferences, and specialized roles. The work of AI startup Altera is part of a broader field that aims to use simulated agents to model how human groups would react to new economic policies or other interventions.
For Altera’s founder, Robert Yang, who left his position as an assistant professor of computational neuroscience at MIT to start the company, the Minecraft demo is just the beginning. He sees it as a first step towards large-scale “AI civilizations” that could coexist and collaborate with us in digital spaces. “The true power of AI will only unfold when we have truly autonomous agents that can collaborate on a large scale,” says Yang.
Yang was inspired by researcher Joon Sung Park from Stanford University, who discovered in 2023 that a group of 25 autonomous AI agents interacting in a simple digital world exhibited surprisingly human-like behaviors. “After his work was published, we started working the next week,” says Yang. “Six months later, I left MIT.”
Yang wanted to push Park’s idea to its limit. “We wanted to explore the boundaries of what agents can do autonomously in groups.” Altera quickly raised more than $11 million in funding from investors including a16z and former Google CEO Eric Schmidt’s venture firm. Earlier this year, Altera released its first demo: an AI-controlled character in Minecraft that plays alongside users.
Altera’s new experiment, Project Sid, uses simulated AI agents equipped with “brains” made up of multiple modules. Some modules are powered by LLMs and specialize in specific tasks, such as responding to other agents, communicating, or planning the agent’s next move in the game.
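Altera hasn’t published how these module brains work internally, but the general idea of routing different responsibilities to separate LLM-backed modules can be sketched in a few lines of Python. Everything below (the `call_llm` stub, the module names, the agent IDs) is invented for illustration, not Altera’s actual code:

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; a production system would hit a model API here."""
    return f"[LLM response to: {prompt[:40]}...]"

@dataclass
class AgentBrain:
    """Hypothetical modular 'brain': each method plays the role of one specialized module."""
    name: str
    memory: list = field(default_factory=list)  # shared memory the modules read and write

    def converse(self, message: str, sender: str) -> str:
        # Conversation module: craft a reply to another agent and remember the exchange.
        reply = call_llm(f"You are {self.name}. {sender} says: {message!r}. Reply briefly.")
        self.memory.append((sender, message, reply))
        return reply

    def plan(self, goal: str) -> str:
        # Planning module: pick the next in-game move, conditioned on recent interactions.
        recent = "; ".join(f"{s}: {m}" for s, m, _ in self.memory[-3:])
        return call_llm(f"Goal: {goal}. Recent interactions: {recent}. Next action?")

agent = AgentBrain("Sid-001")
agent.converse("Can you help build the fence?", sender="Sid-042")
print(agent.plan("protect the village"))
```

In a real system each module would presumably carry its own prompt template and run continuously against the live game state; the point of the sketch is only the division of labor between modules sharing one memory.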
The team started small, testing groups of around 50 agents in Minecraft and observing their interactions. Over 12 in-game days (four hours in the real world), the agents began to show some interesting new behaviors. Some became very sociable and made many connections with other characters, while others came across as more introverted. The “likability” rating of each agent (as measured by the agents themselves) changed over time as their interactions continued. The agents were also able to pick up on and respond to social cues: in one case, an AI cook tasked with distributing food to the hungry gave more to those it felt appreciated it the most.
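The article doesn’t say how those peer ratings are computed. One simple way to model a score that drifts with repeated interactions is an exponential moving average, sketched here with invented names and numbers:

```python
from collections import defaultdict

class LikabilityTracker:
    """Toy peer-rating model: each new impression nudges a running score (an EMA)."""

    def __init__(self, smoothing: float = 0.2):
        self.smoothing = smoothing                    # weight given to the newest impression
        self.scores = defaultdict(lambda: 0.5)        # every peer starts out neutral

    def record_interaction(self, peer: str, impression: float) -> None:
        """Blend a new impression in [0, 1] into the running rating for `peer`."""
        old = self.scores[peer]
        self.scores[peer] = (1 - self.smoothing) * old + self.smoothing * impression

tracker = LikabilityTracker()
tracker.record_interaction("Sid-042", impression=0.9)  # a friendly exchange
tracker.record_interaction("Sid-042", impression=0.8)  # another good one
print(round(tracker.scores["Sid-042"], 3))             # rating drifts upward: 0.624
```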
In a series of simulations with 30 agents, even more human-like behaviors emerged. Although every agent started with the same personality and the same overall goal (to build an efficient village and protect the community from attacks by other in-game creatures), they spontaneously developed specialized roles within the community, such as builders, defenders, traders, and explorers. Once an agent began to specialize, its in-game actions reflected its new role: an AI artist spent more time picking flowers, farmers collected seeds, and guards built more fences.
“We were surprised to see that the right virtual brains can really show emergent behavior,” says Yang. It’s something, he adds, that you would expect from humans but not from machines.
Yang’s team also tested whether agents could follow community-wide rules. They introduced a world with basic tax laws and let the agents vote on changes to the in-game tax system. Agents prompted to argue for or against taxation were able to sway the agents around them: depending on whom they had interacted with, the others then voted to cut or raise taxes.
The team scaled the number of agents in each simulation up to the maximum the Minecraft server could handle without disruptions, in some cases as many as 1,000 at once. In one 500-agent simulation, Altera watched the agents spontaneously develop cultural memes (such as a fondness for pranks or an interest in environmental issues) and spread them among their fellow agents. The team also deployed a small group of agents tasked with spreading Pastafarianism, the parody religion of the Church of the Flying Spaghetti Monster, across the towns and rural areas of the game world, and observed how these Pastafarian priests converted many of the agents they interacted with. The converts went on to preach the religion in nearby towns.
The way the agents acted may seem lifelike, but their behavior combines patterns the LLMs have learned from human-created data with Altera’s system, which translates those patterns into context-dependent actions, such as picking up a tool or interacting with another agent. “The result is that the LLMs have a sufficiently sophisticated model of human social dynamics to reflect these human behaviors,” says Altera co-founder Andrew Ahn. In other words, the data makes them excellent imitators of human behavior, but they are by no means “alive.”
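Altera hasn’t described this translation layer publicly. A toy version of the idea, mapping the high-level intents a language model might emit onto concrete game actions via a dispatch table, could look like the following; every name here is hypothetical:

```python
from typing import Callable

class World:
    """Minimal stand-in for a game-world API; both methods are invented for this sketch."""

    def give_item(self, agent: str, item: str) -> None:
        print(f"{agent} picks up {item}")

    def start_chat(self, agent: str, other: str) -> None:
        print(f"{agent} walks over and greets {other}")

# Dispatch table from high-level intents (as an LLM might emit them) to game actions.
ACTION_HANDLERS: dict[str, Callable[..., None]] = {
    "pick_up_tool": lambda world, agent: world.give_item(agent, "iron_pickaxe"),
    "talk_to": lambda world, agent, target: world.start_chat(agent, target),
}

def execute_intent(world: World, agent: str, intent: str, *args: str) -> None:
    """Look up the handler for an intent and run it; ignore intents we can't ground."""
    handler = ACTION_HANDLERS.get(intent)
    if handler is None:
        print(f"{agent} has no grounding for intent {intent!r}")
        return
    handler(world, agent, *args)

world = World()
execute_intent(world, "Sid-007", "pick_up_tool")
execute_intent(world, "Sid-007", "talk_to", "Sid-013")
```

The dispatch table is what keeps the language model honest: whatever text it produces, only intents with a registered handler ever touch the game world.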
Yang has even bigger plans. Next, Altera wants to expand its system to Roblox, but Yang hopes to eventually move beyond game worlds altogether. His goal is a world in which people not only play with AI characters but also interact with them in their daily lives. His dream is to create large numbers of “digital humans” who genuinely care about us and work with us to solve problems and keep us entertained. “We want to create agents that can truly ‘love’ people – like dogs love people,” he says.
This view – that AI could love us – is quite controversial in the industry; many experts argue that emotions cannot be replicated in machines with current techniques. AI veteran Julian Togelius, who runs the game-testing company Modl.ai, says he likes Altera’s work, particularly because it allows us to study human behavior in a simulation.
But could these simulated agents ever learn to care for us, love us, or develop self-awareness? Togelius doesn’t believe so. “There’s no reason to believe that a neural network running on a GPU experiences anything at all,” he says.
But perhaps AI doesn’t need to truly love us to be useful. “If the question is whether one of these simulated beings could appear to care about us and do so skillfully enough that it would have the same value to someone as being cared for by a human, then that might not be impossible,” Togelius adds. “You could create a simulation of care that is good enough to be useful. The question is whether it would matter to the person being cared for that the caregiver has no experience.” In short, as long as our AI characters convincingly appear to care about us, that might be all we really need to care about.