Tech
DeepMind introduces AI agent that learns to complete various tasks in a scalable world model
Over the past decade, deep learning has transformed how artificial intelligence (AI) agents perceive and act in digital environments, allowing them to master board games, control simulated robots and reliably tackle various other tasks. Yet most of these systems still depend on enormous amounts of direct experience—millions of trial-and-error interactions—to achieve even modest competence.
This brute-force approach limits their usefulness in the physical world, where such experimentation would be slow, costly, or unsafe.
To overcome these limitations, researchers have turned to world models—simulated environments where agents can safely practice and learn.
These world models aim to capture not just the visuals of a world, but the underlying dynamics: how objects move, collide, and respond to actions. However, while simple games like Atari and Go have served as effective testbeds, world models still fall short when it comes to representing the rich, open-ended physics of complex worlds like Minecraft or robotics environments.
Researchers at Google DeepMind recently developed Dreamer 4, a new artificial agent capable of learning complex behaviors entirely within a scalable world model, given a limited set of pre-recorded videos.
The new agent, presented in a paper posted to the arXiv preprint server, is the first AI agent to obtain diamonds in Minecraft without practicing in the actual game at all. This achievement highlights the possibility of using Dreamer 4 to train successful AI agents purely in imagination, with important implications for the future of robotics.
“We as humans choose actions based on a deep understanding of the world and anticipate potential outcomes in advance,” Danijar Hafner, first author of the paper, told Tech Xplore.
“This ability requires an internal model of the world and allows us to solve new problems very quickly. In contrast, previous AI agents usually learn through brute-force with vast amounts of trial-and-error. But that’s infeasible for applications such as physical robots that can easily break.”
Some of the AI agents developed at DeepMind over the past few years have already achieved tremendous success at games such as Go and Atari by training in small world models. However, the world models these agents relied on failed to capture the rich physical interactions of more complex worlds, such as the Minecraft video game.
On the other hand, “Video models such as Veo and Sora are rapidly improving towards generating realistic videos of very diverse situations,” said Hafner.
“However, they are not interactive, and their generations are too slow, so they cannot be used as ‘neural simulators’ to train agents inside of yet. The goal of Dreamer 4 was to train successful agents purely inside of world models that can realistically simulate complex worlds.”
Hafner and his colleagues chose Minecraft as a testbed for their AI agent because it is a complex video game that contains infinitely many generated worlds and long-horizon tasks requiring over 20,000 consecutive mouse and keyboard actions to complete.
One of these tasks is mining diamonds, which requires the agent to complete a long sequence of prerequisites, such as chopping trees, crafting tools, and mining and smelting ores.
Notably, the researchers wanted to train their agent purely in “imagined” scenarios rather than letting it practice in the actual game. This mirrors how smart robots will have to learn in simulation, since they could easily break when practicing directly in the physical world. It requires the agent to learn object interactions within a sufficiently accurate internal model of the Minecraft world.
The agent developed by Hafner and his colleagues is based on a large transformer model trained to predict future observations, actions, and the rewards associated with specific situations. Dreamer 4 was trained on a fixed offline dataset of Minecraft gameplay videos recorded by human players.
“After completing this training, Dreamer 4 learns to select increasingly better actions in a wide range of imagined scenarios via reinforcement learning,” said Hafner.
“Training agents inside of scalable world models required pushing the frontier of generative AI. We designed an efficient transformer architecture, and a novel training objective named shortcut forcing. These advances enabled accurate predictions while also speeding up generations by over 25x compared to typical video models.”
Dreamer 4 is the first AI agent to obtain diamonds in Minecraft when trained solely on offline data, without ever practicing its skills in the actual game. This finding highlights the agent’s ability to autonomously learn how to correctly solve complex and long-horizon tasks.
“Learning purely offline is highly relevant for training robots that can easily break when practicing in the physical world,” said Hafner. “Our work introduces a promising new approach to building smart robots that do household chores and factory tasks.”
In initial tests, the researchers found that Dreamer 4's world model accurately predicted various object interactions and game mechanics, indicating that the agent had developed a reliable internal model of the world. This world model also outperformed the models that earlier agents relied on by a significant margin.
“The model supports real-time interactions on a single GPU, making it easy for human players to explore its dream world and test its capabilities,” said Hafner. “We find that the model accurately predicts the dynamics of mining and placing blocks, crafting simple items, and even using doors, chests, and boats.”
A further advantage of Dreamer 4 is that it achieved remarkable results despite being trained on a very small amount of action data. This is gameplay footage paired with the key presses and mouse inputs that produced it, showing the effects of pressing different keys and mouse buttons within Minecraft.
“Instead of requiring thousands of hours of gameplay recordings with actions, the world model can actually learn the majority of its knowledge from video alone,” said Hafner.
“With only a few hundred hours of action data, the world model then understands the effects of mouse movement and key presses in a general way that transfers to new situations. This is exciting because robot data is slow to record, but the internet contains a lot of videos of humans interacting with the world, from which Dreamer 4 could learn in the future.”
This recent work by Hafner and his colleagues at DeepMind could contribute to the advancement of robotics systems, simplifying the training of the algorithms that allow them to reliably complete manual tasks in the real world.
Going forward, the researchers plan to improve Dreamer 4’s world model further by adding a long-term memory component. This would ensure that the simulated worlds in which the agent is trained remain consistent over long periods of time.
“Incorporating language understanding would also bring us closer towards agents that collaborate with humans and perform tasks for them,” added Hafner.
“Finally, training the world model on general internet videos would equip the agent with common sense knowledge of the physical world and allow us to train robots in diverse imagined scenarios.”
Written by Ingrid Fadelli, edited by Sadie Harley, and fact-checked and reviewed by Robert Egan.
More information:
Danijar Hafner et al., Training Agents Inside of Scalable World Models, arXiv (2025). DOI: 10.48550/arxiv.2509.24527
© 2025 Science X Network
Citation: DeepMind introduces AI agent that learns to complete various tasks in a scalable world model (2025, October 25), retrieved 25 October 2025 from https://techxplore.com/news/2025-10-deepmind-ai-agent-tasks-scalable.html
Anthropic Supply-Chain-Risk Designation Halted by Judge
Anthropic won a preliminary injunction barring the US Department of Defense from labeling it a supply-chain risk, potentially clearing the way for customers to resume working with the company. The ruling on Thursday by Rita Lin, a federal district judge in San Francisco, is a symbolic setback for the Pentagon and a significant boost for the generative AI company as it tries to preserve its business and reputation.
“Defendants’ designation of Anthropic as a ‘supply chain risk’ is likely both contrary to law and arbitrary and capricious,” Lin wrote in justifying the temporary relief. “The Department of War provides no legitimate basis to infer from Anthropic’s forthright insistence on usage restrictions that it might become a saboteur.”
Anthropic and the Pentagon did not immediately respond to requests to comment on the ruling.
The Department of Defense, which under Trump calls itself the Department of War, has relied on Anthropic’s Claude AI tools for writing sensitive documents and analyzing classified data over the past couple of years. But this month, it began pulling the plug on Claude after determining that Anthropic could not be trusted. Pentagon officials cited numerous instances in which Anthropic allegedly placed, or sought to place, usage restrictions on its technology that the Trump administration found unnecessary.
The administration ultimately issued several directives, including designating the company a supply-chain risk, which have had the effect of slowly halting Claude usage across the federal government and hurting Anthropic’s sales and public reputation. The company filed two lawsuits challenging the sanctions as unconstitutional. In a hearing on Tuesday, Lin said the government had appeared to illegally “cripple” and “punish” Anthropic.
Lin’s ruling on Thursday “restores the status quo” to February 27, before the directives were issued. “It does not bar any defendant from taking any lawful action that would have been available to it” on that date, she wrote. “For example, this order does not require the Department of War to use Anthropic’s products or services and does not prevent the Department of War from transitioning to other artificial intelligence providers, so long as those actions are consistent with applicable regulations, statutes, and constitutional provisions.”
The ruling suggests the Pentagon and other federal agencies are still free to cancel deals with Anthropic and ask contractors that integrate Claude into their own tools to stop doing so, but without citing the supply-chain-risk designation as the basis.
The immediate impact is unclear because Lin’s order won’t take effect for a week. And a federal appeals court in Washington, DC, has yet to rule on the second lawsuit Anthropic filed, which focuses on a different law under which the company was also barred from providing software to the military.
But Anthropic could use Lin’s ruling to demonstrate to some customers concerned about working with an industry pariah that the law may be on its side in the long run. Lin has not set a schedule to make a final ruling.
How Trump’s Plot to Grab Iran’s Nuclear Fuel Would Actually Work
President Donald Trump and top defense officials are reportedly weighing whether to send ground troops to Iran in order to retrieve the country’s highly enriched uranium. However, the administration has shared little information about which troops would be deployed, how they would retrieve the nuclear material, or where the material would go next.
“People are going to have to go and get it,” Secretary of State Marco Rubio said at a congressional briefing earlier this month, referring to the possible operation.
There are some indications that an operation is close on the horizon. On Tuesday, The Wall Street Journal reported that the Pentagon has imminent plans to deploy 3,000 brigade combat troops to the Middle East. (At the time of writing, the order has not been made.) The troops would come from the Army’s 82nd Airborne Division, which specializes in “joint forcible entry operations.” On Wednesday, Iran’s government rejected Trump’s 15-point plan to end the war, and White House press secretary Karoline Leavitt said that the president “is prepared to unleash hell” in Iran if a peace deal is not reached—a plan some lawmakers have reportedly expressed concern about.
Drawing from publicly available intelligence and their own experience, two experts outlined the likely contours of a ground operation targeting nuclear sites. They tell WIRED that any version of a ground operation would be incredibly complicated and pose a huge risk to the lives of American troops.
“I personally think a ground operation using special forces supported by a larger force is extremely, extremely risky and ultimately infeasible,” Spencer Faragasso, a senior research fellow at the Institute for Science and International Security, tells WIRED.
Nuclear Ambitions
Any version of the operation would likely take several weeks and involve simultaneous actions at multiple target locations that aren’t in close proximity to each other, the experts say. Jonathan Hackett, a former operations specialist for the Marines and the Defense Intelligence Agency, tells WIRED that as many as 10 locations could be targeted: the Isfahan, Arak, and Darkhovin research reactors; the Natanz, Fordow, and Parchin enrichment facilities; the Saghand, Chine, and Yazd mines; and the Bushehr power plant.
According to the International Atomic Energy Agency, Isfahan likely has the majority of the country’s 60 percent highly enriched uranium, which may be able to support a self-sustaining nuclear chain reaction, though weapon-grade material generally consists of 90 percent enriched uranium. Hackett says that the other two enrichment facilities may also have 60 percent highly enriched uranium, and that the power plant and all three research reactors may have 20 percent enriched uranium. Faragasso emphasizes that any such supplies deserve careful attention.
Hackett says that eight of the 10 sites—with the exception of Isfahan, which is likely intact underground, and “Pickaxe Mountain,” a relatively new enrichment facility near Natanz—were mostly or partially buried after last June’s air raids. Just before the war, Faragasso says, Iran backfilled the tunnel entrances to the Isfahan facility with dirt.
The riskiest version of a ground operation would involve American troops physically retrieving nuclear material. Hackett says that this material would be stored in the form of uranium hexafluoride gas inside “large cement vats.” Faragasso adds that it’s unclear how many of these vats may have been broken or damaged. At damaged sites, troops would have to bring excavators and heavy equipment capable of moving immense amounts of dirt to retrieve them.
A comparatively less risky version of the operation would still necessitate ground troops, according to Hackett. However, it would primarily use air strikes to entomb nuclear material inside its facilities. Ensuring that nuclear material is inaccessible in the short to medium term, Faragasso says, would entail destroying the entrances to underground facilities and ideally collapsing the facilities’ underground roofs.
Softening the Area
Hackett tells WIRED that based on his experience and all publicly available information, Trump’s negotiations with Iran are “probably a ruse” that buys time to move troops into place.
Hackett says that an operation would most likely begin with aerial bombardments in the areas surrounding the target sites. These bombers, he says, would likely be from the 82nd Airborne Division or the 11th or 31st Marine Expeditionary Units (MEU). The 11th MEU, a “rapid-response” force, and the 31st MEU, the only Marine unit continuously deployed abroad in strategic areas, have reportedly both been deployed to the Middle East.
Amazon’s Spring Sale Is So-So, but Cadence Capsules Are a Bright Spot
The WIRED Reviews Team has been covering Amazon’s Big Spring Sale since it began on Wednesday, and the overall deals have been … not great, honestly. So far, we’ve found decent markdowns on vacuums, smart bird feeders, and even an air fryer we love, but I just saw that Cadence Capsules, those colorful magnetic containers you may have seen on your social media pages, are 20 percent off. (For reference, the last time I saw them on sale, they were a measly 9 percent off.)
If you’re not familiar, they let you decant the full-sized personal care products you use at home, from shampoo and sunscreen to serums and pills, into a labeled, modular system of hexagonal containers that are leak-proof, dishwasher safe, and stick together magnetically in your bag or on a countertop. No more jumbled, travel-sized toiletries and leaky, mismatched bottles and tubes.
Cadence Capsules have garnered some grumbling online for being overly heavy or leaking, but I’ve been using them regularly for about a year—I discuss decanting your daily-use products in my guide to How to Pack Your Beauty Routine for Travel—and haven’t experienced any leaks. They do add weight if you’re trying to travel super-light, and because they’re magnetic, they will also stick to other metal items in your toiletry bag, like bobby pins or other hair accessories. This can be annoying, especially if you’re already feeling chaotic or in a hurry.
Otherwise, Capsules are modular, convenient, and make you feel supremely organized—magnetic, interchangeable inserts for the lids come with permanent labels like “shampoo,” “conditioner,” “cleanser,” and “moisturizer.” Maybe you love this; maybe you don’t. But at least if you buy on Amazon, you can choose which label genre you get (Haircare, Bodycare, Skincare, Daily Routine). If this just isn’t your jam, the Cadence website offers a set of seven that allows you to customize the color and lid label of each Capsule, but that set is not currently on sale.