Level 5 ·●○○

The 2026 Landscape

Renderers, simulators, planners — who is building what, sourced.

Prerequisites: None to skim; far richer after Levels 1–4.

You've trained a world model, watched it dream, watched it fail, and learned why. Now, and only now, come the systems with the press releases. This level is a field guide, and it obeys the house rule harder than any other page: every claim below lives in a JSON file and carries a source id, and a script fails the build if any claim lacks one. No claim here comes from vibes.
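A minimal sketch of what such a build check might look like. The schema is an assumption made for illustration: claims as objects with `text` and `source_id` fields, and sources keyed by id. The function names are hypothetical too; this is not the project's actual script.

```python
import json

def find_unsourced_claims(claims, sources):
    """Return the claims whose source id is missing or not a known source."""
    known = set(sources)
    return [c for c in claims if c.get("source_id") not in known]

def check(document_json):
    """Exit with an error (failing a build) if any claim lacks a valid source."""
    data = json.loads(document_json)
    bad = find_unsourced_claims(data["claims"], data["sources"])
    if bad:
        lines = [f"unsourced claim: {c.get('text', '<no text>')}" for c in bad]
        raise SystemExit("\n".join(lines))
    return len(data["claims"])

# Hypothetical file contents, inlined so the sketch is self-contained.
EXAMPLE = json.dumps({
    "sources": {"li-2026": "A Functional Taxonomy of World Models"},
    "claims": [
        {"text": "Renderers produce what a viewer would see.",
         "source_id": "li-2026"},
    ],
})

if __name__ == "__main__":
    print(check(EXAMPLE), "claims checked")
```

Raising `SystemExit` with a message gives a non-zero exit status, which is what most build runners treat as failure.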

Three jobs, not one thing

Li's functional taxonomy sorts world models by what they do rather than how they're built: renderers produce what a viewer would see, simulators predict how a world evolves as things act in it, and planners use those predictions to choose actions.

Li's test case is a drone shot gliding through a canyon: a video model can produce the footage — every frame plausible, the whole thing beautiful — without anything inside it knowing where the canyon walls are, or what would happen if the drone banked left. It rendered the view; it did not simulate the world.

You already own one of each. Your Level 3 network is a tiny simulator; the MPC that imagined twenty futures per move is a planner; and the canvas that painted the dream so you could watch it is a humble renderer. The billion-dollar versions below split along exactly these lines.

Source: A Functional Taxonomy of World Models — Fei-Fei Li & World Labs, Substack, June 3, 2026
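The three jobs can be sketched as three functions over one toy world. Everything here is an illustrative assumption: the 1D world, the hand-written dynamics standing in for a learned network, and a random-shooting planner that imagines twenty futures per move (which may differ from the MPC used in Level 3).

```python
import random

WIDTH = 11            # size of the toy 1D world (hypothetical)
ACTIONS = (-1, 0, 1)  # step left, stay, step right

def simulate(pos, action):
    """Simulator: predict the next state, given the current state and an action."""
    return min(WIDTH - 1, max(0, pos + action))

def render(pos, goal):
    """Renderer: produce what a viewer would see. It knows nothing about dynamics."""
    cells = ["."] * WIDTH
    cells[goal] = "G"
    cells[pos] = "A"
    return "".join(cells)

def plan(pos, goal, rng, futures=20, horizon=5):
    """Planner: imagine `futures` random action sequences with the simulator
    and return the first action of the sequence with the lowest total distance."""
    best_action, best_cost = 0, float("inf")
    for _ in range(futures):
        seq = [rng.choice(ACTIONS) for _ in range(horizon)]
        imagined, cost = pos, 0
        for a in seq:
            imagined = simulate(imagined, a)
            cost += abs(goal - imagined)
        if cost < best_cost:
            best_action, best_cost = seq[0], cost
    return best_action

def run(start=2, goal=8, steps=30, seed=0):
    """Closed loop: render the state, plan one move, apply it, repeat."""
    rng = random.Random(seed)
    pos, frames = start, []
    for _ in range(steps):
        frames.append(render(pos, goal))
        if pos == goal:
            break
        pos = simulate(pos, plan(pos, goal, rng))
    return pos, frames

if __name__ == "__main__":
    final_pos, frames = run()
    print("\n".join(frames))
```

Note that `render` never calls `simulate`: deleting the simulator would leave the pictures intact, which is the canyon-footage point in miniature.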

The explorer

Nine entries, three filters, one honest limitation each. Try filtering by action-conditioned first — it is, per one survey's framing, the most clarifying single column in the whole field: can what you do change what it predicts?

Framings: World Models, From Zero to Hero — HackMD, 2026 · World Models, Architectures, and the Next Phase of AI — Ken Huang, Substack, May 2026

Filters: function · substrate · action-conditioned?

9 of 9 systems shown · data: content/landscape.json · last updated 2026-07-02

Genie 3
action-conditioned
Google DeepMind · 2025
renderer · simulator · pixels / video
Real-time interactive world generation: playable generated worlds at roughly 24 frames per second at 720p, holding consistent over minutes rather than seconds.
For: Interactive environments generated on demand — a step toward worlds you can act in that never existed as assets.
Honest limitation: Closed: available only as a limited API as of 2026 — you can read about it far more easily than you can touch it.
Genie 3: A new frontier for world models — Google DeepMind blog, 2025
Sora / Veo-class video models
not action-conditioned
Various labs · 2024–2026
renderer · pixels / video
Text-to-video generators — the 'video generation' camp of the world-model debate. They produce strikingly plausible footage of worlds.
For: Producing views of imagined scenes; the open question is whether that footage implies any inner model of the scene at all.
Honest limitation: The renderer critique applies in full: they produce what a viewer would see, not what is — and your actions can't change what happens next.
A Functional Taxonomy of World Models — Fei-Fei Li & World Labs, Substack, June 3, 2026 · World Models, From Zero to Hero — HackMD, 2026
Marble
not action-conditioned
World Labs · Launched commercially November 2025
renderer · 3D scenes
Persistent 3D scene generation from text or images, exporting Gaussian splats and meshes — the flagship of the spatial-intelligence camp.
For: Making places: coherent, revisitable 3D scenes you can move a camera through and export into standard pipelines.
Honest limitation: It generates persistent scenes to look at and export — not an action-conditioned simulator of things happening in them.
Marble — World Labs, launched commercially November 2025 · World Models, From Zero to Hero — HackMD, 2026
Cosmos
action-conditioned
NVIDIA · 2025
simulator · pixels / video
A world foundation model platform for 'physical AI' — open and self-hostable, aimed at robotics and autonomous-vehicle development.
For: Infrastructure: a pretrained world model other teams fine-tune for their robots and vehicles, plus synthetic data generation.
Honest limitation: NVIDIA's own report acknowledges failures of object permanence and violations of gravity in generated worlds.
Cosmos World Foundation Model Platform for Physical AI — NVIDIA, arXiv:2501.03575, 2025
Dreamer 4
action-conditioned
Hafner et al. · 2025
simulator · planner · latent space
The current generation of the Dreamer line: agents optimizing their behavior inside a scalable learned simulator.
For: The agent-centric recipe at scale — learn the world, then train the policy in imagination instead of by expensive real trial-and-error.
Honest limitation: The world model exists to serve the agent's task; it is not a general-purpose world you can wander.
Training Agents Inside of Scalable World Models (Dreamer 4) — Hafner et al., 2025
V-JEPA 2
action-conditioned
Meta AI · 2025
simulator · planner · latent space
Self-supervised video models that predict in representation space rather than pixels; the action-conditioned V-JEPA 2-AC variant plans for robot manipulation.
For: The latent-prediction bet: understanding, prediction, and planning without ever paying the cost of reconstructing pixels.
Honest limitation: Its imagination is a trajectory of abstract representations — there is no video of the dream to watch.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning — Meta AI, arXiv:2506.09985, 2025
GAIA-2 / GAIA-3
action-conditioned
Wayve · 2024–2026
renderer · simulator · pixels / video
Generative world models for driving: synthesizing realistic driving scenarios, positioned as offline evaluation infrastructure.
For: Testing driving policies against rare and dangerous situations in generated worlds instead of waiting to meet them on real roads.
Honest limitation: Positioned for offline evaluation — infrastructure for testing drivers, not the driver itself.
GAIA-2 / GAIA-3 generative world models for driving — Wayve, 2024–2026
Qwen-AgentWorld
action-conditioned
Qwen Team · June 24, 2026
simulator · language
A native language world model: it simulates seven agent domains (MCP tools, search, terminal, software engineering, web, OS, Android) by predicting the next environment observation an agent will receive.
For: A decoupled simulator for agentic reinforcement learning — agents practice against the model instead of live systems — with AgentWorldBench to measure it.
Honest limitation: Its 'world' is the text an environment prints back; it simulates the interface an agent sees, not the machinery underneath.
Qwen-AgentWorld: Language World Models for General Agents — Qwen Team, June 24, 2026
DreamZero
action-conditioned
NVIDIA GEAR · 2026
simulator · planner · pixels / video
A World Action Model on a video-diffusion backbone that jointly predicts future world states and actions — the world model is the policy.
For: Zero-shot robot control: real-time closed-loop action at roughly 150 ms per action chunk, with a reported 2× improvement in generalization over vision-language-action baselines.
Honest limitation: The headline numbers are the lab's own reported experiments; independent, comparable evaluation is exactly what the field still lacks.
DreamZero: World Action Models are Zero-shot Policies — NVIDIA GEAR, 2026 · Beyond the Video Hype: Why World Models Feel Different in 2026 — Graison Thomas, Medium, April 2026
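The explorer's three filters can be sketched over the nine entries as tagged above. The `System` record and the `explore` function are hypothetical names for illustration; the real schema of content/landscape.json is not shown on this page, so the data is inlined by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class System:
    name: str
    functions: frozenset       # subset of {"renderer", "simulator", "planner"}
    substrate: str
    action_conditioned: bool

def S(name, functions, substrate, action_conditioned):
    """Small helper so the table below stays readable."""
    return System(name, frozenset(functions.split()), substrate, action_conditioned)

SYSTEMS = [
    S("Genie 3", "renderer simulator", "pixels / video", True),
    S("Sora / Veo-class video models", "renderer", "pixels / video", False),
    S("Marble", "renderer", "3D scenes", False),
    S("Cosmos", "simulator", "pixels / video", True),
    S("Dreamer 4", "simulator planner", "latent space", True),
    S("V-JEPA 2", "simulator planner", "latent space", True),
    S("GAIA-2 / GAIA-3", "renderer simulator", "pixels / video", True),
    S("Qwen-AgentWorld", "simulator", "language", True),
    S("DreamZero", "simulator planner", "pixels / video", True),
]

def explore(systems, function=None, substrate=None, action_conditioned=None):
    """Apply the three filters; a filter left as None is ignored."""
    names = []
    for s in systems:
        if function is not None and function not in s.functions:
            continue
        if substrate is not None and s.substrate != substrate:
            continue
        if action_conditioned is not None and s.action_conditioned != action_conditioned:
            continue
        names.append(s.name)
    return names

if __name__ == "__main__":
    # The "most clarifying column": which systems ignore what you do?
    print(explore(SYSTEMS, action_conditioned=False))
```

Filtering on `action_conditioned=False` isolates the two pure renderers in the list, which is one way to see why that column is singled out.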

Featured: the world model that speaks

Every system above simulates space: pixels, scenes, roads, arms. Qwen-AgentWorld is the twist — a world model whose 'world' is the terminal, the browser, the operating system. When a software agent runs a command, something has to play the role of reality and answer it. Qwen-AgentWorld learns to be that reality: given the agent's action, it predicts the observation the environment will return.

The reason is the same as everywhere else in this story: practicing against the real thing is slow, expensive, and sometimes destructive. A learned simulator of the digital world lets agents train against imagined terminals and websites — decoupled from live systems — before touching real ones.

Source: Qwen-AgentWorld: Language World Models for General Agents — Qwen Team, June 24, 2026
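The interface such a model exposes can be sketched as one function: given the interaction so far and the agent's next action, return the predicted observation. This is an assumption-laden toy, not Qwen-AgentWorld's API: `TextWorldModel`, `LookupWorldModel`, `scripted_policy`, and `rollout` are hypothetical names, and a hand-written lookup table stands in for the learned model.

```python
from typing import Protocol

class TextWorldModel(Protocol):
    """Anything that can play the role of the environment for a text agent."""
    def predict(self, history: list, action: str) -> str: ...

class LookupWorldModel:
    """Stand-in for a learned model: a hand-written table of replies."""
    def __init__(self, table):
        self.table = dict(table)

    def predict(self, history, action):
        # A real model would condition on the whole history; this toy ignores it.
        return self.table.get(action, "(no output)")

def scripted_policy(commands):
    """A trivial agent that issues a fixed list of commands, then stops."""
    def policy(history):
        return commands[len(history)] if len(history) < len(commands) else None
    return policy

def rollout(model, policy, max_steps=5):
    """Let the agent act against the imagined environment; no live system is touched."""
    history = []
    for _ in range(max_steps):
        action = policy(history)
        if action is None:
            break
        observation = model.predict(history, action)
        history.append((action, observation))
    return history

if __name__ == "__main__":
    model = LookupWorldModel({"ls": "README.md  app.py  test_app.py"})
    for action, observation in rollout(model, scripted_policy(["ls", "cat notes"])):
        print(f"$ {action}\n{observation}")
```

Because `rollout` only depends on the `predict` method, a live terminal wrapper and a learned simulator could be swapped behind the same call, which is the decoupling described above.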

It sounds abstract until you sit in the model's chair. So sit in it. Below, an agent works on a bug — and you are the environment it acts on.

You are the world model

# An agent is fixing a bug in a small repo.
# The terminal's replies are hidden: YOU must predict them.

$ ls

What does the environment print back? (1/6)
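One simple way to score a game like this is an exact match between each predicted reply and the real one, after trimming whitespace. The sketch below assumes that scoring rule and uses hypothetical function names; the page's actual scoring logic is not shown.

```python
def normalize(text):
    """Ignore trailing whitespace on each line and blank space at either end."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())

def score(predictions, actual):
    """Count exact matches between predicted and real environment replies."""
    if len(predictions) != len(actual):
        raise ValueError("need one prediction per real reply")
    right = sum(normalize(p) == normalize(a) for p, a in zip(predictions, actual))
    return right, len(actual)

if __name__ == "__main__":
    # Hypothetical replies, for illustration only.
    right, total = score(["app.py  test_app.py\n", "ok"],
                         ["app.py  test_app.py", "FAILED"])
    print(f"predictions right: {right}/{total}")
```

Exact match is a harsh metric for free-form text; a learned simulator would more plausibly be judged on whether an agent trained against it behaves well on the real system.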

Why any of this matters

Four honest paragraphs — what world models are actually for, each with its receipts.

Agents that practice in imagination
The Dreamer line trains behavior inside its own learned simulator, and Qwen-AgentWorld extends the same move to software agents practicing against an imagined terminal, browser, and OS — reinforcement learning decoupled from live environments.
Mastering Diverse Domains through World Models (DreamerV3) — Hafner et al., arXiv:2301.04104, 2023 · Training Agents Inside of Scalable World Models (Dreamer 4) — Hafner et al., 2025 · Qwen-AgentWorld: Language World Models for General Agents — Qwen Team, June 24, 2026
Robots
Real robot trial-and-error is slow and breaks things. Cosmos positions world foundation models as infrastructure for physical AI; V-JEPA 2's action-conditioned variant plans robot manipulation in representation space; DreamZero turns a world action model directly into a zero-shot policy at real-time rates.
Cosmos World Foundation Model Platform for Physical AI — NVIDIA, arXiv:2501.03575, 2025 · V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning — Meta AI, arXiv:2506.09985, 2025 · DreamZero: World Action Models are Zero-shot Policies — NVIDIA GEAR, 2026
Synthetic data
When real data is scarce or dangerous to collect, a world model becomes a data factory: Cosmos is explicitly pitched as a platform for generating training worlds for robotics and autonomous vehicles.
Cosmos World Foundation Model Platform for Physical AI — NVIDIA, arXiv:2501.03575, 2025
Driving
Wayve's GAIA models generate driving scenarios as offline evaluation infrastructure — a way to test a driving policy against the rare, dangerous long tail without waiting for it to happen on a real road.
GAIA-2 / GAIA-3 generative world models for driving — Wayve, 2024–2026

The temperature of the race

The scale signals, from secondary reporting: Yann LeCun's AMI Labs raised €500M at a €3B valuation to pursue world-model-centric AI; Genie 3 shipped; Marble launched commercially; Cosmos passed two million downloads. Read these as market temperature, not ground truth — they come from a single roundup.

Source: World Models Race 2026 — Introl blog, January 2026

Closing: you started by catching a ball

The model in your browser has five thousand weights. The systems above have billions, training runs that cost more than buildings, and teams of hundreds. It would be easy to say they're different kinds of thing. They are not. Encode what is; predict what happens next, given what you do; act on the prediction; drift, and be corrected by reality. You watched every link of that chain run in a tab, on code short enough to read with your coffee — the difference is scale, not kind.

And the loop closes further back than Level 3. The first world model in this story was never the network — it was you, catching a ball.
