Level 2 ·●○○
The Dream Machine
Encode, predict, act — how a machine dreams a world forward.
Prerequisites: Level 1.
Level 1 ended with a sentence: a world model predicts what happens next, given what is and what you do. Your brain implements that sentence with roughly a hundred billion neurons and a lifetime of watching. This level is about how machines came to implement it, a three-decade idea told in four acts, and about the three-step mechanism every modern version shares.
The lineage, in one honest arc
No system worship here: each of these mattered for one specific reason, and each has a source you can read.
1991 · Richard Sutton
Dyna
The ancestor of the whole idea. Dyna's agent doesn't just learn from real experience — it also practices on an internal model of the world, interleaving real steps with imagined ones. 'Learning in imagination' starts here, decades before anyone could render a dream.
Dyna, an Integrated Architecture for Learning, Planning, and Reacting — Sutton, SIGART Bulletin, 1991
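A minimal tabular Dyna-Q sketch of that interleaving, in the form usually taught (for example in Sutton & Barto's *Reinforcement Learning: An Introduction*). The six-state corridor and every parameter value are hypothetical. Because the toy world is deterministic, the "model" can simply memorise the last outcome of each (state, action) pair. Each real step is followed by several imagined ones replayed from that memory.

```python
import random

# Hypothetical toy world: a corridor of states 0..5; reaching state 5 pays 1 and ends the episode.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def real_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> last observed (reward, s2, done)

    def greedy(s):
        best = max(Q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])  # random tie-break

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = real_step(s, a)      # one real step...
            q_update(s, a, r, s2, done)        # ...learned from directly...
            model[(s, a)] = (r, s2, done)      # ...and remembered in the model.
            for _ in range(planning_steps):    # Then several imagined steps, replayed from the model.
                ps, pa = rng.choice(list(model))
                q_update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-terminal state; 1 means "step right"
```

Setting `planning_steps=0` turns this back into plain Q-learning. Value then creeps backward from the goal only as fast as real visits carry it. The imagined replays are what spread it quickly.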
2018 · Ha & Schmidhuber
World Models
The paper that gave the field its name back. A VAE compresses each game frame into a small latent code, an RNN learns to predict how that code evolves, and a tiny controller acts on both. In the VizDoom experiment that controller is trained entirely inside the model's own dream, then transferred back to the real game, where it still dodges the fireballs.
World Models — Ha & Schmidhuber, arXiv:1803.10122, 2018
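A drastically simplified sketch of the learn-in-the-dream recipe, under stated assumptions. The "game" is a hypothetical 1-D system. A two-coefficient least-squares fit stands in for the VAE and RNN. A grid search over one controller gain stands in for the evolution strategy (CMA-ES) the paper used to train its controller. What survives the simplification is the order of operations: collect real experience, fit a model, optimise the controller only against the model, then run it for real.

```python
import random

# Hypothetical "real game": a 1-D state whose dynamics the agent does not know.
def real_step(x, u):
    return 0.9 * x + 0.5 * u

# 1. Collect random experience from the real game.
rng = random.Random(0)
data = []
for _ in range(200):
    x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x, u, real_step(x, u)))

# 2. Fit a world model x' ~ a*x + b*u by least squares (2x2 normal equations, Cramer's rule).
sxx = sum(x * x for x, u, y in data)
suu = sum(u * u for x, u, y in data)
sxu = sum(x * u for x, u, y in data)
sxy = sum(x * y for x, u, y in data)
suy = sum(u * y for x, u, y in data)
det = sxx * suu - sxu * sxu
a = (sxy * suu - sxu * suy) / det
b = (sxx * suy - sxu * sxy) / det

def dream_step(x, u):
    return a * x + b * u

# 3. A one-parameter controller u = -k*x, scored by how far the state strays from 0.
def rollout_cost(step, k, x0=1.0, horizon=10):
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = step(x, -k * x)
        cost += x * x
    return cost

# 4. Train the controller entirely inside the dream: pick the best gain by grid search.
candidates = [i / 10 for i in range(41)]
k_best = min(candidates, key=lambda k: rollout_cost(dream_step, k))

# 5. Transfer: run the dream-trained controller in the real game.
print(f"model a={a:.3f} b={b:.3f}  k={k_best}  real cost={rollout_cost(real_step, k_best):.2e}")
```

The controller never touches `real_step` while it is being chosen. It works in the real game only because the fitted model is faithful here, which is exactly the bet the paper makes at much larger scale.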
2020 · Schrittwieser et al., DeepMind
MuZero
The counterpoint to reconstruction: MuZero never learns to predict observations at all. Its learned model predicts only what planning needs — value, policy, reward — and with that it mastered Go, chess, shogi, and Atari. Model only what matters.
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) — Schrittwieser et al., Nature, 2020
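A shape-only sketch of how MuZero's learned model is wired. It assumes numpy and random, untrained weights, so its numbers mean nothing. The three functions mirror the paper's representation function h, dynamics function g and prediction function f. The sizes, the tanh layers and the one-hot action encoding are assumptions. The thing to notice is what `unroll` returns: policies, values and rewards, and never a predicted frame.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS = 64, 8, 4   # hypothetical sizes

# Untrained random weights: this shows the data flow, not a working agent.
W_h = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_g = rng.normal(size=(LATENT_DIM, LATENT_DIM + N_ACTIONS)) * 0.1
w_r = rng.normal(size=LATENT_DIM + N_ACTIONS) * 0.1
W_p = rng.normal(size=(N_ACTIONS, LATENT_DIM)) * 0.1
w_v = rng.normal(size=LATENT_DIM) * 0.1

def represent(obs):          # h: observation -> latent state
    return np.tanh(W_h @ obs)

def dynamics(s, a):          # g: (latent, action) -> (reward, next latent)
    x = np.concatenate([s, np.eye(N_ACTIONS)[a]])
    return float(w_r @ x), np.tanh(W_g @ x)

def predict(s):              # f: latent -> (policy, value)
    logits = W_p @ s
    p = np.exp(logits - logits.max())
    return p / p.sum(), float(w_v @ s)

def unroll(obs, actions):
    """Imagine along an action sequence from one real observation; no frame is ever decoded."""
    s = represent(obs)
    trace = []
    for a in actions:
        policy, value = predict(s)   # what planning needs at this latent state
        reward, s = dynamics(s, a)   # step the latent forward, predicting only the reward
        trace.append((policy, value, reward))
    return trace

trace = unroll(rng.normal(size=OBS_DIM), actions=[0, 2, 1])
for policy, value, reward in trace:
    print(policy.round(3), round(value, 3), round(reward, 3))
```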
2023 · Hafner et al.
DreamerV3
The most legible demonstration that learning inside a latent dynamics model works at scale: DreamerV3 collected diamonds in Minecraft — a long, sparse, brutal task — with one set of hyperparameters, training its behavior almost entirely in imagination.
Mastering Diverse Domains through World Models (DreamerV3) — Hafner et al., arXiv:2301.04104, 2023
The mechanism: encode → predict → act
Strip away the eras and the acronyms and the recipe underneath is always the same three verbs.
Encode. The world arrives as too much: a camera frame is a million numbers, most of them about lighting. The model first squeezes the observation into a small code — the latent state — keeping what it will need and throwing the rest away.
Predict. The dynamics live in that small code: given the latent now and an action, predict the latent next. This is the world model proper — the part you'll train in Level 3.
Act. Something reads the predictions and chooses: a planner imagining futures, or a policy trained inside them.
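A minimal sketch of the three verbs wired into one loop. It assumes a hypothetical 1-D ball whose observation is (position, velocity, brightness) and whose action nudges the velocity by -1, 0 or +1. The world model is hand-written here rather than learned. The act step takes the "planner imagining futures" option: it scores every 3-step action sequence in imagination and executes only the first action of the best one, then replans on the next step.

```python
import itertools

def encode(obs):
    """Keep what planning needs (position, velocity); drop brightness."""
    position, velocity, _brightness = obs
    return (position, velocity)

def predict(z, action):
    """Hand-written stand-in for a learned model: the action nudges velocity, then the ball moves."""
    position, velocity = z
    velocity += action
    return (position + velocity, velocity)

def act(z, goal, horizon=3, actions=(-1, 0, 1)):
    """Imagine every action sequence; return the first action of the one ending nearest the goal, stopped."""
    def imagined_miss(plan):
        z_dream = z
        for a in plan:
            z_dream = predict(z_dream, a)
        return abs(z_dream[0] - goal) + abs(z_dream[1])
    best_plan = min(itertools.product(actions, repeat=horizon), key=imagined_miss)
    return best_plan[0]

def real_step(obs, action):
    """Hypothetical real world: same motion as the model, plus flickering lights."""
    position, velocity, brightness = obs
    velocity += action
    return (position + velocity, velocity, 1.0 - brightness)

obs, goal, taken = (0, 0, 0.3), 6, []
for t in range(8):
    a = act(encode(obs), goal)      # encode -> predict (inside act) -> act
    taken.append(a)
    obs = real_step(obs, a)
    print(t, a, obs[:2])
```

The planner never sees brightness, and it does not need to: the flicker changes every observation without changing anything the plan depends on.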
But “keeping what it will need and throwing the rest away” is exactly the kind of phrase this site refuses to leave abstract. Here is a bottleneck you can operate yourself.
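A text-only stand-in for that bottleneck, with every name hypothetical. The "camera" is 1-D with 100 pixels, each lit by a global brightness level plus a little noise, with one extra-bright pixel where the ball is. The encoder is hand-written (a learned encoder would have to discover this rule on its own). It squeezes 100 numbers into one, the ball's position, and throws the lighting away.

```python
import random

WIDTH = 100  # hypothetical 1-D camera with 100 pixels

def render(ball_x, lighting, rng):
    """Raw observation: every pixel carries lighting plus noise; one pixel also carries the ball."""
    frame = [lighting + rng.uniform(0, 0.1) for _ in range(WIDTH)]
    frame[ball_x] += 1.0
    return frame

def encode(frame):
    """Bottleneck: 100 numbers in, 1 number out (the index of the brightest pixel)."""
    return max(range(WIDTH), key=frame.__getitem__)

rng = random.Random(0)
dim = render(42, lighting=0.2, rng=rng)
bright = render(42, lighting=0.9, rng=rng)
print(encode(dim), encode(bright))  # prints "42 42"
```

The two frames differ in every single pixel, yet they encode to the same code, because the only thing they share is the only thing the code keeps.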
Prediction is just a function
The second verb, predict, has a mystique it doesn't deserve. Here is a complete world model for a ball bouncing around a box, and its entire source code is one line: next position = position + velocity. Step it forward and watch two things with equal attention: how good it is, and exactly where it breaks.
Between bounces the toy is perfect: flight in an empty box really is position plus velocity. Then the ball meets a wall, the world does something the function doesn't contain, and the ghost (the model's own prediction of where the ball should be) sails straight through. The gap between the ghost and the ball is everything this hand-written model doesn't know. In Level 3 you won't fix that gap by writing a better function; you'll let prediction errors adjust the knobs of a neural network until it discovers the walls on its own. That, one sentence early, is what training a world model means.
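A minimal sketch of the toy in one dimension, assuming hypothetical walls at 0 and 10 and a ball starting at 7 with velocity 1. The page's real box size and start state are not given. Two numbers are tracked each step. The one-step error compares the model's guess, made from the true current state, with what actually happens. The gap measures how far the ghost, the model run on its own from the start and never corrected, has drifted from the ball.

```python
# Hypothetical 1-D box with walls at 0 and 10; integers keep the arithmetic exact.
LEFT, RIGHT = 0, 10

def world_step(x, v):
    """The real world: flight, plus a bounce that reflects the ball off a wall."""
    x += v
    if x > RIGHT:
        x, v = 2 * RIGHT - x, -v
    elif x < LEFT:
        x, v = 2 * LEFT - x, -v
    return x, v

def model_step(x, v):
    """The one-line world model: next position = position + velocity."""
    return x + v, v

ball = ghost = (7, 1)   # (position, velocity)
gaps, one_step_errors = [], []
for t in range(1, 9):
    guess, _ = model_step(*ball)   # one-step prediction from the true current state
    ball = world_step(*ball)
    ghost = model_step(*ghost)     # the ghost runs on the model alone
    one_step_errors.append(abs(ball[0] - guess))
    gaps.append(abs(ball[0] - ghost[0]))
    print(f"t={t} ball={ball[0]:3d} ghost={ghost[0]:3d} gap={gaps[-1]:2d} one-step error={one_step_errors[-1]}")
```

The one-step error is zero on every step except the one that crosses the wall. That isolated spike is the kind of signal Level 3 training feeds back. The free-running ghost has nothing to correct it, so after the bounce it drifts further every step.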