Level 2 ·●○○

The Dream Machine

Encode, predict, act — how a machine dreams a world forward.

Prerequisites: Level 1.

Level 1 ended with a sentence: a world model predicts what happens next, given what is and what you do. Your brain implements that sentence with a hundred billion neurons and a lifetime of watching. This level is about how machines came to implement it, an idea more than thirty years old and told here in four acts, and about the three-step mechanism every modern version shares.

The lineage, in one honest arc

No system worship here: each of these mattered for one specific reason, and each has a source you can read.

1991 · Richard Sutton
Dyna
The ancestor of the whole idea. Dyna's agent doesn't just learn from real experience — it also practices on an internal model of the world, interleaving real steps with imagined ones. 'Learning in imagination' starts here, decades before anyone could render a dream.
Dyna, an Integrated Architecture for Learning, Planning, and Reacting — Sutton, SIGART Bulletin, 1991
2018 · Ha & Schmidhuber
World Models
The paper that gave the field its name back. A VAE compresses the game screen into a small latent code, an RNN learns to predict how that code evolves, and a tiny controller acts on both. In the paper's VizDoom experiment, that controller is trained entirely inside the model's own dream, then transferred back to the real game, where it still survives.
World Models — Ha & Schmidhuber, arXiv:1803.10122, 2018
2020 · Schrittwieser et al., DeepMind
MuZero
The counterpoint to reconstruction: MuZero never learns to predict observations at all. Its learned model predicts only what planning needs — value, policy, reward — and with that it mastered Go, chess, shogi, and Atari. Model only what matters.
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) — Schrittwieser et al., Nature, 2020
2023 · Hafner et al.
DreamerV3
The most legible demonstration that learning inside a latent dynamics model works at scale: DreamerV3 collected diamonds in Minecraft — a long, sparse, brutal task — with one set of hyperparameters, training its behavior almost entirely in imagination.
Mastering Diverse Domains through World Models (DreamerV3) — Hafner et al., arXiv:2301.04104, 2023
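
Of the four, Dyna is small enough to sketch whole. What follows is a minimal tabular Dyna-Q in Python, assuming a made-up six-state corridor with a reward at the right end. The corridor, `env_step`, and every hyperparameter are illustrative choices, not values from Sutton's paper. The part to watch is the inner loop: each real step is followed by `planning_steps` imagined ones, replayed from what the agent remembers the world doing.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 6, 5      # hypothetical corridor: states 0..5, reward at the right end
ACTIONS = (-1, +1)         # step left or step right


def env_step(state, action):
    """The real world: move along the corridor, clipped at both ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    model = {}              # model[(state, action)] -> (next_state, reward)

    def update(s, a, r, s2):
        # One Q-learning backup; the goal is terminal, so nothing follows it.
        best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                # Greedy, breaking ties at random so early episodes explore.
                a = max(ACTIONS, key=lambda b: (q[(s, b)], rng.random()))
            s2, r = env_step(s, a)           # one real step
            update(s, a, r, s2)
            model[(s, a)] = (s2, r)          # remember what the world did
            for _ in range(planning_steps):  # imagined steps from the model
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Call `dyna_q()` and compare `q[(s, +1)]` with `q[(s, -1)]` for each state: stepping right should win everywhere. Setting `planning_steps=0` turns the same code into plain Q-learning, which is the cleanest way to see what the imagined steps buy.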

The mechanism: encode → predict → act

Strip away the eras and the acronyms and the recipe underneath is always the same three verbs.

Encode. The world arrives as too much: a camera frame is a million numbers, most of them about lighting. The model first squeezes the observation into a small code — the latent state — keeping what it will need and throwing the rest away.

Predict. The dynamics live in that small code: given the latent now and an action, predict the latent next. This is the world model proper — the part you'll train in Level 3.

Act. Something reads the predictions and chooses: a planner imagining futures, or a policy trained inside them.
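
The three verbs fit in a few lines of Python. This is a sketch under heavy assumptions, not a real system: the world is a hypothetical one-dimensional track, the encoder and the dynamics are written by hand instead of learned, and every name here (`GOAL`, `observe`, `run`) is invented for illustration. What it does show is the shape of the loop, with a planner playing the part of "act".

```python
import itertools
import random

GOAL = 5.0                      # hypothetical target position on the track
ACTIONS = (-1.0, 0.0, 1.0)      # step left, stay, step right


def observe(position, rng):
    """Hypothetical sensor: the position, then eight irrelevant 'lighting' numbers."""
    return [position] + [rng.random() for _ in range(8)]


def encode(observation):
    """Encode: keep the one number that matters, throw the clutter away."""
    return observation[0]


def predict(latent, action):
    """Predict: latent now plus action gives latent next (hand-written here)."""
    return latent + action


def act(latent, horizon=3):
    """Act: imagine every action sequence of length `horizon`, take the best one's first move."""
    def imagined_cost(plan):
        z, cost = latent, 0.0
        for a in plan:
            z = predict(z, a)           # roll forward inside the model only
            cost += abs(z - GOAL)
        return cost

    best_plan = min(itertools.product(ACTIONS, repeat=horizon), key=imagined_cost)
    return best_plan[0]


def run(start=0.0, steps=8, seed=0):
    rng = random.Random(seed)
    position = start
    trajectory = [position]
    for _ in range(steps):
        z = encode(observe(position, rng))
        position += act(z)              # here the real world happens to match the model
        trajectory.append(position)
    return trajectory
```

`run()` walks the agent from 0 to 5 and then holds it there. The planner never touches the real track while deciding; it only calls `predict`, which is the whole point of having a model.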

But “keeps what it will need and throws the rest away” is exactly the kind of phrase this site refuses to leave abstract. Here is a bottleneck you can operate yourself.

The bottleneck — how many numbers is a world?

[Interactive demo: a bottleneck slider, starting at k = 6. The encoder is learned when the page loads.]

This is a real encoder: a principal-component analysis learned from 240 rendered frames when this page loaded (open “view the code”). Slide k down: the ball and paddle positions survive to the very end, because they account for the most variance across frames, and variance is what PCA ranks its components by. The drifting checkerboard texture dies almost immediately. A bottleneck forces a model to keep what matters, and what it keeps is called the latent state.
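
For readers away from the slider, here is a rough offline version of the same experiment. It is not the page's own code: it assumes NumPy, and `make_frames` is a hypothetical stand-in that draws a bright ball over faint random texture, not the real ball, paddle, and checkerboard. The PCA itself is the standard one, computed with a singular value decomposition of the centered frames.

```python
import numpy as np


def make_frames(n_frames=240, size=16, seed=0):
    """Hypothetical stand-in frames: a high-contrast ball plus low-contrast clutter."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:size, 0:size]
    frames = []
    for _ in range(n_frames):
        cx, cy = rng.uniform(3, size - 3, size=2)
        ball = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / 4.0)
        texture = 0.05 * rng.standard_normal((size, size))
        frames.append((ball + texture).ravel())   # flatten each frame to a vector
    return np.stack(frames)


def fit_pca(frames):
    """Learn the encoder: the mean frame and the principal directions."""
    mean = frames.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt


def encode(frame, mean, components, k):
    """Squeeze one frame through a bottleneck of k numbers."""
    return components[:k] @ (frame - mean)


def decode(latent, mean, components):
    """Rebuild a frame from its k-number latent state."""
    return mean + components[: len(latent)].T @ latent


def reconstruction_error(frames, mean, components, k):
    """Mean squared pixel error after a round trip through k numbers."""
    latents = (frames - mean) @ components[:k].T
    recon = mean + latents @ components[:k]
    return float(np.mean((frames - recon) ** 2))
```

Fit once with `mean, components = fit_pca(make_frames())`, then call `reconstruction_error` for a few values of k. The error only falls as k grows, and the first few components go to the ball because that is where the variance is.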

Prediction is just a function

The second verb, predict, has a mystique it doesn't deserve. Here is a complete world model for the ball, and its entire source code is one line: next position = position + velocity. Step it forward and watch two things with equal attention: how good it is, and exactly where it breaks.

A world model in one line of code

Press “step” — the ghost ring is the toy model's guess for the NEXT frame.

Between bounces the toy is perfect — flight in an empty box really is position-plus-velocity. Then the ball meets a wall, the world does something the function doesn't contain, and the ghost sails straight through. The gap between the ghost and the ball is everything this hand-written model doesn't know. In Level 3 you won't fix that gap by writing a better function — you'll let prediction errors adjust the knobs of a neural network until it discovers the walls on its own. That, one sentence early, is what training a world model means.
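
The same experiment, written down. This is a sketch in Python that assumes a one-dimensional box from 0 to 10 with walls that reflect perfectly; the box size, the starting state, and the helper names are made up for illustration. `predict` is the one-line model from above. `true_step` is the world it is trying to guess.

```python
def predict(position, velocity):
    """The toy world model. The whole thing is this one line."""
    return position + velocity


def true_step(position, velocity, lo=0.0, hi=10.0):
    """The real world (hypothetical box): the same motion, plus walls that reflect."""
    nxt = position + velocity
    if nxt > hi:
        nxt, velocity = 2 * hi - nxt, -velocity
    elif nxt < lo:
        nxt, velocity = 2 * lo - nxt, -velocity
    return nxt, velocity


def rollout_errors(position=7.0, velocity=1.0, steps=6):
    """Step the world forward, recording how far the model's guess lands from the truth."""
    errors = []
    for _ in range(steps):
        guess = predict(position, velocity)               # the ghost ring
        position, velocity = true_step(position, velocity)  # the ball
        errors.append(abs(guess - position))
    return errors


print(rollout_errors())  # [0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
```

Five of the six errors are zero. The lone 2.0 is the step where the ball crosses the wall at 10: the model guesses 11, the world answers 9. That single nonzero number is the kind of signal a trained model learns from.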
