Level 1 ·●○○
You Already Have One
Catching a ball is a prediction problem your brain already solves.
Prerequisites: None.
Someone throws you a ball, and you catch it. Slow that down: the ball is in the air for under a second. Your eyes deliver a handful of blurry snapshots — and from those, your brain works out where the ball will be, routes your hand there ahead of time, and closes your fingers before your conscious mind has finished saying the word “ball.” Nobody taught you the equations. You never solved for gravity. You just watched the world for a few years until something inside you could run it forward.
That something has a name in AI research, and this whole site is about it. But you shouldn't take a definition on faith when you can catch it in the act instead.
Prove it: the occlusion game
Below, a ball rolls across a court and disappears behind a wall. While it's hidden, nothing on your screen knows where it is; only you do. Click the dashed exit line at the spot where, and at the moment when, you think the ball will re-emerge.
However you scored, notice what you were doing between the wall's edges: you kept a little ball flying in your head. You gave it a speed, bounced it off the same invisible floor and ceiling, and read off your answer when your inner ball reached the line. In the terms this field uses: you initialized a model from observations, ran it forward, and acted on its prediction. You just ran a world model.
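If it helps to see those three steps written down, here is a minimal sketch in Python. It is not the code behind the game: the function names, the timestamped (t, x, y) sightings, and the assumptions of constant speed and perfectly elastic bounces are all made up for illustration.

```python
def estimate_velocity(observations):
    """Initialize the model: average velocity from (t, x, y) sightings."""
    (t0, x0, y0), (t1, x1, y1) = observations[0], observations[-1]
    elapsed = t1 - t0
    return (x1 - x0) / elapsed, (y1 - y0) / elapsed


def predict_exit(observations, exit_x, floor=0.0, ceiling=1.0, dt=0.001):
    """Run the model forward until the hidden ball reaches exit_x.

    Returns (time, y) at the crossing. Assumes constant speed, elastic
    bounces off the floor and ceiling, and a ball moving toward exit_x.
    """
    vx, vy = estimate_velocity(observations)
    if vx <= 0:
        raise ValueError("ball is not moving toward the exit line")
    t, x, y = observations[-1]  # last place the ball was actually seen
    while x < exit_x:
        x += vx * dt
        y += vy * dt
        if y < floor:        # bounce: reflect position, flip vertical speed
            y, vy = 2 * floor - y, -vy
        elif y > ceiling:
            y, vy = 2 * ceiling - y, -vy
        t += dt
    return t, y


# Two sightings before the wall, then a guess for the exit line at x = 1.0.
sightings = [(0.0, 0.0, 0.5), (0.1, 0.1, 0.6)]
exit_time, exit_y = predict_exit(sightings, exit_x=1.0)
print(f"click near y = {exit_y:.2f} at t = {exit_time:.2f}")
```

Acting on the prediction is the click itself: the returned height and time are where and when you would press.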
The loop, gently
To say precisely what a world model is, we need only four words and one circle. An agent is anything that acts: you, a robot, a piece of software. The environment is everything the agent doesn't control: the rest of the world. The agent sends actions out; the environment sends observations back; around and around, forever. Hover the arrows — and change the protagonist — to feel how universal the loop is.
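The same circle can be sketched in a few lines of Python. Everything here is a hypothetical toy: the `Court` class, the `chaser` function, and the integer positions are illustrative assumptions, and the loop stops after a fixed number of steps where the real one never does.

```python
class Court:
    """Toy environment: a ball sits on a line and a hand can slide along it."""

    def __init__(self, ball=5, hand=0):
        self.ball = ball
        self.hand = hand

    def observe(self):
        return {"ball": self.ball, "hand": self.hand}

    def step(self, action):
        """Apply the agent's action, then send back a fresh observation."""
        self.hand += action
        return self.observe()


def chaser(observation):
    """Toy agent: move one step toward the ball, or stay put on arrival."""
    gap = observation["ball"] - observation["hand"]
    return (gap > 0) - (gap < 0)  # +1, 0, or -1


def run_loop(env, agent, steps):
    """Action out, observation back, around and around."""
    observation = env.observe()
    history = []
    for _ in range(steps):
        action = agent(observation)
        observation = env.step(action)
        history.append((action, observation))
    return history


for action, observation in run_loop(Court(), chaser, steps=7):
    print(action, observation)
```

Swapping in a different agent or a different environment leaves `run_loop` untouched, which is the sense in which the loop is universal.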
The definition lands
Here it is, the sentence the next four levels unpack: a world model predicts what happens next, given what is and what you do. Feed it the current situation and a candidate action; it returns the situation that follows. That's the whole contract. Your brain honors it when it catches a ball. In Level 3 you'll train a neural network to honor it for a small world that fits in your browser.
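As a rough sketch, the contract is a single function from (situation, action) to the next situation. The toy below assumes a made-up one-dimensional world in which the ball drifts one step left per tick; the names `toy_world_model` and `choose_action` are hypothetical, and the model is hand-written rather than learned.

```python
from typing import Callable, Iterable, Tuple

State = Tuple[int, int]  # (ball position, hand position): an assumed toy state
Action = int             # -1, 0, or +1: how far the hand moves this tick
WorldModel = Callable[[State, Action], State]


def toy_world_model(state: State, action: Action) -> State:
    """Predict what happens next, given what is and what you do."""
    ball, hand = state
    return (ball - 1, hand + action)  # assumed rule: the ball drifts left by 1


def choose_action(model: WorldModel, state: State,
                  candidates: Iterable[Action]) -> Action:
    """Imagine each candidate action and keep the one that ends nearest the ball."""
    def predicted_gap(action: Action) -> int:
        ball, hand = model(state, action)
        return abs(ball - hand)
    return min(candidates, key=predicted_gap)


print(toy_world_model((5, 2), 1))                         # (4, 3)
print(choose_action(toy_world_model, (5, 2), [-1, 0, 1]))  # 1
print(choose_action(toy_world_model, (3, 2), [-1, 0, 1]))  # 0
```

The last call is the interesting one: the ball is to the right of the hand, yet the best move is to wait, because the model predicts the ball will arrive on its own.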
One way to draw the line, from Fei-Fei Li and World Labs: large language models capture the statistical structure of text, while world models aim at the statistical structure of space and time — what happens next in a physical scene, not what word comes next in a sentence.
Source: A Functional Taxonomy of World Models — Fei-Fei Li & World Labs, Substack, June 3, 2026