Glossary
Every term, in plain language — and a link to the place on this site where you can watch it happen instead of reading about it.
- world model
- A system that predicts what happens next, given what is and what you do. Your brain runs one when it catches a ball; Level 3's neural network is one you trained yourself.
- See it live: Level 1 — where the definition lands
- agent
- Anything that acts: a person, a robot, a piece of software. The half of the loop that chooses.
- See it live: the agent loop in Level 1
- environment
- Everything the agent doesn't control — the rest of the world, which answers every action with an observation.
- See it live: the agent loop in Level 1
- state
- Every fact about the world that matters for what happens next. In Bounce it's five numbers: the ball's position (two), the ball's velocity (two), and the paddle's position (one). Usually hidden; you see observations instead.
- See it live: observation vs state in Level 1's Go deeper
- observation
- The glimpse of the world an agent actually receives — pixels, sensor readings, printed text. Partial, sometimes noisy, never the state itself.
- See it live: the occlusion game (where the observation loses the ball)
- action
- What the agent does to the world. In Bounce: move the paddle left, stay, or right. 'Action-conditioned' models let your action change their prediction.
- See it live: the action-conditioned filter in Level 5
- latent state / latent space
- The small code a model keeps after squeezing an observation through a bottleneck — what survives compression. Positions survive; texture dies.
- See it live: the compression toy in Level 2
- dynamics (model)
- The function that carries state forward in time: (state, action) → next state. Real physics has one; a world model learns an imitation of it.
- See it live: training the dynamics network in Level 3
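To make the (state, action) → next state signature concrete, here is a minimal hand-written sketch of a dynamics function for a Bounce-like world. Everything in it is an assumption for illustration, not the site's actual physics: the `State` fields, the unit-square walls, the paddle speed of 0.05, and the `step` name are all hypothetical, and the ball-paddle collision is left out.

```python
from typing import NamedTuple


class State(NamedTuple):
    # Five numbers, mirroring the glossary's description of Bounce.
    x: float         # ball position
    y: float
    vx: float        # ball velocity
    vy: float
    paddle_x: float  # paddle position


def step(state: State, action: int, dt: float = 1.0) -> State:
    """Toy dynamics: (state, action) -> next state.

    action is -1 (left), 0 (stay) or +1 (right).
    Assumed world: a unit square with perfectly elastic walls.
    """
    x = state.x + state.vx * dt
    y = state.y + state.vy * dt
    vx, vy = state.vx, state.vy

    # Reflect off the vertical walls at x = 0 and x = 1.
    if x < 0.0:
        x, vx = -x, -vx
    elif x > 1.0:
        x, vx = 2.0 - x, -vx

    # Reflect off the horizontal walls at y = 0 and y = 1.
    if y < 0.0:
        y, vy = -y, -vy
    elif y > 1.0:
        y, vy = 2.0 - y, -vy

    # The action only moves the paddle, clamped to the playfield.
    paddle_x = min(1.0, max(0.0, state.paddle_x + 0.05 * action))
    return State(x, y, vx, vy, paddle_x)
```

A learned dynamics model has the same signature; the difference is that its body is a trained network rather than hand-written rules.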
- experience buffer
- The dataset gameplay writes: one (state, action, next state) row per timestep. Experience is training data, literally.
- See it live: Stage 1 of Level 3
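The "one row per timestep" idea can be sketched in a few lines. This is a hedged illustration, not the site's code: `true_step` is a hypothetical stand-in for the real game physics (a one-dimensional position and velocity), and the random action choice stands in for a player.

```python
import random


def true_step(state, action):
    # Hypothetical stand-in for the real world: the action nudges
    # the velocity, and the velocity moves the position.
    pos, vel = state
    vel = vel + 0.1 * action
    return (pos + vel, vel)


def collect(num_steps, seed=0):
    """Play for num_steps and record one (state, action, next state) row per step."""
    rng = random.Random(seed)
    buffer = []
    state = (0.0, 0.0)
    for _ in range(num_steps):
        action = rng.choice([-1, 0, 1])        # stand-in for gameplay
        next_state = true_step(state, action)
        buffer.append((state, action, next_state))
        state = next_state                     # the world moves on
    return buffer
```

Each row is already a supervised training example: the input is (state, action) and the target is the next state.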
- rollout (dream)
- Running a model forward many steps on its own predictions — its step 2 computed from its step 1. Also called imagination, or a dream.
- See it live: Stage 3 of Level 3
- compounding error
- The signature failure of free-running rollouts: each step's small mistake becomes the next step's input, so error grows with horizon — linearly at best, exponentially near sensitive events like bounces.
- See it live: the failure lab in Level 4
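A small sketch of why free-running rollouts drift, under loudly stated assumptions: the "world" here is a point rotating on a circle by 0.100 radians per step, and the "model" is the same rotation with a slightly wrong angle of 0.102. Both are hypothetical toys, not the site's Bounce physics. The code compares the model's one-step error (always restarted from the real state) with its free-running error (fed its own previous prediction).

```python
import math


def rotate(point, angle):
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def true_step(point):
    return rotate(point, 0.100)   # the real world


def model_step(point):
    return rotate(point, 0.102)   # an imitation that is slightly off


def compare(start, horizon):
    """Return (one_step_errors, free_run_errors) over the horizon."""
    real = imagined = start
    one_step, free_run = [], []
    for _ in range(horizon):
        # One-step error: the model predicts from the REAL current state.
        one_step.append(math.dist(true_step(real), model_step(real)))
        # Free-running: the model predicts from its OWN last prediction.
        real = true_step(real)
        imagined = model_step(imagined)
        free_run.append(math.dist(real, imagined))
    return one_step, free_run


one_step, free_run = compare((1.0, 0.0), horizon=50)
```

In this toy the one-step error stays flat while the free-running error keeps growing with the horizon, roughly in proportion to the step count. That is the mild, linear case; the sketch does not model sensitive events such as bounces, where growth can be much faster.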
- MPC (model-predictive control)
- Plan by imagining: simulate many candidate action sequences through a model, take the first action of the best one, replan constantly. Random-shooting MPC samples the candidates at random.
- See it live: Stage 4 of Level 3
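Random-shooting MPC fits in a short function. The sketch below is an illustration under assumptions, not the site's planner: the one-dimensional paddle model `model_step`, the `score` function (negative distance to a target), and the default horizon and candidate count are all hypothetical choices.

```python
import random

ACTIONS = (-1, 0, 1)  # left, stay, right


def model_step(paddle, action):
    # Hypothetical model: the paddle moves 0.1 per step in the chosen direction.
    return paddle + 0.1 * action


def score(paddle, target):
    # Higher is better: being closer to the target scores higher.
    return -abs(paddle - target)


def plan(paddle, target, horizon=5, candidates=200, rng=None):
    """Random-shooting MPC: return the first action of the best sampled sequence."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    best_total, best_first = float("-inf"), 0
    for _ in range(candidates):
        # Sample one candidate action sequence at random.
        sequence = [rng.choice(ACTIONS) for _ in range(horizon)]
        # Imagine it: roll the sequence through the model and add up the score.
        imagined, total = paddle, 0.0
        for action in sequence:
            imagined = model_step(imagined, action)
            total += score(imagined, target)
        if total > best_total:
            best_total, best_first = total, sequence[0]
    return best_first  # execute only this, then replan on the next step
```

Only the first action is executed. On the next timestep the caller observes the new situation and calls `plan` again, which is the "replan constantly" part of the definition.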
- model-based reinforcement learning
- The family of methods that learn a model of the world and use it — for planning or for training a policy on imagined experience — instead of learning from real trial-and-error alone. Sample-efficient because imagined practice is cheap.
- See it live: Level 3's Go deeper
- policy
- A learned reflex: a function straight from state to action, no planning at decision time. The alternative (and complement) to MPC.
- See it live: MPC vs learned policies in Level 3's Go deeper
- renderer
- A world model whose job is producing views — what a camera or an eye would see. Renders appearances; need not track consequences.
- See it live: the taxonomy in Level 5
- simulator
- A world model whose job is evolving a world: given state and action, produce what happens next. The job your Level 3 network does for Bounce.
- See it live: the taxonomy in Level 5
- planner
- A world model (or its user) whose job is choosing actions by evaluating predicted futures. Your Stage 4 paddle was driven by one.
- See it live: the taxonomy in Level 5
- POMDP
- Partially Observable Markov Decision Process — the formal frame for the whole story: hidden state, observations, actions, transitions. The reason inner models are necessary at all.
- See it live: Level 1's Go deeper
- teacher forcing
- Training a predictor only from real states, never from its own outputs. Great for learning, but it quietly leaves the model unprepared for free-running dreams, where the inputs are its own slightly-wrong predictions.
- See it live: Level 4's Go deeper
- divergence
- This site's measure of how far a dream has drifted: the distance between the real and imagined ball positions after the same actions from the same start.
- See it live: the divergence meter in Level 3
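As a rough sketch of how such a measure could be computed, the function below takes two trajectories of ball positions and returns the distance between them at each step. The use of Euclidean distance and the `divergence` name are assumptions; the site may define its meter differently.

```python
import math


def divergence(real_positions, imagined_positions):
    """Per-step distance between real and imagined (x, y) ball positions.

    Both trajectories are assumed to start from the same state and to
    have been driven by the same actions.
    """
    return [
        math.dist(real, imagined)
        for real, imagined in zip(real_positions, imagined_positions)
    ]


# Same start, then the dream drifts by 3 in x and 4 in y.
drift = divergence([(0, 0), (1, 1)], [(0, 0), (4, 5)])
```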