Understanding Backpropagation
Step through forward & backward passes interactively — watch gradients flow through every node.
How Backpropagation Works
Backpropagation is how neural networks learn. It's the algorithm that figures out which weights to adjust and by how much after every prediction.
The Core Idea
- Forward pass — data flows through the network, producing a prediction
- Loss calculation — we measure how wrong the prediction was
- Backward pass — we trace back through the network, computing how much each weight contributed to the error
- Weight update — each weight gets nudged in the direction that reduces the error
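The four steps above can be sketched for a single sigmoid neuron in a few lines of plain Python. This is our own minimal illustration, not the demo's actual code; the names `w`, `b`, and `lr` are placeholders we chose.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One training example: input x, target y
x, y = 1.5, 0.0
w, b = 0.8, 0.1      # initial weight and bias
lr = 0.5             # learning rate

for step in range(20):
    # 1. Forward pass: data flows through, producing a prediction
    z = w * x + b
    a = sigmoid(z)

    # 2. Loss: measure how wrong the prediction was (squared error)
    loss = 0.5 * (a - y) ** 2

    # 3. Backward pass: chain rule gives each parameter's share of the error
    dloss_da = a - y
    da_dz = a * (1 - a)          # derivative of the sigmoid
    grad_w = dloss_da * da_dz * x
    grad_b = dloss_da * da_dz * 1.0

    # 4. Update: nudge each parameter in the direction that reduces the error
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final loss: {loss:.4f}")
```

Run it and the loss shrinks every iteration; a real network repeats exactly this loop, just with many weights at once.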
What to Try in the Demo
- Click Step Forward to watch data propagate through the network
- Click Step Backward to see gradients flow in reverse
- Watch how the gradient magnitude changes at each layer — this is the vanishing gradient problem in action
- Try different learning rates to see how step size affects convergence
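The shrinking gradient magnitude the demo visualizes can be reproduced with a back-of-the-envelope sketch: with sigmoid activations, each layer multiplies the gradient by σ'(z) ≤ 0.25, so the signal decays roughly geometrically as it flows backward. The depth and pre-activation value below are illustrative choices of ours, not taken from the demo.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

grad = 1.0   # gradient arriving at the output layer
z = 0.5      # a typical pre-activation value (chosen for illustration)

# Walk backward through 10 layers; each contributes a chain-rule
# factor of sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) <= 0.25
for layer in range(10, 0, -1):
    s = sigmoid(z)
    grad *= s * (1 - s)
    print(f"layer {layer}: |gradient| = {grad:.2e}")
```

After ten layers the gradient has collapsed by several orders of magnitude, which is why early layers in deep sigmoid networks learn so slowly.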
The Chain Rule
The math behind backprop is just the chain rule from calculus, applied repeatedly. For each weight w, we compute:

∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w)

where L is the loss, a is the neuron's activation, and z = w·x + b is its pre-activation sum.
Each term in this chain tells us how sensitive the output is to a small change at that point in the network.
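One way to convince yourself the chain-rule product is right is a gradient check: compare the analytic gradient against a finite-difference estimate. This is our own sanity-check sketch for a single sigmoid neuron with squared-error loss; all the constants are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x=2.0, b=-0.3, y=1.0):
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

w, x, b, y = 0.7, 2.0, -0.3, 1.0
a = sigmoid(w * x + b)

# Analytic gradient via the chain: dL/da * da/dz * dz/dw
analytic = (a - y) * a * (1 - a) * x

# Numeric estimate: perturb w slightly and difference the loss
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(abs(analytic - numeric))  # agreement to many decimal places
```

The same check scales to full networks and is a standard way to debug a hand-written backward pass.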
Why This Matters
Every modern AI model — GPT, DALL-E, the vision systems in self-driving cars — learns through backpropagation. Understanding it gives you intuition for why models fail, why training goes unstable, and how to fix it.
Join the Idiots
New lab every Friday. No spam, unsubscribe anytime.