Gradient Descent Visualized
Watch a ball roll downhill on a loss surface — drag the starting point to see how optimization really works.
How Gradient Descent Works
Gradient descent is the optimization algorithm behind almost all of machine learning. It's how models find the best parameters by following the slope downhill.
The Analogy
Imagine you're blindfolded on a hilly landscape and need to find the lowest valley. You can feel the slope under your feet. Gradient descent says: take a step in the direction the ground slopes most steeply downward.
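The blindfolded walk can be sketched in a few lines. This is an illustrative toy, not the demo's actual surface: gradient descent on f(x) = x², whose slope at any point is 2x.

```python
def gradient_descent(x0, lr=0.1, steps=50):
    """Roll downhill on f(x) = x**2; the slope under your feet is 2x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x       # feel the slope
        x -= lr * grad     # step in the steepest downhill direction
    return x

# Starting at x = 5, the walker ends up very close to the bottom at x = 0.
final_x = gradient_descent(5.0)
```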
What to Try in the Demo
- Drag the starting point to different positions and watch the path change
- Try a high learning rate — the ball overshoots and bounces around
- Try a low learning rate — the ball crawls painfully slowly
- Find the sweet spot where convergence is fast and stable
Key Concepts
Learning rate controls step size. Too big = chaos. Too small = glacial. A surprising amount of ML engineering comes down to finding the right learning rate schedule.
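To make the chaos/glacial trade-off concrete, here's a hedged sketch on the same kind of toy bowl, f(x) = x² (the numbers are ours, not the demo's): each update multiplies x by (1 − 2·lr), so any learning rate with |1 − 2·lr| > 1 makes the ball overshoot harder every step.

```python
def run(lr, x0=1.0, steps=20):
    """Gradient descent on f(x) = x**2; each step scales x by (1 - 2*lr)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

too_big   = run(lr=1.1)    # |1 - 2.2| = 1.2 > 1: every step overshoots harder
too_small = run(lr=0.001)  # factor 0.998: after 20 steps x has barely moved
decent    = run(lr=0.4)    # factor 0.2: collapses toward 0 almost immediately
```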
Local minima are valleys that aren't the deepest. The ball can get stuck there. In high-dimensional spaces (real neural networks), local minima are less of a problem than you'd think — most zero-gradient points turn out to be saddle points rather than minima.
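Here's a hedged one-dimensional illustration of getting stuck (our own toy function, not the demo's surface): a tilted cosine has many valleys, and the tilt makes the valleys at smaller x strictly deeper — yet plain gradient descent just settles into whichever valley is nearest to where it started.

```python
import math

def descend(x0, lr=0.1, steps=500):
    """Gradient descent on f(x) = cos(x) + 0.1*x, so f'(x) = -sin(x) + 0.1."""
    x = x0
    for _ in range(steps):
        x -= lr * (-math.sin(x) + 0.1)
    return x

nearest = descend(2.0)    # settles in the nearby valley around x ~ 3.04
deeper  = descend(-2.0)   # a different start finds a deeper valley around x ~ -3.24
```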
Saddle points are spots where the gradient is zero (or nearly so) but that aren't minima — the surface curves up in some directions and down in others. The ball slows to a crawl nearby. Momentum-based optimizers (Adam, SGD+momentum) help push through these.
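A minimal sketch of the momentum idea (the names and numbers are illustrative, and this is plain heavy-ball momentum rather than Adam): velocity accumulates past gradients, so the ball keeps coasting through regions where the local slope is almost flat.

```python
def descend(x0, lr=0.01, beta=0.0, steps=300):
    """Heavy-ball descent on the shallow valley f(x) = 0.001 * x**2.

    beta = 0 is plain gradient descent; beta > 0 adds momentum.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        grad = 0.002 * x            # tiny slope: plain GD barely moves
        v = beta * v - lr * grad    # velocity remembers past gradients
        x += v
    return x

plain   = descend(1.0)              # inches along the nearly flat valley
boosted = descend(1.0, beta=0.9)    # coasts noticeably farther downhill
```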
The Math
At each step, we update the parameters:

θ ← θ − η ∇L(θ)

where η is the learning rate and ∇L(θ) is the gradient of the loss function.
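The update rule, written out as code (the two-parameter bowl L(θ) = θ₁² + θ₂² is our illustrative choice, not the demo's loss):

```python
def step(theta, eta):
    """One gradient descent update: theta <- theta - eta * grad_L(theta)."""
    grad = [2 * t for t in theta]   # gradient of the bowl L = sum of theta_i**2
    return [t - eta * g for t, g in zip(theta, grad)]

theta = [3.0, -2.0]
for _ in range(100):
    theta = step(theta, eta=0.1)
# After 100 steps, both parameters have shrunk essentially to zero.
```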