The Blind Hiker — Gradient Descent Playground
Drag a ball on four different loss surfaces. Tune the learning rate. Trigger four built-in failure modes. Watch the loss curve go down — or explode.
How Gradient Descent Works
Gradient descent is the optimization algorithm behind almost all of machine learning. It's how models find the best parameters by following the slope downhill.
The Analogy
Imagine you're blindfolded on a hilly landscape and need to find the lowest valley. You can feel the slope under your feet. Gradient descent says: take a step in the direction the ground slopes most steeply downward.
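In code, the blind hiker is a short loop: feel the slope, step against it, repeat. Here's a minimal sketch in Python (the bowl function, starting point, and step count are illustrative, not the demo's actual values):

```python
def loss(x):
    # A friendly convex bowl with its lowest point at x = 3.
    return (x - 3) ** 2

def grad(x):
    # The derivative of the bowl: the slope under the hiker's feet.
    return 2 * (x - 3)

x = 0.0             # starting position on the landscape
learning_rate = 0.1

for _ in range(50):
    x -= learning_rate * grad(x)  # step downhill, proportional to the slope

print(x)  # ~3.0: the bottom of the valley
```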
What to Try in the Demo
- Pick a loss function. Convex bowl is the friendly case. Double valley has two minima with one deeper. Wiggly looks smooth from far away but has bumps near the bottom. Cliff has a discontinuity that breaks the smooth-descent assumption.
- Drag the ball to set its starting position. The trail clears each time you drag.
- Step runs one gradient update with animation. Run keeps stepping until convergence, divergence, or 200 steps. Reset snaps back to the function's default starting point.
- Tune the learning rate with the slider. It's log-scaled from 0.001 to 1.0, the same range you'd see in real training (the sketch after this list shows how that mapping works).
- Try the failure-mode presets. Each loads a configuration that demonstrates a specific way training goes wrong: overshoot, glacial progress, local-minimum trap, cliff plummet.
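Why a log scale? Learning rates matter by orders of magnitude: the jump from 0.001 to 0.01 is as meaningful as the jump from 0.1 to 1.0. A plausible slider-to-learning-rate mapping (a sketch, not the demo's actual source) looks like this:

```python
import math

LR_MIN, LR_MAX = 1e-3, 1.0  # the slider's range from the list above

def slider_to_lr(t: float) -> float:
    """Map slider position t in [0, 1] to a log-scaled learning rate."""
    log_lr = math.log10(LR_MIN) + t * (math.log10(LR_MAX) - math.log10(LR_MIN))
    return 10 ** log_lr

print(slider_to_lr(0.0))  # 0.001
print(slider_to_lr(0.5))  # about 0.032 (halfway on the slider is not 0.5)
print(slider_to_lr(1.0))  # 1.0
```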
Key Concepts
Learning rate controls step size. Too big = chaos. Too small = glacial. Most of ML engineering is finding the right learning rate schedule.
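Both failure modes show up on the convex bowl from the earlier sketch. These learning rates are illustrative, but the behavior is exactly what the slider produces:

```python
def grad(x):
    return 2 * (x - 3)  # slope of the convex bowl (x - 3)**2

def run(lr, steps=20, x=0.0):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(run(lr=1.1))    # too big: every step overshoots harder, x explodes
print(run(lr=0.001))  # too small: after 20 steps, barely left the start
print(run(lr=0.3))    # about right: settles at the minimum, x = 3
```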
Local minima are valleys that aren't the deepest. The "Stuck in local minimum" preset shows this directly: the ball lands in the shallow well at x≈25 and never finds the deeper one at x≈75. In high-dimensional spaces (real neural networks), local minima are less of a problem than you'd think — but in 1D they're brutal.
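You can reproduce the trap outside the demo. The double well below is an illustrative stand-in for the demo's surface, with a shallow valley near x = 25 and a deeper one near x = 75 (the exact shape is assumed, not taken from the demo's source):

```python
def loss(x):
    # Two valleys: a shallow one near x = 25, a deeper one near x = 75.
    # The -0.02 * x tilt is what makes the right valley deeper.
    return ((x - 25) * (x - 75) / 600) ** 2 - 0.02 * x

def grad(x, h=1e-5):
    # Numerical gradient: literally "feel the slope under your feet".
    return (loss(x + h) - loss(x - h)) / (2 * h)

x = 10.0  # start on the left side of the landscape
for _ in range(500):
    x -= 20.0 * grad(x)

print(x)  # settles around x = 26.6: stuck in the shallow valley for good
```

Every gradient along the way points into the left valley, so extra steps don't help; the only escapes are a different start, momentum, or noise.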
Saddle points are flat regions where the gradient is near zero. The ball slows to a crawl. Momentum-based optimizers (Adam, SGD+momentum) help push through these.
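Here's a sketch of the momentum fix (classical SGD+momentum; the demo's ball physics may differ):

```python
def momentum_step(x, v, grad, lr=0.01, beta=0.9):
    """One SGD+momentum update. beta controls how long past velocity survives."""
    v = beta * v + grad(x)  # old velocity decays by 10%, the new slope adds on
    return x - lr * v, v    # step along accumulated velocity, not the raw slope
```

On a flat stretch where grad(x) is near zero, v only shrinks by a factor of beta per step, so the ball coasts across the plateau instead of stalling.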
Discontinuities. Real loss landscapes can have cliffs — places where the gradient is near zero on one side and enormous on the other. The "Cliff" function and its preset show what your optimizer does when it samples a discontinuity: the gradient explodes and the ball flies off. This is why gradient clipping exists.
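In one dimension, gradient clipping is a few lines. A sketch of the standard recipe (the threshold is illustrative):

```python
def clipped_step(x, grad, lr=0.1, max_grad=5.0):
    g = grad(x)
    # Cap the gradient's magnitude so one cliff sample can't launch
    # the ball into orbit; direction is preserved, only the size shrinks.
    if abs(g) > max_grad:
        g = max_grad if g > 0 else -max_grad
    return x - lr * g
```

Real frameworks clip the norm of the whole gradient vector (e.g., PyTorch's clip_grad_norm_), but the idea is the same.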
The Math
At each step, we update the parameters:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

where $\eta$ is the learning rate and $\nabla L(\theta_t)$ is the gradient of the loss function at the current parameters.
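One hand-worked step makes the rule concrete (numbers are illustrative). Take $L(\theta) = \theta^2$, $\theta_0 = 2$, and $\eta = 0.1$:

$$\nabla L(\theta_0) = 2\theta_0 = 4, \qquad \theta_1 = 2 - 0.1 \times 4 = 1.6$$

Repeat, and $\theta$ shrinks by a factor of $0.8$ per step toward the minimum at $0$. With $\eta = 1.1$, the same arithmetic gives $\theta_1 = -2.4$, then $\theta_2 = 2.88$: each step leaps across the valley and grows, which is exactly the overshoot preset.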