The Blind Hiker — Gradient Descent Playground
Drag a ball on four different loss surfaces. Tune the learning rate. Trigger four built-in failure modes. Watch the loss curve go down — or explode.
How Gradient Descent Works
Gradient descent is the optimization algorithm behind almost all of machine learning. It's how models find the best parameters by following the slope downhill.
The Analogy
Imagine you're blindfolded on a hilly landscape and need to find the lowest valley. You can feel the slope under your feet. Gradient descent says: take a step in the direction the ground slopes most steeply downward.
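In code, the blind hiker is a short loop: feel the slope, step against it, repeat. Here's a minimal sketch in Python (the bowl function, starting point, and step count are illustrative, not the demo's actual values):

```python
def loss(x):
    # A friendly convex bowl with its lowest point at x = 3.
    return (x - 3) ** 2

def grad(x):
    # The derivative of the bowl: the slope under the hiker's feet.
    return 2 * (x - 3)

x = 0.0             # starting position on the landscape
learning_rate = 0.1

for _ in range(50):
    x -= learning_rate * grad(x)  # step downhill, proportional to the slope

print(x)  # ~3.0: the bottom of the valley
```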
What to Try in the Demo
- Pick a loss function. Convex bowl is the friendly case. Double valley has two minima with one deeper. Wiggly looks smooth from far away but has bumps near the bottom. Cliff has a discontinuity that breaks the smooth-descent assumption.
- Drag the ball to set its starting position. The trail clears each time you drag.
- Step runs one gradient update with animation. Run keeps stepping until convergence, divergence, or 200 steps. Reset snaps back to the function's default starting point.
- Tune the learning rate with the slider. It's log-scaled from 0.001 to 1.0, the same range you'd see in real training (the sketch after this list shows how that mapping works).
- Try the failure-mode presets. Each loads a configuration that demonstrates a specific way training goes wrong: overshoot, glacial progress, local-minimum trap, cliff plummet.
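Why a log scale? Learning rates matter by orders of magnitude: the jump from 0.001 to 0.01 is as meaningful as the jump from 0.1 to 1.0. A plausible slider-to-learning-rate mapping (a sketch, not the demo's actual source) looks like this:

```python
import math

LR_MIN, LR_MAX = 1e-3, 1.0  # the slider's range from the list above

def slider_to_lr(t: float) -> float:
    """Map slider position t in [0, 1] to a log-scaled learning rate."""
    log_lr = math.log10(LR_MIN) + t * (math.log10(LR_MAX) - math.log10(LR_MIN))
    return 10 ** log_lr

print(slider_to_lr(0.0))  # 0.001
print(slider_to_lr(0.5))  # about 0.032 (halfway on the slider is not 0.5)
print(slider_to_lr(1.0))  # 1.0
```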
Key Concepts
Learning rate controls step size. Too big = chaos. Too small = glacial. Most of ML engineering is finding the right learning rate schedule.
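Both failure modes show up on the convex bowl from the earlier sketch. These learning rates are illustrative, but the behavior is exactly what the slider produces:

```python
def grad(x):
    return 2 * (x - 3)  # slope of the convex bowl (x - 3)**2

def run(lr, steps=20, x=0.0):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(run(lr=1.1))    # too big: every step overshoots harder, x explodes
print(run(lr=0.001))  # too small: after 20 steps, barely left the start
print(run(lr=0.3))    # about right: settles at the minimum, x = 3
```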
Local minima are valleys that aren't the deepest. The "Stuck in local minimum" preset shows this directly: the ball lands in the shallow well at x≈25 and never finds the deeper one at x≈75. In high-dimensional spaces (real neural networks), local minima are less of a problem than you'd think — but in 1D they're brutal.
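You can reproduce the trap outside the demo. The double well below is an illustrative stand-in for the demo's surface, with a shallow valley near x = 25 and a deeper one near x = 75 (the exact shape is assumed, not taken from the demo's source):

```python
def loss(x):
    # Two valleys: a shallow one near x = 25, a deeper one near x = 75.
    # The -0.02 * x tilt is what makes the right valley deeper.
    return ((x - 25) * (x - 75) / 600) ** 2 - 0.02 * x

def grad(x, h=1e-5):
    # Numerical gradient: literally "feel the slope under your feet".
    return (loss(x + h) - loss(x - h)) / (2 * h)

x = 10.0  # start on the left side of the landscape
for _ in range(500):
    x -= 20.0 * grad(x)

print(x)  # settles around x = 26.6: stuck in the shallow valley for good
```

Every gradient along the way points into the left valley, so extra steps don't help; the only escapes are a different start, momentum, or noise.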
Saddle points are flat regions where the gradient is near zero. The ball slows to a crawl. Momentum-based optimizers (Adam, SGD+momentum) help push through these.
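Here's a sketch of the momentum fix (classical SGD+momentum; the demo's ball physics may differ):

```python
def momentum_step(x, v, grad, lr=0.01, beta=0.9):
    """One SGD+momentum update. beta controls how long past velocity survives."""
    v = beta * v + grad(x)  # old velocity decays by 10%, the new slope adds on
    return x - lr * v, v    # step along accumulated velocity, not the raw slope
```

On a flat stretch where grad(x) is near zero, v only shrinks by a factor of beta per step, so the ball coasts across the plateau instead of stalling.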
Discontinuities. Real loss landscapes can have cliffs — places where the gradient is near zero on one side and enormous on the other. The "Cliff" function and its preset show what your optimizer does when it samples a discontinuity: the gradient explodes and the ball flies off. This is why gradient clipping exists.
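In one dimension, gradient clipping is a few lines. A sketch of the standard recipe (the threshold is illustrative):

```python
def clipped_step(x, grad, lr=0.1, max_grad=5.0):
    g = grad(x)
    # Cap the gradient's magnitude so one cliff sample can't launch
    # the ball into orbit; direction is preserved, only the size shrinks.
    if abs(g) > max_grad:
        g = max_grad if g > 0 else -max_grad
    return x - lr * g
```

Real frameworks clip the norm of the whole gradient vector (e.g., PyTorch's clip_grad_norm_), but the idea is the same.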
The Math
At each step, we update the parameters:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

where $\eta$ is the learning rate and $\nabla L(\theta_t)$ is the gradient of the loss function at the current parameters.
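One hand-worked step makes the rule concrete (numbers are illustrative). Take $L(\theta) = \theta^2$, $\theta_0 = 2$, and $\eta = 0.1$:

$$\nabla L(\theta_0) = 2\theta_0 = 4, \qquad \theta_1 = 2 - 0.1 \times 4 = 1.6$$

Repeat, and $\theta$ shrinks by a factor of $0.8$ per step toward the minimum at $0$. With $\eta = 1.1$, the same arithmetic gives $\theta_1 = -2.4$, then $\theta_2 = 2.88$: each step leaps across the valley and grows, which is exactly the overshoot preset.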