Chief Idiot · 2 min read

ML 8: The Memorizer (Overfitting)

Why studying by rote memorization fails the test.

The Smartest Idiot

Imagine a student, Steve. Steve wants to pass the math exam. Instead of learning the formulas, Steve memorizes the exact answers to the practice questions.

practice_Q1: "2 + 2 = ?" -> Steve knows "4".
practice_Q2: "3 x 5 = ?" -> Steve knows "15".

Steve gets 100% on the Practice Test (Training Data).

But on the Real Exam (Test Data):

Real_Q1: "2 + 3 = ?" -> Steve panics. He has never seen this question. He guesses "4" because it looks like "2 + 2".

Steve has Overfit the data.
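Steve's whole "strategy" is just a lookup table. Here is a toy Python sketch of it (the dictionary and the guess string are illustrative, not real ML code):

```python
# Steve's "learning": a lookup table of memorized practice answers.
steve = {"2 + 2": "4", "3 x 5": "15"}

def answer(question):
    # Perfect recall on memorized questions; a panicked guess otherwise.
    return steve.get(question, "4 (panicked guess)")

print(answer("2 + 2"))  # aces the practice test: "4"
print(answer("2 + 3"))  # real exam: "4 (panicked guess)"
```

Perfect on the practice set, helpless one step outside it. That gap is the whole story of this post.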

The Memorizer

Underfitting vs Overfitting

  1. Underfitting (The Slacker):

    • Didn't study.
    • Fails Practice Test. Fails Real Exam.
    • Model is too simple (e.g., trying to draw a straight line through a circle).
  2. Overfitting (The Memorizer):

    • Memorized the noise.
    • Aces Practice Test. Fails Real Exam.
    • Model is too complex (e.g., a squiggly line that touches every single dot).
  3. Good Fit (The Student):

    • Understands the general concepts.
    • Does okay on Practice. Does okay on Real Exam.
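The three students above can be demonstrated with polynomial curve fitting. In this minimal NumPy sketch (the noisy sine data and the degree choices are my own assumptions), degree 1 is the slacker, degree 3 the student, and degree 15 the memorizer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # noisy "practice test"

x_exam = np.linspace(0.03, 0.97, 20)   # "real exam": unseen points
y_exam = np.sin(2 * np.pi * x_exam)    # the true answers

def mse(degree, xs, ys):
    coeffs = np.polyfit(x, y, degree)  # always fit on the training data only
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

for degree in (1, 3, 15):              # slacker, student, memorizer
    print(f"degree {degree:2d}: train={mse(degree, x, y):.3f}  "
          f"exam={mse(degree, x_exam, y_exam):.3f}")
```

The degree-15 fit drives its training error toward zero by chasing the noise, and pays for it on the exam points; the straight line fails both.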

Steve's Study Strategy

[Chart: practice questions Q1–Q5 and scores — Practice Test: 85%, Real Exam: 80%. Passed!]

How to fix Overfitting?

  1. More Data: Harder to memorize 1 million questions than 10.
  2. Regularization: Punish the model for being too complex. (Like fining the student for writing really long answers).
    • L1 (Lasso): Pushes the weights of useless features to exactly zero (deletes them).
    • L2 (Ridge): Shrinks all weights toward zero (keeps them small, deletes none).
  3. Early Stopping: Stop training before Steve starts memorizing.
  4. Dropout: (Used in Neural Nets) Randomly knock out neurons during training so they don't rely on each other too much.

The Golden Rule

NEVER TEST ON YOUR TRAINING DATA. Always split your data:

  • Train Set: For the model to learn.
  • Test Set: For the final exam.

If Train Score >> Test Score, you are Overfitting.
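A minimal NumPy sketch of the golden rule (the synthetic data here is my own assumption): a 1-nearest-neighbor model is a pure memorizer, so scoring it on its own training set gives a perfect, meaningless 1.0, while the held-out test set tells the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy labels

# The golden rule: split BEFORE training. 75% train, 25% test.
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]

def predict_1nn(q):
    # 1-nearest-neighbor = pure memorization: copy the label of the
    # closest training point.
    dists = np.linalg.norm(X[train] - q, axis=1)
    return y[train][np.argmin(dists)]

train_acc = np.mean([predict_1nn(X[i]) == y[i] for i in train])
test_acc = np.mean([predict_1nn(X[i]) == y[i] for i in test])
print(train_acc, test_acc)  # train is 1.0 by construction; test is lower
```

Every training point's nearest neighbor is itself, so the train score is guaranteed to be perfect; only the test score measures generalization.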

Summary

We want models that Generalize, not models that Memorize. A model that knows the existing data perfectly is usually useless for new data.

Next up: Teaching computers to see.
