ML 1: Drawing Lines (Linear Regression)
The simplest machine learning algorithm that basically runs the entire world economy.
Predicting the Future with a Stick
Imagine you are an idiot (easy, I know). You start noticing a pattern in your life:
"The sadder I am, the more pizza I eat."
You want to make a Prediction Machine. You want to be able to say: "I am level 8 sad today. How many pizzas should I buy?"
The Data
You track your life for 10 weeks:
| Sadness Level (0-10) | Pizzas Eaten |
|---|---|
| 1 | 1 |
| 2 | 2.5 |
| 3 | 2 |
| 4 | 4 |
| 5 | 3.5 |
| 6 | 5 |
| 7 | 5.5 |
| 8 | 7 |
| 9 | 8.5 |
| 10 | 9 |
If you plot these dots, they mostly go UP and to the RIGHT. To make a prediction, you just need to draw a line through the middle of them.
[Figure: The Sadness Data]
That's it. That's Linear Regression. Drawing a line. A 5-year-old can do it.
You vs. The Machine
In math class (which you slept through), a line is defined by two things:
- Slope ($m$): How steep it is.
- Intercept ($b$): Where it starts.
The formula is:
$$y = mx + b$$
The goal of Machine Learning is just to find the perfect $m$ and $b$ so the line hits close to the dots.
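If the formula feels abstract, here is the same idea as a tiny Python function. This is only a sketch: the function name `predict_pizzas` and the example slope and intercept are made up for illustration, not values learned from the data.

```python
def predict_pizzas(sadness, slope, intercept):
    """Return the pizza count a straight line predicts for a sadness level."""
    return slope * sadness + intercept


# Hypothetical guess: one pizza per sadness point, starting from zero pizzas.
print(predict_pizzas(8, slope=1.0, intercept=0.0))  # 8.0
```

Change the slope and the line gets steeper or flatter. Change the intercept and the whole line slides up or down.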
Try It Yourself
I want you to be the computer. Below is your actual Pizza Data. Use the sliders to find the best line.
Try to get the Error Score as low as possible (ideally under 0.50).
[Interactive widget: Interactive Model Trainer]
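No sliders handy? You can play the same game in code. The sketch below scores a guessed line against the pizza data. One loud assumption: the page never says how the Error Score is calculated, so this uses mean squared error (the average of the squared gaps between the line and the dots), which is a common choice but may not be what the trainer uses.

```python
sadness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pizzas = [1, 2.5, 2, 4, 3.5, 5, 5.5, 7, 8.5, 9]


def error_score(slope, intercept):
    """Mean squared error (assumed metric) of a guessed line on the pizza data."""
    gaps = [(slope * x + intercept) - y for x, y in zip(sadness, pizzas)]
    return sum(g * g for g in gaps) / len(gaps)


print(f"{error_score(1.0, 0.0):.3f}")  # 0.900 (line is a bit too steep)
print(f"{error_score(0.9, 0.0):.3f}")  # 0.315 (better)
```

Keep changing the two numbers and rerunning. That is exactly what dragging the sliders does.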
What Did You Just Do?
When you moved the slider and saw the error go up, you moved it back, right? "Oops, too high. Let me go lower. Ah, better."
Congratulations. You just performed Gradient Descent.
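The computer does the same thing. It starts with a random line, measures the error, and then nudges the line a tiny bit in whichever direction makes the error smaller. It repeats this thousands of times until the error stops shrinking. With messy data like ours the error never actually reaches zero, because no straight line passes through every dot.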
It's not "Artificial Intelligence." It's just trial and error at light speed.
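Here is a minimal sketch of that nudge loop in plain Python. Several things are my assumptions: the error is mean squared error, the starting line is flat at zero instead of random (so the result is repeatable), and the nudge size `learning_rate = 0.01` and the 5,000 rounds were picked by hand for this dataset.

```python
sadness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pizzas = [1, 2.5, 2, 4, 3.5, 5, 5.5, 7, 8.5, 9]
n = len(sadness)

slope, intercept = 0.0, 0.0  # starting guess (assumed; zero keeps runs repeatable)
learning_rate = 0.01         # how big each nudge is (assumed value)

for step in range(5000):
    # Gap between the current line and each dot
    gaps = [(slope * x + intercept) - y for x, y in zip(sadness, pizzas)]
    # Gradient of the mean squared error: which way is "uphill" for each knob
    slope_grad = 2 * sum(g * x for g, x in zip(gaps, sadness)) / n
    intercept_grad = 2 * sum(gaps) / n
    # Nudge both knobs a little bit downhill
    slope -= learning_rate * slope_grad
    intercept -= learning_rate * intercept_grad

error = sum(((slope * x + intercept) - y) ** 2 for x, y in zip(sadness, pizzas)) / n
print(f"slope={slope:.2f}, intercept={intercept:.2f}, error={error:.2f}")
# slope=0.88, intercept=-0.03, error=0.29
```

Notice the error settles around 0.29 and stays there. Make `learning_rate` much bigger (say 0.05) and the nudges overshoot and the numbers blow up, which is why the steps have to be tiny.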
How to Do This in Python
Okay, moving sliders is fun, but how do we make the computer do it? We use a library called scikit-learn (the name comes from "SciKit", short for SciPy Toolkit).
import numpy as np
from sklearn.linear_model import LinearRegression
# 1. The Data (Sadness level)
# We write it as a list of lists [[1], [2], ...] because scikit-learn expects a 2D array (one row per week, one column per feature)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
# 2. The Answers (Pizzas eaten)
y = np.array([1, 2.5, 2, 4, 3.5, 5, 5.5, 7, 8.5, 9])
# 3. Create the Model (The empty brain)
model = LinearRegression()
# 4. Train the Model (LinearRegression computes the best-fit line directly with least squares, not gradient descent)
model.fit(X, y)
# 5. Make a Prediction
sadness_today = 8
prediction = model.predict([[sadness_today]])
print(f"If you are {sadness_today}/10 sad, you should eat {prediction[0]:.2f} pizzas.")
# Output: If you are 8/10 sad, you should eat 7.00 pizzas.
Copy-paste that into your messy Python script. You are now a Data Scientist.
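Curious what `fit` is doing with your data? scikit-learn's `LinearRegression` is an ordinary least squares fit, and for a single input like sadness that has a textbook closed-form answer. The sketch below works it out by hand in plain Python. It is my own illustration of the math, not scikit-learn's actual code, and the variable names are made up.

```python
sadness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pizzas = [1, 2.5, 2, 4, 3.5, 5, 5.5, 7, 8.5, 9]
n = len(sadness)

mean_x = sum(sadness) / n
mean_y = sum(pizzas) / n

# Least-squares slope: how x and y move together, divided by how much x moves
best_slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(sadness, pizzas))
    / sum((x - mean_x) ** 2 for x in sadness)
)
# The best line always passes through the point (mean_x, mean_y)
best_intercept = mean_y - best_slope * mean_x

print(f"slope={best_slope:.3f}, intercept={best_intercept:.3f}")
# slope=0.879, intercept=-0.033
print(f"prediction for sadness 8: {best_slope * 8 + best_intercept:.2f}")
# prediction for sadness 8: 7.00
```

No loop, no nudging: for a straight line the best answer can be calculated in one shot.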