The Wisdom of the Crowd

In ML 4, we built a Decision Tree. It was nice, but it sometimes makes mistakes: it gets obsessed with the details of its training data (this is called overfitting).

What if we built 100 different Trees? And we gave them slightly different data? And then we made them Vote?

That is a Random Forest.

The Mob

Democracy in Action

Imagine you want to guess how many jellybeans are in a jar.

Tree 1 (Bob) says: 500.
Tree 2 (Alice) says: 600.
Tree 3 (Charlie) says: 550.

The average (550) is usually closer to the truth than most of the individual guesses.
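The jellybean arithmetic is easy to check. Here is a small sketch using the three guesses above; the true count of 560 is made up purely for illustration.

```python
from statistics import mean

# Guesses from the example above
guesses = {"Bob": 500, "Alice": 600, "Charlie": 550}

# Hypothetical true count, chosen only for illustration
true_count = 560

# The crowd's guess is the plain average of everyone's guess
crowd_guess = mean(list(guesses.values()))
crowd_error = abs(crowd_guess - true_count)

# How far off each person is individually
individual_errors = {name: abs(g - true_count) for name, g in guesses.items()}
average_individual_error = mean(list(individual_errors.values()))

print(f"Crowd guess: {crowd_guess}")
print(f"Crowd error: {crowd_error}")
print(f"Individual errors: {individual_errors}")
print(f"Average individual error: {average_individual_error:.1f}")
```

One lucky guesser can still tie or beat the crowd, but the crowd's error is never worse than the average individual error. That is the property a forest relies on.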

[Interactive demo: "Loan Application: The Forest Decides". An applicant with Income: $50k and Credit: 600 is judged by three trees: an Income Expert, a Credit Expert, and a Balanced Expert.]

Each tree focuses on different features. The majority wins!

In a Random Forest:

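We create lots of trees, each one a little different.
Each tree trains on a random sample of the data rows, drawn with replacement (Bagging).
Each tree votes.
Majority wins (if we are predicting a number instead of a class, we average the answers).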
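The four steps above can be sketched by hand with plain Decision Trees. This is a simplified illustration and not how scikit-learn implements its forest internally; the toy dataset, the tree count of 25, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy binary dataset (assumption: a stand-in for real data)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a 0/1 vote can never tie
trees = []
for i in range(n_trees):
    # Bagging: draw as many row indices as there are rows, with replacement
    rows = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[rows], y_train[rows])
    trees.append(tree)

# Each tree votes: one row of predictions per tree
votes = np.array([tree.predict(X_test) for tree in trees])

# Majority wins: with labels 0/1, predict 1 when more than half the trees say 1
forest_prediction = (votes.mean(axis=0) > 0.5).astype(int)

accuracy = (forest_prediction == y_test).mean()
print(f"Hand-rolled forest accuracy: {accuracy:.2f}")
```

Because sampling is done with replacement, each tree sees some rows several times and misses others entirely, which is what makes the trees disagree in useful ways.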

Why "Random"?

If all the trees were the same, they'd make the same mistakes. We force them to be different by:

Giving them random data rows.
Forcing them to look at random features each time they split (e.g., at one split a tree may only consider Color, at the next only Size).

This makes the forest diverse. Diversity = Strength.
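In scikit-learn, both kinds of randomness are controlled by parameters of `RandomForestClassifier`: `bootstrap` for the rows and `max_features` for the features. The sketch below, on made-up toy data, peeks at which feature each tree picked for its very first split to show that the trees really do differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data with 8 features (assumption: a stand-in for real data)
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

model = RandomForestClassifier(
    n_estimators=25,
    bootstrap=True,       # random rows: each tree trains on a bootstrap sample
    max_features="sqrt",  # random features: each split considers about sqrt(8) of them
    random_state=0,
)
model.fit(X, y)

# The fitted trees live in model.estimators_.
# tree_.feature[0] is the index of the feature used at each tree's root split.
root_features = [int(tree.tree_.feature[0]) for tree in model.estimators_]

print("Root split feature per tree:", root_features)
print("Distinct root features:", sorted(set(root_features)))
```

If every tree started from the same feature, the list would contain a single repeated number. Seeing several different indices is the diversity at work.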

The Code (Python)

It's just as easy as a single tree.

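from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 0. Toy data so the snippet runs on its own (swap in your own X and y)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

# 1. The Forest
# n_estimators=100 means "Plant 100 trees"
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 2. Train the mob
model.fit(X_train, y_train)

# 3. Vote
prediction = model.predict(X_new)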

Why isn't everyone using this?

Random Forests are amazing. They are the "Leatherman" tool of Machine Learning.

They don't overfit easily.
They handle messy data well.
They train quickly, because the trees are independent and can be built in parallel.
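A quick way to see these claims for yourself is to train a single tree and a forest on the same noisy data and compare them. The sketch below uses a made-up dataset, so the exact numbers mean little; `oob_score=True` and `n_jobs=-1` are real scikit-learn options, and everything else is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy toy data (assumption: flip_y=0.1 randomly reassigns the class of about 10% of rows)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One unpruned tree, free to memorize the training data
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

forest = RandomForestClassifier(
    n_estimators=100,
    oob_score=True,  # score each row using only the trees that never saw it
    n_jobs=-1,       # build the trees in parallel on all available cores
    random_state=0,
).fit(X_train, y_train)

tree_train = tree.score(X_train, y_train)
tree_test = tree.score(X_test, y_test)
forest_train = forest.score(X_train, y_train)
forest_test = forest.score(X_test, y_test)

print(f"Single tree: train={tree_train:.2f}  test={tree_test:.2f}")
print(f"Forest:      train={forest_train:.2f}  test={forest_test:.2f}")
print(f"Forest out-of-bag estimate: {forest.oob_score_:.2f}")
```

The gap between the single tree's training and test scores is the "obsessed with details" problem. The out-of-bag estimate is a free by-product of bagging: every tree skips some rows, so those rows can act as a small test set for that tree.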

But... they are hard to explain. "Why did the model reject my loan?" "Well, 64 trees said No, but 36 said Yes, and Tree #42 really didn't like your zip code."
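You can at least count the votes yourself. The sketch below asks every tree in a fitted scikit-learn forest about one applicant and then prints the forest's overall feature importances. The data is random and the feature names are hypothetical labels invented for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical loan data: the feature names are made up for illustration
feature_names = ["income", "credit_score", "debt", "zip_code"]
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

applicant = X[:1]  # one applicant, kept as a 2-D array

# Ask every tree individually.
# Sub-estimators return class indices; with labels 0/1 these match the labels.
tree_votes = np.array(
    [tree.predict(applicant)[0] for tree in model.estimators_]
)
yes_votes = int((tree_votes == 1).sum())
no_votes = int((tree_votes == 0).sum())
print(f"{yes_votes} trees said Yes, {no_votes} said No")

# A rough global view: how much each feature contributed across the whole forest
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

One caveat: scikit-learn's `predict` averages each tree's predicted probabilities instead of counting hard votes, so the tally is an approximation of what the forest does. Feature importances tell you what mattered overall, not why this particular applicant was rejected.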

Summary

A Random Forest is a bunch of Decision Trees held together by duct tape and democracy. It is one of the most powerful algorithms for structured, tabular data (think Excel sheets).

Next up: What if we don't know the answers (labels)? The lonely world of Unsupervised Learning.