By Sumit Pandey, PhD | 2 min read

ML 9: The Eyeball (CNNs)

How do computers actually see images? CNNs explained in plain English with an interactive demo where you can pass an image through every layer yourself.

Seeing the World

Normal Neural Networks (MLPs) struggle with images. An image is just a grid of pixels, so a 1000x1000 grayscale image has 1,000,000 inputs. If you feed all 1 million pixels into a Dense Neural Network, the number of weights explodes, because every neuron needs its own separate weight for every single pixel. It's too much.
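To put rough numbers on "too much", here is a back-of-the-envelope count. The layer width of 1,000 neurons and the 32 filters are assumptions picked only for illustration, and bias terms are ignored.

```python
# Rough weight counts for a 1000x1000 grayscale image (biases ignored).
pixels = 1000 * 1000        # one input per pixel
hidden_neurons = 1000       # hypothetical layer width, for illustration only

# Dense layer: every neuron gets its own weight for every pixel.
dense_weights = pixels * hidden_neurons

# Convolutional layer: each small 3x3 filter is reused at every position,
# so the count does not depend on the image size at all.
num_filters = 32            # hypothetical filter count
conv_weights = 3 * 3 * 1 * num_filters

print(f"Dense layer weights: {dense_weights:,}")
print(f"Conv layer weights:  {conv_weights:,}")
```

A billion weights against a few hundred: that gap is the main reason the next idea exists.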

Enter the Convolutional Neural Network (CNN).

The Scanner

Instead of looking at the whole image at once, a CNN looks at small chunks. Imagine looking at a picture through a paper towel roll. You scan across.

  1. Filters (Kernels): Small 3x3 grids that look for specific things.

    • One filter looks for Vertical Lines.
    • One filter looks for Horizontal Lines.
    • One filter looks for Curves and Corners.
  2. Pooling: Shrinking the image by keeping one summary value per small patch (max pooling keeps the strongest response). "Okay, there's an edge somewhere in this area."

  3. Layers:

    • Layer 1 sees Lines.
    • Layer 2 combines lines to see Shapes (Eyes, Ears).
    • Layer 3 combines shapes to see Objects (Cat Face).
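The scan-then-shrink idea above can be sketched in plain Python. This is a toy version under stated assumptions: the helper names `convolve2d` and `max_pool` are made up for this example, the filter values are hand-written (a real CNN learns them during training), and there is no padding and a step of one pixel.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the patch by the filter, element by element, and add it up.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size patch."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            patch = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(patch))
        pooled.append(row)
    return pooled


# A 6x6 image: dark (0) on the left, bright (1) in the last two columns.
image = [[0, 0, 0, 0, 1, 1] for _ in range(6)]

# Hand-written filter that responds where brightness rises from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)  # 4x4 map, large where the edge is
pooled = max_pool(feature_map)                  # 2x2 summary of that map

print(feature_map)
print(pooled)
```

The feature map is zero over the flat dark area and positive near the dark-to-bright boundary, and the pooled version keeps that "there is an edge on the right" message at half the width and height.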

How a CNN Sees

[Interactive demo: watch filters transform real images (Original, Filter, Filtered)]

Filter (finds edges):

-1 -1 -1
-1  8 -1
-1 -1 -1

CNNs use many filters like these to detect features (edges, textures, shapes)
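As a rough check of why the demo's filter (eight -1s around a central 8) finds edges, here is a small sketch. The helper `apply_filter` and both test images are invented for this example. Because the filter's values add up to zero, any flat region cancels out, and only places where a pixel differs from its neighbors survive.

```python
def apply_filter(image, kernel):
    """Slide a square kernel over the image (no padding) and return the responses."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


edge_filter = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

# A completely flat image: same brightness everywhere.
flat_image = [[5] * 5 for _ in range(5)]

# Dark on the left, bright on the right: one vertical edge in the middle.
edge_image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

flat_response = apply_filter(flat_image, edge_filter)
edge_response = apply_filter(edge_image, edge_filter)

print(flat_response)  # all zeros: nothing interesting here
print(edge_response)  # nonzero only on either side of the boundary
```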

Feature Maps

A CNN doesn't "see" a cat. It sees:

  • A map of where the fluffy texture is.
  • A map of where the pointy ears are.
  • A map of where the whiskers are.

If all those maps light up, it guesses "CAT".
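Here is a toy sketch of that "maps light up, so guess CAT" step. Everything in it is made up for illustration: the three tiny feature maps, the class weights, and the `classify` helper. It also simplifies by taking the strongest value of each map, whereas the Keras model later in this post flattens the maps instead.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps, class_weights):
    # How strongly did each feature fire anywhere in the image?
    strengths = [max(max(row) for row in fmap) for fmap in feature_maps]
    # Each class scores the features with its own (here hand-picked) weights.
    scores = [
        sum(w * s for w, s in zip(weights, strengths))
        for weights in class_weights.values()
    ]
    return dict(zip(class_weights, softmax(scores)))


# Hypothetical 2x2 feature maps; values near 1 mean "this feature is here".
feature_maps = [
    [[0.1, 0.9], [0.2, 0.8]],  # fluffy texture
    [[0.0, 0.7], [0.1, 0.0]],  # pointy ears
    [[0.6, 0.0], [0.0, 0.1]],  # whiskers
]

# Hand-picked weights; a trained network would learn these.
class_weights = {
    "cat": [2.0, 2.0, 2.0],
    "not cat": [-1.0, -1.0, -1.0],
}

probs = classify(feature_maps, class_weights)
print(probs)
```

With all three maps firing, almost all of the probability lands on "cat".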

The Code (Keras/TensorFlow)

from tensorflow.keras import layers, models

model = models.Sequential()

# 1. The Scanner (Conv2D)
# 32 filters, 3x3 size. Input: 28x28 grayscale images (1 channel).
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))

# 2. The Shrinker (MaxPooling)
# Keeps the largest value in each 2x2 patch, halving width and height.
model.add(layers.MaxPooling2D((2, 2)))

# 3. Another Scanner
# 64 filters that combine the 32 feature maps from step 1.
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# 4. Flatten and Decide (Standard Neural Net at the end)
# Softmax turns the 10 outputs into class probabilities.
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))
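To see what each layer does to the data, the shapes and weight counts can be worked out by hand. This sketch assumes the Keras defaults for the model above (no padding, step of 1 for the convolutions, 2x2 pooling with a step of 2). The helper names are invented here, and the results should line up with what `model.summary()` reports.

```python
def conv_output(side, kernel=3):
    """Width/height after a conv layer with no padding and a step of 1."""
    return side - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


side = 28                                  # input: 28 x 28 x 1

side = conv_output(side)                   # Conv2D(32): 26 x 26 x 32
conv1_params = conv_params(3, 1, 32)

side = side // 2                           # MaxPooling2D: 13 x 13 x 32

side = conv_output(side)                   # Conv2D(64): 11 x 11 x 64
conv2_params = conv_params(3, 32, 64)

flat_size = side * side * 64               # Flatten: one long vector
dense_params = flat_size * 10 + 10         # Dense(10): weights + biases

total_params = conv1_params + conv2_params + dense_params

print("Flattened size:", flat_size)
print("Conv 1 params: ", conv1_params)
print("Conv 2 params: ", conv2_params)
print("Dense params:  ", dense_params)
print("Total params:  ", total_params)
```

Most of the weights sit in the final Dense layer, not in the filters, which is one reason real models usually pool again before flattening.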

Summary

CNNs revolutionized AI. Before them, Computer Vision leaned on hand-crafted features and struggled with anything messy. Now, your phone can unlock with your face, and your car can see stop signs (mostly).

Next up: What if the data is a sequence, like a sentence?
