Is It a Cat?

You now know how neural networks work (Part 2). You built your own engine (App 1) and your own GPT (App 2). Time to build something your friends can actually use.

We are building an image classifier. Give it a photo. It tells you what it is. Cat, dog, airplane, frog, whatever.

The Classifier

The Dataset: CIFAR-10

CIFAR-10 is a classic dataset: 60,000 tiny color images (32x32 pixels) in 10 categories, split into 50,000 for training and 10,000 for testing:

Class	What
0	Airplane
1	Automobile
2	Bird
3	Cat
4	Deer
5	Dog
6	Frog
7	Horse
8	Ship
9	Truck

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
 
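# Data transforms: ToTensor() scales pixels to [0, 1], then Normalize
# shifts each RGB channel to [-1, 1] using mean 0.5 and std 0.5
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])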
 
# Download CIFAR-10
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
 
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)
testloader = DataLoader(testset, batch_size=64, shuffle=False)
 
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
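
To make the Normalize step less mysterious, here is a minimal pure-Python sketch of the arithmetic it applies to every pixel in every channel: (value - mean) / std, with mean and std both 0.5. The helper name normalize_pixel is made up for illustration; torchvision does the same thing on whole tensors at once.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula Normalize uses."""
    return (value - mean) / std

# ToTensor() scales 0..255 bytes to 0.0..1.0 before Normalize runs
for raw in (0, 128, 255):
    scaled = raw / 255
    print(raw, "->", round(normalize_pixel(scaled), 3))
```

Black lands on -1, white on 1, and mid-gray sits near 0. Inputs centered around zero tend to make training behave better than raw 0-255 values.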

The Model: A CNN

Remember ML 9, The Eyeball? CNNs slide a small window across the image, looking for patterns. Let's build one:

class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers (feature extraction)
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
 
        # Fully connected layers (classification)
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)
 
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 16x16
        x = self.pool(F.relu(self.conv2(x)))   # 16x16 -> 8x8
        x = self.pool(F.relu(self.conv3(x)))   # 8x8 -> 4x4
        x = self.dropout(x)
        x = x.view(-1, 128 * 4 * 4)            # flatten
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
 
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ImageClassifier().to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

About 620K parameters (620,362, to be exact). Tiny by modern standards, but enough to tell cats from dogs.
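
If you want to check where that number and the 128 * 4 * 4 in fc1 come from, the bookkeeping can be done by hand. The sketch below is plain Python, no PyTorch needed. It uses the standard convolution output-size formula and assumes the layer sizes from the model above; the helper names are made up for illustration.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel=3):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """One weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features

size = 32   # CIFAR-10 images are 32x32
total = 0
for in_ch, out_ch in [(3, 32), (32, 64), (64, 128)]:
    size = conv_out(size)   # 3x3 kernel with padding=1 keeps the size
    size = size // 2        # 2x2 max pooling halves it
    total += conv_params(in_ch, out_ch)
    print(f"after conv {in_ch}->{out_ch} + pool: {out_ch} x {size} x {size}")

flat = 128 * size * size    # what gets flattened before fc1
total += linear_params(flat, 256) + linear_params(256, 10)
print(f"flattened features: {flat}")
print(f"total parameters: {total:,}")
```

Most of the parameters live in fc1, not in the convolutions. That is typical for small CNNs with a wide fully connected layer.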

Training

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
 
for epoch in range(15):
    model.train()
    running_loss = 0.0
 
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
 
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
 
        running_loss += loss.item()
 
    # Test accuracy
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
 
    acc = 100 * correct / total
    print(f"Epoch {epoch+1}/15, Loss: {running_loss/len(trainloader):.3f}, Accuracy: {acc:.1f}%")

After 15 epochs, you should see ~75-80% accuracy. Not state-of-the-art, but your model can genuinely tell cats from trucks.
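
It helps to know what the loss number means while you watch it fall. For a single image, cross-entropy is the negative log of the probability the model gave to the correct class. Here is a rough pure-Python sketch of that calculation with made-up scores; nn.CrossEntropyLoss does the same thing on tensors and averages over the batch.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])

# Made-up scores for a 3-class toy problem
confident_right = cross_entropy([2.0, 1.0, 0.1], target=0)
confident_wrong = cross_entropy([2.0, 1.0, 0.1], target=2)
untrained = cross_entropy([0.0] * 10, target=3)   # uniform guess over 10 classes

print(f"correct class favored:    {confident_right:.3f}")
print(f"correct class disfavored: {confident_wrong:.3f}")
print(f"uniform guess, 10 classes: {untrained:.3f}")
```

A model that guesses uniformly over 10 classes scores ln(10), about 2.30. A loss near that on the very first batches means the model is still guessing, and anything well below it means it has started to learn.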

Using Your Model

from PIL import Image
 
def predict(image_path):
    # Force 3 channels so PNGs with alpha or grayscale photos don't break the model
    img = Image.open(image_path).convert('RGB').resize((32, 32))
    img_tensor = transform(img).unsqueeze(0).to(device)
 
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        probabilities = F.softmax(output, dim=1)
        confidence, predicted = torch.max(probabilities, 1)
 
    print(f"Prediction: {classes[predicted.item()]}")
    print(f"Confidence: {confidence.item()*100:.1f}%")
 
predict("my_cat.jpg")
# Example output (your numbers will differ):
# Prediction: cat
# Confidence: 87.3%
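
A single top prediction hides how close the runner-up was, which matters for look-alike classes such as cat and dog. As a rough sketch, here is how the top three guesses could be pulled out of the model's raw outputs. The logits are invented for illustration and top_k is a hypothetical helper written in plain Python; on real tensors, torch.topk does the ranking for you.

```python
import math

CLASSES = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

def top_k(logits, k=3):
    """Return the k most likely (class name, probability) pairs."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]   # softmax, numerically stable
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical raw outputs for one image, made up for illustration
example_logits = [0.1, -1.2, 0.8, 3.0, 0.5, 2.4, 0.2, -0.3, -0.9, -1.5]
for name, prob in top_k(example_logits):
    print(f"{name}: {prob * 100:.1f}%")
```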

Save and Load

# Save
torch.save(model.state_dict(), 'classifier.pth')
 
# Load (later): rebuild the architecture first, then restore the weights
model = ImageClassifier().to(device)
model.load_state_dict(torch.load('classifier.pth', map_location=device))
model.eval()
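
A state_dict is just a mapping from parameter names to tensors, which is why the architecture has to be rebuilt before the weights can be loaded. Here is a small sketch of the round trip. It assumes a throwaway nn.Linear layer in place of ImageClassifier and an in-memory buffer in place of classifier.pth, so nothing is written to disk.

```python
import io
import torch
import torch.nn as nn

tiny = nn.Linear(4, 2)                  # stand-in for ImageClassifier
print(list(tiny.state_dict().keys()))   # parameter names that get saved

buffer = io.BytesIO()                   # in-memory stand-in for 'classifier.pth'
torch.save(tiny.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(4, 2)              # same architecture, fresh random weights
restored.load_state_dict(torch.load(buffer, map_location='cpu'))
restored.eval()

# Every saved tensor should now be identical in the restored layer
same = all(torch.equal(a, b) for a, b in
           zip(tiny.state_dict().values(), restored.state_dict().values()))
print("weights match:", same)
```

Passing map_location lets a model trained on a GPU load cleanly on a machine that only has a CPU.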

What You Built

A real, working image classifier that:

Takes any image (resized to 32x32) as input
Runs it through 3 convolutional layers
Outputs one of 10 categories with a confidence score
Can be saved and loaded

This is the same fundamental approach used in medical imaging, self-driving cars, and face recognition. Just bigger models and bigger datasets.

Next up: we build a text generator and fine-tune a model to write in any style you want.