Chief Idiot · 4 min read

App 3: The Classifier (Is It a Cat?)

Build an image classifier in PyTorch. Upload a photo, get a prediction. Your first real AI app.

Is It a Cat?

You now know how neural networks work (Part 2). You built your own engine (App 1) and your own GPT (App 2). Time to build something your friends can actually use.

We are building an image classifier. Give it a photo, and it tells you which of 10 categories it sees: cat, dog, airplane, frog, and so on.

The Dataset: CIFAR-10

CIFAR-10 is a classic dataset: 60,000 tiny color images (32x32 pixels) in 10 categories, split into 50,000 for training and 10,000 for testing:

Class  What
0      Airplane
1      Automobile
2      Bird
3      Cat
4      Deer
5      Dog
6      Frog
7      Horse
8      Ship
9      Truck

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
 
# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
 
# Download CIFAR-10
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
 
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)
testloader = DataLoader(testset, batch_size=64, shuffle=False)
 
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
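
To see what the two transforms do to a single pixel, here is the arithmetic in plain Python (no PyTorch needed). This is a sketch of the math only: `normalize_pixel` is a hypothetical helper, not part of torchvision. `ToTensor` scales 0-255 bytes into the 0-1 range, and `Normalize` with mean 0.5 and std 0.5 then stretches that to -1 to 1.

```python
def normalize_pixel(byte_value, mean=0.5, std=0.5):
    """Mirror ToTensor + Normalize for one channel value (hypothetical helper)."""
    scaled = byte_value / 255.0      # ToTensor: 0..255 -> 0.0..1.0
    return (scaled - mean) / std     # Normalize: (x - mean) / std

for byte_value in (0, 128, 255):
    print(byte_value, "->", round(normalize_pixel(byte_value), 3))
```

Black lands at -1, white at 1, and mid-gray near 0, which tends to make training a little smoother than feeding raw 0-255 values.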

The Model: A CNN

Remember ML 9 — The Eyeball? CNNs slide a small window across the image, looking for patterns. Let's build one:
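
As a reminder of what "sliding a small window" means, here is a minimal sketch in plain Python. The 4x4 image, the 2x2 kernel, and the `slide_window` helper are all made up for illustration. `nn.Conv2d` does the same multiply-and-sum, just with learned kernels, many channels, and far more speed.

```python
def slide_window(image, kernel):
    """Slide a square `kernel` over a square `image` (no padding, stride 1)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for row in range(out_size):
        out_row = []
        for col in range(out_size):
            total = 0
            # Multiply the window by the kernel, element by element, and sum
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

# Toy image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Toy kernel that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]

for row in slide_window(image, kernel):
    print(row)
```

Each output row is `[0, 2, 0]`: the window lights up only where it straddles the edge. A trained CNN learns many such kernels on its own instead of having them written by hand.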

class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers (feature extraction)
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
 
        # Fully connected layers (classification)
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)
 
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 16x16
        x = self.pool(F.relu(self.conv2(x)))   # 16x16 -> 8x8
        x = self.pool(F.relu(self.conv3(x)))   # 8x8 -> 4x4
        x = self.dropout(x)
        x = x.view(-1, 128 * 4 * 4)            # flatten
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
 
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ImageClassifier().to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

About 620K parameters (620,362, to be exact). Tiny by modern standards, but enough to learn cats from dogs.
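
The shape comments in `forward` and the parameter count can both be checked by hand. The sketch below is plain Python. `conv_out`, `conv_params`, and `linear_params` are hypothetical helpers that apply the standard formulas: output size = (n + 2*padding - kernel) // stride + 1, and parameters = weights + biases.

```python
def conv_out(n, kernel, padding=0, stride=1):
    """Output width/height of a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

size, channels, total = 32, 3, 0
for out_ch in (32, 64, 128):
    size = conv_out(size, kernel=3, padding=1)   # 3x3 conv with padding=1 keeps the size
    size = conv_out(size, kernel=2, stride=2)    # 2x2 max pool halves it
    total += conv_params(channels, out_ch, 3)
    channels = out_ch
    print(f"after block with {out_ch} channels: {size}x{size}")

flat = channels * size * size                    # what fc1 receives after flattening
total += linear_params(flat, 256) + linear_params(256, 10)
print(f"flattened features: {flat}, parameters: {total:,}")
```

Most of the parameters (524,544 of them) sit in `fc1`, not in the convolutional layers.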

Training

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
 
for epoch in range(15):
    model.train()
    running_loss = 0.0
 
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
 
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
 
        running_loss += loss.item()
 
    # Test accuracy
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
 
    acc = 100 * correct / total
    print(f"Epoch {epoch+1}/15, Loss: {running_loss/len(trainloader):.3f}, Accuracy: {acc:.1f}%")

After 15 epochs, you should see ~75-80% accuracy. Not state-of-the-art, but your model can genuinely tell cats from trucks.
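
A handy sanity check for the loss column: for one example, `nn.CrossEntropyLoss` is the negative log of the softmax probability given to the correct class (the reported value is the average over the batch). A model that has learned nothing and spreads its guess evenly over 10 classes scores ln(10), about 2.303, so the loss on the very first batches should sit near there and then fall. Here is a sketch in plain Python, with a hypothetical `cross_entropy` helper and made-up logits:

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax of the correct class, for a single example."""
    biggest = max(logits)  # subtract the max for numerical stability
    log_sum = biggest + math.log(sum(math.exp(v - biggest) for v in logits))
    return log_sum - logits[label]

uniform = [0.0] * 10                # no preference among the 10 classes
print(round(cross_entropy(uniform, label=3), 3))     # ln(10)

confident = [0.0] * 10
confident[3] = 5.0                  # strongly favors class 3 (cat)
print(round(cross_entropy(confident, label=3), 3))   # small when the guess is right
print(round(cross_entropy(confident, label=5), 3))   # large when the guess is wrong
```

If your first-epoch loss is far above 2.3 or refuses to drop, suspect the learning rate or the data pipeline before blaming the architecture.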

Using Your Model

from PIL import Image
 
def predict(image_path):
    # Force 3 channels: PNGs are often RGBA or grayscale, which would break Normalize
    img = Image.open(image_path).convert("RGB").resize((32, 32))
    img_tensor = transform(img).unsqueeze(0).to(device)
 
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        probabilities = F.softmax(output, dim=1)
        confidence, predicted = torch.max(probabilities, 1)
 
    print(f"Prediction: {classes[predicted.item()]}")
    print(f"Confidence: {confidence.item()*100:.1f}%")
 
predict("my_cat.jpg")
# Example output (your numbers will differ):
# Prediction: cat
# Confidence: 87.3%
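
The confidence number is just the largest softmax probability. Here is a sketch of that step in plain Python, with made-up logits and a hypothetical `top_k` helper that also lists the runners-up (useful when the model is torn between cat and dog):

```python
import math

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    biggest = max(logits)
    exps = [math.exp(v - biggest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most likely (class name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up raw scores for one image, in class order
logits = [0.1, -1.2, 0.8, 3.5, 0.3, 2.9, 0.0, 0.4, -0.7, -1.0]
for name, prob in top_k(logits):
    print(f"{name}: {prob * 100:.1f}%")
```

Keep in mind that the model always picks one of the 10 CIFAR-10 classes, so a photo of a pizza will still come back as a frog or a ship with some confidence attached.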

Save and Load

# Save
torch.save(model.state_dict(), 'classifier.pth')
 
# Load (later)
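model = ImageClassifier().to(device)
# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load('classifier.pth', map_location=device))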
model.eval()
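
To convince yourself that a saved `state_dict` restores exactly the same weights, here is a small round-trip sketch. It uses a tiny stand-in `nn.Linear` and an in-memory buffer instead of `ImageClassifier` and `classifier.pth`, purely so it runs in a second. Both substitutions are assumptions for illustration.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
original = nn.Linear(4, 2)            # stand-in for ImageClassifier

buffer = io.BytesIO()                 # stand-in for 'classifier.pth'
torch.save(original.state_dict(), buffer)
buffer.seek(0)                        # rewind so the load reads from the start

restored = nn.Linear(4, 2)            # fresh model with different random weights
restored.load_state_dict(torch.load(buffer, map_location='cpu'))
restored.eval()

x = torch.randn(1, 4)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print("outputs match:", same)
```

Note that the `state_dict` holds only the weights, not the class definition, so the loading script still needs the `ImageClassifier` code.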

What You Built

A real, working image classifier that:

  • Takes any image (resized to 32x32) as input
  • Runs it through 3 convolutional layers
  • Outputs one of 10 categories with a confidence score
  • Can be saved and loaded

This is the same fundamental approach used in medical imaging, self-driving cars, and face recognition. Just bigger models and bigger datasets.

Next up: we build a text generator — fine-tune a model to write in any style you want.
