App 3: The Classifier (Is It a Cat?)
Build an image classifier in PyTorch. Upload a photo, get a prediction. Your first real AI app.
Is It a Cat?
You now know how neural networks work (Part 2). You built your own engine (App 1) and your own GPT (App 2). Time to build something your friends can actually use.
We are building an image classifier. Give it a photo. It tells you what it is. Cat, dog, airplane, frog — whatever.

The Dataset: CIFAR-10
CIFAR-10 is a classic dataset: 60,000 tiny images (32x32 pixels) in 10 categories:
| Class | What |
|---|---|
| 0 | Airplane |
| 1 | Automobile |
| 2 | Bird |
| 3 | Cat |
| 4 | Deer |
| 5 | Dog |
| 6 | Frog |
| 7 | Horse |
| 8 | Ship |
| 9 | Truck |
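
To make the input concrete: each CIFAR-10 image is a 32x32 grid with three color channels, and the loading code that follows rescales every pixel from the 0-255 byte range to the range -1 to 1. The sketch below reproduces that arithmetic in plain Python, assuming the `ToTensor` and `Normalize` settings used in this tutorial (mean 0.5, std 0.5). `normalize_pixel` is a hypothetical helper for illustration, not part of torchvision.

```python
# Plain-Python sketch of the per-pixel arithmetic (illustrative helper, not a torchvision API).
WIDTH, HEIGHT, CHANNELS = 32, 32, 3
values_per_image = WIDTH * HEIGHT * CHANNELS  # numbers the model sees per image


def normalize_pixel(byte_value, mean=0.5, std=0.5):
    """Mimic ToTensor (divide by 255) followed by Normalize ((x - mean) / std)."""
    scaled = byte_value / 255.0
    return (scaled - mean) / std


print(values_per_image)                           # 3072
print(normalize_pixel(0), normalize_pixel(255))   # -1.0 1.0
```

Centering the inputs around zero like this is a common convention that tends to make training more stable.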

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download CIFAR-10
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

trainloader = DataLoader(trainset, batch_size=64, shuffle=True)
testloader = DataLoader(testset, batch_size=64, shuffle=False)

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

The Model: A CNN
Remember ML 9 — The Eyeball? CNNs slide a small window across the image, looking for patterns. Let's build one:

class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers (feature extraction)
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
        # Fully connected layers (classification)
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32x32 -> 16x16
        x = self.pool(F.relu(self.conv2(x)))  # 16x16 -> 8x8
        x = self.pool(F.relu(self.conv3(x)))  # 8x8 -> 4x4
        x = self.dropout(x)
        x = x.view(-1, 128 * 4 * 4)  # flatten
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)  # raw scores (logits), one per class
        return x

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ImageClassifier().to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

About 620K parameters (620,362, to be exact). Tiny by modern standards, but enough to tell cats from dogs.
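
The `128 * 4 * 4` in the first linear layer and the total parameter count can both be checked by hand. The sketch below does the bookkeeping in plain Python, assuming the layer sizes from the model above. `conv_params` and `linear_params` are hypothetical helpers written for this illustration, not PyTorch functions.

```python
def conv_params(in_ch, out_ch, kernel):
    # One kernel x kernel filter per (input, output) channel pair, plus one bias per output channel.
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output.
    return in_features * out_features + out_features


size = 32    # CIFAR-10 images are 32x32
total = 0
for in_ch, out_ch in [(3, 32), (32, 64), (64, 128)]:
    total += conv_params(in_ch, out_ch, 3)
    # A 3x3 convolution with padding=1 keeps the size; the 2x2 max pool halves it.
    size = size // 2

flat = 128 * size * size  # values left after the last pool, flattened
total += linear_params(flat, 256) + linear_params(256, 10)

print(size, flat, total)  # 4 2048 620362
```

Most of the parameters (524,544 of them) sit in `fc1`, which is typical for small CNNs that flatten straight into a wide linear layer.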
Training

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(15):
    model.train()
    running_loss = 0.0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Test accuracy
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = 100 * correct / total
    print(f"Epoch {epoch+1}/15, Loss: {running_loss/len(trainloader):.3f}, Accuracy: {acc:.1f}%")

After 15 epochs, you should see ~75-80% accuracy. Not state-of-the-art, but your model can genuinely tell cats from trucks.
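
To read the loss numbers in that training log, it helps to know what `nn.CrossEntropyLoss` computes for one image: the negative log of the softmax probability the model assigns to the correct class. The sketch below is a plain-Python version for a single example, with made-up logits. `cross_entropy` here is an illustrative helper, not the PyTorch implementation, which also averages over the batch by default.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the correct class."""
    peak = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


# An untrained model that scores all 10 classes equally is guessing at 10% each.
untrained = [0.0] * 10
print(round(cross_entropy(untrained, 3), 3))  # 2.303

# Raising the score of the correct class (index 3, "cat") lowers the loss.
confident = [0.0] * 10
confident[3] = 5.0
print(cross_entropy(confident, 3) < cross_entropy(untrained, 3))  # True
```

So a loss near 2.3 (the natural log of 10) means the model is no better than random guessing over 10 classes, and the number should fall from there as training proceeds.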
Using Your Model

from PIL import Image

def predict(image_path):
    # Force 3-channel RGB so grayscale or RGBA files match the model's expected input
    img = Image.open(image_path).convert('RGB').resize((32, 32))
    img_tensor = transform(img).unsqueeze(0).to(device)  # add a batch dimension
    model.eval()
    with torch.no_grad():
        output = model(img_tensor)
        probabilities = F.softmax(output, dim=1)
        confidence, predicted = torch.max(probabilities, 1)
    print(f"Prediction: {classes[predicted.item()]}")
    print(f"Confidence: {confidence.item()*100:.1f}%")

predict("my_cat.jpg")
# Example output (your numbers will differ):
# Prediction: cat
# Confidence: 87.3%

Save and Load
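
# Save
torch.save(model.state_dict(), 'classifier.pth')

# Load (later)
model = ImageClassifier().to(device)
# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load('classifier.pth', map_location=device))
model.eval()

What You Built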
A real, working image classifier that:
- Takes any image as input (resized to 32x32)
- Runs it through 3 convolutional layers
- Outputs one of 10 categories with a confidence score
- Can be saved and loaded
This is the same fundamental approach used in medical imaging, self-driving cars, and face recognition. Just bigger models and bigger datasets.
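
For a closer look at the confidence score: the model outputs 10 raw scores (logits), and softmax turns them into probabilities that sum to 1. The sketch below does this in plain Python and lists the top three guesses instead of only the winner. `top_k` and the example logits are made up for illustration and are not output from the trained model.

```python
import math

CLASSES = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most likely (class name, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for one photo
example_logits = [0.1, -1.2, 0.8, 3.0, 0.5, 2.1, 0.0, -0.3, -0.9, -1.5]
for name, prob in top_k(example_logits):
    print(f"{name}: {prob * 100:.1f}%")
```

Two classes with similar probabilities are a hint that the model is unsure. Because the probabilities always sum to 1 over the 10 classes, a photo of something outside them (a pizza, say) still gets labeled as one of the ten.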
Next up: we build a text generator — fine-tune a model to write in any style you want.