Chief Idiot | 2 min read

ML 12: The Dog Trainer (Reinforcement Learning)

Training AI with treats and newspapers.

Pavlov's AI

Supervised Learning: "Here is input, here is answer." Unsupervised Learning: "Here is input, good luck."

Reinforcement Learning (RL): "Here is an environment. Do whatever. If you do good, you get a cookie (+1). If you die, you get a slap (-1)."

The Agent and The Environment

  • Agent: The Gamer (AI).
  • Environment: The Game (Super Mario).
  • Action: Jump, Run, Duck.
  • State: Where Mario is, where the Goomba is.
  • Reward: Coins (+), Winning (+), Dying (-).
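
Those five pieces fit together in one loop: the agent looks at the state, picks an action, and the environment answers with a new state and a reward. Here is a minimal sketch of that loop. The `TinyLevel` class, its tile layout, and the two actions are made up for illustration (real Mario is a lot bigger), and as a simplification, touching the Goomba's tile at all is fatal.

```python
import random


class TinyLevel:
    """Hypothetical 1-D level: Mario starts at tile 0, a Goomba sits on tile 3, the flag is on tile 5."""

    GOOMBA, FLAG = 3, 5

    def reset(self):
        self.position = 0
        return self.position  # State: where Mario is

    def step(self, action):
        # Action: "run" moves 1 tile, "jump" moves 2 tiles
        self.position += 1 if action == "run" else 2
        if self.position == self.GOOMBA:
            return self.position, -1, True   # Reward: dying (-)
        if self.position >= self.FLAG:
            return self.position, +1, True   # Reward: winning (+)
        return self.position, 0, False       # nothing happened yet


def run_episode(env, policy):
    """Play one game: the agent (policy) acts until the environment says it is over."""
    state = env.reset()
    total, done = 0, False
    while not done:
        action = policy(state)                  # Agent picks an action
        state, reward, done = env.step(action)  # Environment responds
        total += reward
    return total


def random_policy(state):
    """An agent that has learned nothing yet: it mashes buttons."""
    return random.choice(["run", "jump"])
```

Calling `run_episode(TinyLevel(), random_policy)` plays one game with a button-mashing agent. Everything RL does from here is about replacing `random_policy` with something smarter.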


Exploration vs Exploitation

The Agent has a dilemma:

  • Exploit: Do what I know gives points (Jump on Goomba).
  • Explore: Try something new (Jump down that weird pipe). Maybe it's death. Maybe it's a secret level with 1000 coins.

If you never explore, you never find the optimal path. If you never exploit, you die randomly.
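
One common way to balance the two is called epsilon-greedy: most of the time take the best-known action, but with a small probability (epsilon) try something random. The sketch below assumes the agent keeps its reward estimates in a plain dict; the function name and the example action names are illustrative, not from any particular library.

```python
import random


def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick an action from `estimates` (a dict of action -> estimated reward)."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))      # Explore: try anything
    return max(estimates, key=estimates.get)    # Exploit: best known action


# Hypothetical estimates: the pipe has never paid off so far
estimates = {"jump_on_goomba": 1.0, "enter_pipe": 0.0}
action = epsilon_greedy(estimates, epsilon=0.1)
```

With `epsilon=0.1` the agent picks at random about one time in ten, so it still enters the weird pipe now and then. Setting `epsilon=0` gives pure exploitation, and `epsilon=1` gives pure button mashing.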

[Interactive demo: Q-Learning Robot. Green = good path | Red = danger zone. Play manually or train the AI!]

Q-Learning and Deep Q-Networks (DQN)

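The AI builds a "Cheat Sheet" (Q-Table) of (State, Action) -> Expected Reward, where the reward counts everything it expects to collect from that point on, not just the next step.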

  • "If I see a pit and I Jump -> +10 survival."
  • "If I see a pit and I Run -> -100 death."
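
To see how the cheat sheet gets filled in, here is a rough sketch of tabular Q-learning on a made-up track. After every move, the entry for (state, action) is nudged toward `reward + gamma * (best entry in the next state)`. The track layout, the +10 and -100 rewards, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions picked for illustration.

```python
import random

# Hypothetical track: start on tile 0, pits on tiles 2 and 4, trophy on tile 6
PITS, GOAL = {2, 4}, 6
ACTIONS = ["run", "jump"]  # run = 1 tile forward, jump = 2 tiles forward


def step(state, action):
    """Return (next_state, reward, done) for one move."""
    nxt = state + (1 if action == "run" else 2)
    if nxt in PITS:
        return nxt, -100, True   # fell in a pit
    if nxt >= GOAL:
        return nxt, 10, True     # reached the trophy
    return nxt, 0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The cheat sheet: every (state, action) pair starts at 0
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Best value reachable from the next state (0 if the game ended)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the old estimate toward the new target
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


q = train()
for s in (0, 1, 3, 5):
    print(s, {a: round(q[(s, a)], 2) for a in ACTIONS})
```

After training, the table should rate "jump" above "run" on tiles 1 and 3 (the ones right before a pit), which is the "see a pit, jump" rule from the bullets, learned from cookies and slaps alone.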

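A cheat sheet only works while the list of states is short. A Deep Q-Network (DQN) replaces the table with a neural network that estimates the same values straight from the screen. DeepMind used DQN to play Atari games. AlphaGo, which beat the world champion at Go, was also trained with reinforcement learning, but it paired neural networks with tree search instead of using plain Q-learning.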

Summary

In RL, we don't teach the AI how to win. We just give it the goal. It figures out crazy strategies we never thought of (like glitching the game or playing weird moves).

Next up: The Artist using AI against itself.
