How LLMs Work: The Full Pipeline
Type a sentence and watch it flow through an LLM — tokenization, embeddings, attention, and prediction — step by step.
How Large Language Models Work
Every time you type a message to ChatGPT, Claude, or any AI chatbot, your text goes through a pipeline of four steps. This playground lets you see each step happen in real time.
Step 1: Tokenization — Breaking Text into Pieces
LLMs don't read words the way you do. They break text into tokens — chunks that might be whole words, parts of words, or even single characters.
"understanding" might become ["under", "standing"]. "ChatGPT" might become ["Chat", "G", "PT"]. Every model has its own tokenizer with its own vocabulary, which is why the same sentence can produce different numbers of tokens on different models.
Why it matters: The number of tokens determines how much the model can process at once (its "context window") and how much it costs to run.
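To make the splitting concrete, here is a toy greedy longest-match tokenizer. The vocabulary below is hand-picked purely for illustration; real tokenizers (BPE, WordPiece) learn vocabularies of tens of thousands of chunks from data.

```python
# Toy greedy longest-match tokenizer over a tiny, invented vocabulary.
VOCAB = {"under", "standing", "Chat", "G", "PT", "stand", "ing", "un"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary chunks, left to right."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                              # no match: fall back to one character
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'standing']
print(tokenize("ChatGPT"))        # ['Chat', 'G', 'PT']
```

Swap in a different vocabulary and the same word splits differently, which is exactly why token counts vary between models.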
Step 2: Embeddings — Words Become Numbers
Each token gets converted into a vector — a long list of numbers (768 numbers for the model in this demo). These numbers capture the token's meaning in a way the model can do math with.
The scatter plot shows these vectors compressed down to 2D. Notice how similar words end up near each other, while function words like "the" and "a" cluster separately from content words like "cat" and "dog."
Why it matters: This is how AI "understands" language. It doesn't know what words mean the way you do. It knows that "cat" and "dog" are closer to each other than "cat" and "the."
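A minimal sketch of "closer to each other": measure the angle between vectors with cosine similarity. The 4-dimensional vectors below are invented for illustration (the demo's model uses 768 dimensions); they are built so the animal words point in similar directions.

```python
import math

# Hand-made "embeddings"; the numbers are invented for this sketch.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "the": [0.0, 0.1, 0.9, 0.8],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means same direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))  # high: similar meaning
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["the"]))  # low: unrelated
```

The scatter plot in the playground is the same idea run in reverse: squashing those hundreds of dimensions down to two so you can see the clusters.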
Step 3: Attention — Words Look at Each Other
This is where the magic happens. Each token attends to every other token in the sentence, computing a relevance score for every pair of tokens.
When the model processes "The cat sat on the mat because it was tired," the attention mechanism helps the model figure out that "it" refers to "the cat" — not "the mat." It does this by learning patterns from billions of sentences.
The playground shows three different attention "heads." Each head learns to pay attention to different things:
- Position heads care about nearby words
- Syntax heads track grammatical structure
- Semantics heads connect related meanings across the sentence
Real models have dozens of these heads running in parallel, each learning different patterns.
Why it matters: Attention is what makes transformers better than older approaches. Instead of processing words one at a time (like RNNs), transformers process all words simultaneously and let each word decide which other words are relevant.
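The core computation behind one attention head can be sketched in a few lines. This is scaled dot-product attention on plain Python lists; the query/key/value vectors below are made up (a real model derives them from the embeddings with learned weight matrices).

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each token's query is scored against every token's key; the softmaxed
    scores say how much that token attends to every other token, and its
    output is the weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs, all_weights

# Three tokens with invented 2-D vectors; tokens 0 and 1 are similar.
Q = K = V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
_, weights = attention(Q, K, V)
print(weights[0])  # token 0 attends mostly to tokens 0 and 1
```

The heatmaps in the playground are exactly these `weights` rows, one grid per head.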
Step 4: Prediction — What Comes Next?
After processing the full sentence through attention, the model produces a probability distribution over its entire vocabulary. Each token in the vocabulary gets a score representing how likely it is to come next.
The "Creativity" slider controls the temperature — a parameter that reshapes this distribution:
- Low temperature (0.1): The model is very confident. One word dominates. Outputs are predictable and repetitive.
- High temperature (2.0): The distribution flattens out. Unlikely words get a chance. Outputs are creative but sometimes nonsensical.
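The reshaping itself is one line of math: divide the raw scores (logits) by the temperature before the softmax. The three logits below are invented stand-ins for three candidate next tokens.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Temperature < 1 sharpens the distribution; temperature > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # made-up raw scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # distribution flattens
print(cold)
print(hot)
```

Dragging the Creativity slider is doing exactly this division before sampling.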
Why it matters: This is literally how AI generates text. It predicts one token at a time, appends it to the sentence, and repeats. The entire conversation you have with an AI chatbot is just this loop running thousands of times.
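That loop can be sketched in a dozen lines. The lookup table below stands in for the model's prediction step; real models compute a full probability distribution at every step, and these continuations are invented for illustration.

```python
import random

# Toy next-token table standing in for the model (invented continuations).
NEXT = {
    "The": ["cat"],
    "cat": ["sat"],
    "sat": ["on"],
    "on": ["the"],
    "the": ["mat"],
    "mat": ["<end>"],
}

def generate(prompt_tokens: list[str], max_steps: int = 10) -> list[str]:
    """The core LLM loop: predict one token, append it, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        candidates = NEXT.get(tokens[-1], ["<end>"])
        nxt = random.choice(candidates)  # sampling (temperature) happens here
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(" ".join(generate(["The"])))  # The cat sat on the mat
```

Everything a chatbot says comes out of a loop shaped like this, just with a learned model in place of the table.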
The Big Picture
Your text   →  Tokens       →  Vectors      →  Attention        →  Prediction
"The cat"      [The][cat]      [0.2, ...]      cat↔the: 0.9        "sat" (42%)
That's it. Four steps. Every LLM — GPT-4, Claude, Llama, Gemini — uses this same pipeline. They differ in size (more layers, more heads, bigger vocabulary) and training data, but the architecture is the same.
What to Try
- Compare tokenizers: Toggle the comparison to see how different models split the same text
- Switch attention heads: See how Position, Syntax, and Semantics heads focus on different word relationships
- Play with temperature: Drag the slider to see how "creativity" changes the prediction
- Try different sentences: Each example shows different attention patterns