How LLMs Work: The Full Pipeline
Type a sentence and watch it flow through an LLM — tokenization, embeddings, attention, and prediction — step by step.
How Large Language Models Work
Every time you type a message to ChatGPT, Claude, or any AI chatbot, your text goes through a pipeline of four steps. This playground lets you see each step happen in real time.
Step 1: Tokenization — Breaking Text into Pieces
LLMs don't read words the way you do. They break text into tokens — chunks that might be whole words, parts of words, or even single characters.
"understanding" might become ["under", "standing"]. "ChatGPT" might become ["Chat", "G", "PT"]. Every model has its own tokenizer with its own vocabulary, which is why the same sentence can produce different numbers of tokens on different models.
Why it matters: The number of tokens determines how much the model can process at once (its "context window") and how much it costs to run.
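To make the splitting concrete, here is a toy greedy longest-match tokenizer. The vocabulary below is hand-picked purely for illustration; real tokenizers (BPE, WordPiece) learn vocabularies of tens of thousands of chunks from data.

```python
# Toy greedy longest-match tokenizer over a tiny, invented vocabulary.
VOCAB = {"under", "standing", "Chat", "G", "PT", "stand", "ing", "un"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary chunks, left to right."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                              # no match: fall back to one character
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'standing']
print(tokenize("ChatGPT"))        # ['Chat', 'G', 'PT']
```

Swap in a different vocabulary and the same word splits differently, which is exactly why token counts vary between models.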
Step 2: Embeddings — Words Become Numbers
Each token gets converted into a vector — a long list of numbers (768 numbers for the model in this demo). These numbers capture the token's meaning in a way the model can do math with.
The scatter plot shows these vectors compressed down to 2D. Notice how similar words end up near each other, while function words like "the" and "a" cluster separately from content words like "cat" and "dog."
Why it matters: This is how AI "understands" language. It doesn't know what words mean the way you do. It knows that "cat" and "dog" are closer to each other than "cat" and "the."
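A minimal sketch of "closer to each other": measure the angle between vectors with cosine similarity. The 4-dimensional vectors below are invented for illustration (the demo's model uses 768 dimensions); they are built so the animal words point in similar directions.

```python
import math

# Hand-made "embeddings"; the numbers are invented for this sketch.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "the": [0.0, 0.1, 0.9, 0.8],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means same direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))  # high: similar meaning
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["the"]))  # low: unrelated
```

The scatter plot in the playground is the same idea run in reverse: squashing those hundreds of dimensions down to two so you can see the clusters.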
Step 3: Attention — Words Look at Each Other
This is where the magic happens. Each token attends to every other token in the sentence, computing a relevance score for every pair of tokens.
When the model processes "The cat sat on the mat because it was tired," the attention mechanism helps the model figure out that "it" refers to "the cat" — not "the mat." It does this by learning patterns from billions of sentences.
The playground shows three different attention "heads." Each head learns to pay attention to different things:
- Position heads care about nearby words
- Syntax heads track grammatical structure
- Semantics heads connect related meanings across the sentence
Real models have dozens of these heads running in parallel, each learning different patterns.
Why it matters: Attention is what makes transformers better than older approaches. Instead of processing words one at a time (like RNNs), transformers process all words simultaneously and let each word decide which other words are relevant.
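The core computation behind one attention head can be sketched in a few lines. This is scaled dot-product attention on plain Python lists; the query/key/value vectors below are made up (a real model derives them from the embeddings with learned weight matrices).

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each token's query is scored against every token's key; the softmaxed
    scores say how much that token attends to every other token, and its
    output is the weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs, all_weights

# Three tokens with invented 2-D vectors; tokens 0 and 1 are similar.
Q = K = V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
_, weights = attention(Q, K, V)
print(weights[0])  # token 0 attends mostly to tokens 0 and 1
```

The heatmaps in the playground are exactly these `weights` rows, one grid per head.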
Step 4: Prediction — What Comes Next?
After processing the full sentence through attention, the model produces a probability distribution over its entire vocabulary. Each token in the vocabulary gets a score representing how likely it is to come next.
The "Creativity" slider controls the temperature — a parameter that reshapes this distribution:
- Low temperature (0.1): The model is very confident. One word dominates. Outputs are predictable and repetitive.
- High temperature (2.0): The distribution flattens out. Unlikely words get a chance. Outputs are creative but sometimes nonsensical.
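The reshaping itself is one line of math: divide the raw scores (logits) by the temperature before the softmax. The three logits below are invented stand-ins for three candidate next tokens.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Temperature < 1 sharpens the distribution; temperature > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # made-up raw scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # distribution flattens
print(cold)
print(hot)
```

Dragging the Creativity slider is doing exactly this division before sampling.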
Why it matters: This is literally how AI generates text. It predicts one token at a time, appends it to the sentence, and repeats. The entire conversation you have with an AI chatbot is just this loop running thousands of times.
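That loop can be sketched in a dozen lines. The lookup table below stands in for the model's prediction step; real models compute a full probability distribution at every step, and these continuations are invented for illustration.

```python
import random

# Toy next-token table standing in for the model (invented continuations).
NEXT = {
    "The": ["cat"],
    "cat": ["sat"],
    "sat": ["on"],
    "on": ["the"],
    "the": ["mat"],
    "mat": ["<end>"],
}

def generate(prompt_tokens: list[str], max_steps: int = 10) -> list[str]:
    """The core LLM loop: predict one token, append it, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        candidates = NEXT.get(tokens[-1], ["<end>"])
        nxt = random.choice(candidates)  # sampling (temperature) happens here
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(" ".join(generate(["The"])))  # The cat sat on the mat
```

Everything a chatbot says comes out of a loop shaped like this, just with a learned model in place of the table.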
The Big Picture
Your text   →  Tokens       →  Vectors      →  Attention        →  Prediction
"The cat"      [The][cat]      [0.2, ...]      cat↔the: 0.9        "sat" (42%)
That's it. Four steps. Every LLM — GPT-4, Claude, Llama, Gemini — uses this same pipeline. They differ in size (more layers, more heads, bigger vocabulary) and training data, but the architecture is the same.
What to Try
- Compare tokenizers: Toggle the comparison to see how different models split the same text
- Switch attention heads: See how Position, Syntax, and Semantics heads focus on different word relationships
- Play with temperature: Drag the slider to see how "creativity" changes the prediction
- Try different sentences: Each example shows different attention patterns