App 4: The Writer (Text Generator)
Fine-tune a language model to write in any style. Poems, code, tweets — you pick the voice.
Teaching AI Your Voice
In App 2 we built a GPT from scratch and it learned to sound like Shakespeare. But what if you want it to sound like you? Or like a pirate? Or like a recipe book?
That is fine-tuning — taking a pre-trained model and teaching it your style. Today, we use HuggingFace Transformers and PyTorch to do exactly that.

The Idea
Training a language model from scratch takes millions of dollars. Fine-tuning one takes a laptop and 20 minutes.
- Start with a pre-trained model (it already knows English)
- Feed it your text (poems, tweets, recipes, whatever)
- It adapts its style to match yours
Setup
```python
# Install what we need:
# pip install transformers datasets torch

from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
```

We are using GPT-2 Small (124M parameters). It is free, open-source, and fits on any laptop.
Step 1: Prepare Your Data
Create a text file with the style you want. The more text, the better. Here is an example — pirate speak:
```python
# Create a training file
training_text = """
Ahoy! The sea be rough today, and me crew be lazier than a barnacle on a rock.
We sailed three leagues before the wind turned foul. The captain cursed the sky.
Every morning I wake to the sound of waves and the smell of salt and bad decisions.
The treasure map be nothing but lies, but we follow it anyway. What else is there?
A pirate without a ship is just a man with bad hygiene and questionable life choices.
The parrot said nothing useful today. As usual. I am starting to doubt its intelligence.
"""

# Save to file (in practice, use a larger dataset)
with open("pirate_text.txt", "w") as f:
    for _ in range(100):  # repeat to give the model more to learn from
        f.write(training_text)
```

In real use, you would collect much more text — blog posts, books, chat logs, etc.
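When you do have more material, a small helper that concatenates and lightly cleans several files into one training corpus is all you need. A minimal sketch (the file names in the example are hypothetical):

```python
from pathlib import Path

def build_corpus(sources, out_path="corpus.txt"):
    """Concatenate several text files into one training file,
    stripping stray whitespace and skipping blank lines."""
    lines = []
    for src in sources:
        for line in Path(src).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # drop empty lines
                lines.append(line)
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")
    return len(lines)

# Example (hypothetical file names):
# build_corpus(["blog_posts.txt", "journal.txt"], "my_voice.txt")
```

Keeping the cleaning step separate from training makes it easy to swap in a new style later: rebuild the corpus, rerun the fine-tune.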
Step 2: Load the Model
```python
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Prepare the dataset
# (TextDataset is deprecated in recent Transformers releases in favor of
# the datasets library, but it still works and keeps this example short)
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="pirate_text.txt",
    block_size=128,  # tokens per training example
)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # GPT-2 is causal, not masked
)
```

Step 3: Fine-Tune
```python
training_args = TrainingArguments(
    output_dir="./pirate-gpt",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=500,
    save_total_limit=2,
    logging_steps=100,
    learning_rate=5e-5,
    warmup_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
trainer.train()
```

On a laptop CPU this takes 10-20 minutes. On a GPU, a couple of minutes.
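If you are curious what `trainer.train()` is doing under the hood, it is the same next-token loop from App 2: feed in token IDs, pass the same IDs as labels (the model shifts them by one position internally, which is why `mlm=False` above), and step an optimizer on the loss. A minimal sketch of the idea, not what `Trainer` literally executes (it adds batching, scheduling, and checkpointing); the tiny config sizes here are made up so it runs in seconds:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)

# A toy, randomly initialized GPT-2; sizes chosen only for speed, not quality
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# 4 identical toy sequences (0, 1, ..., 15) so the pattern is easy to memorize
batch = torch.arange(16).repeat(4, 1)

losses = []
for step in range(30):
    # With labels=input_ids the model shifts targets by one position
    # internally and returns the next-token cross-entropy loss.
    out = model(input_ids=batch, labels=batch)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(out.loss.item())

print(f"loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

The loss starts near log(vocab_size) and falls as the toy model memorizes the batch, which is exactly what fine-tuning does to your style corpus, just at larger scale.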
Step 4: Generate Text
```python
def generate(prompt, max_length=150, temperature=0.8):
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_length=max_length,
        num_return_sequences=1,
        temperature=temperature,
        top_p=0.9,  # nucleus sampling: keep the smallest token set covering 90% of the probability
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate("The captain looked at the horizon and"))
```

The output will have a pirate flavor — sea metaphors, rough language, and questionable decisions.
Temperature: The Creativity Knob
| Temperature | Behavior |
|---|---|
| 0.1 | Very safe, repetitive, boring |
| 0.5 | Balanced, coherent |
| 0.8 | Creative, interesting |
| 1.2 | Wild, sometimes nonsense |
| 2.0 | Completely unhinged |
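Temperature is not magic: before sampling, the model's raw scores (logits) are divided by the temperature and then pushed through a softmax. Dividing by a small number exaggerates the gaps between scores, concentrating probability on the top token; dividing by a large number shrinks the gaps, flattening the distribution. A pure-Python sketch (the example logits are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax: T < 1 sharpens, T > 1 flattens
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens

for t in (0.1, 0.8, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# At T=0.1 nearly all the mass sits on the top token;
# at T=2.0 the distribution is much closer to uniform.
```

This is why T=0.1 in the table reads as repetitive (the sampler almost always picks the top token) and T=2.0 as unhinged (low-scoring tokens get real probability).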
```python
# Conservative
print(generate("The sea", max_length=50, temperature=0.3))

# Creative
print(generate("The sea", max_length=50, temperature=1.0))
```

Save Your Model
```python
model.save_pretrained("./pirate-gpt")
tokenizer.save_pretrained("./pirate-gpt")

# Load later
model = GPT2LMHeadModel.from_pretrained("./pirate-gpt")
tokenizer = GPT2Tokenizer.from_pretrained("./pirate-gpt")
```

Ideas for Your Own Writer
| Style | Training Data |
|---|---|
| Your own writing voice | Your blog posts, emails, journal |
| Recipe generator | Cooking websites, recipe books |
| Poet | Poetry collections (public domain) |
| Commit message writer | GitHub commit histories |
| DnD narrator | Game transcripts and fantasy novels |
What You Built
A text generator that:
- Starts from GPT-2 (which already knows English)
- Learns your specific writing style from examples
- Generates new text that sounds like your training data
- Has a temperature knob for creativity control
This is the same process used to create specialized AI assistants, creative writing tools, and domain-specific chatbots. The only difference is scale.
Next up: we put everything on the internet.