LLMs are next-token predictors.
Given the sequence 'The capital of France is', the model outputs a probability distribution over all tokens in its vocabulary. 'Paris' gets high probability. That token is sampled and appended. Repeat.
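The predict-sample-append loop can be sketched with a toy vocabulary. The vocabulary and logit values below are invented for illustration; a real model has tens of thousands of tokens and produces logits from a neural network, but the softmax-then-sample step works the same way.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and hand-picked logits for the prompt
# 'The capital of France is' (illustrative numbers, not a real model).
vocab = ["Paris", "London", "Berlin", "banana"]
logits = [6.0, 2.0, 2.0, -3.0]

probs = softmax(logits)
random.seed(0)
# 'Paris' dominates the distribution, so it is almost always sampled.
next_token = random.choices(vocab, weights=probs)[0]
print(next_token)
```

In a real model, the chosen token's id is appended to the input and the whole forward pass runs again to produce the next distribution.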
**Temperature** controls randomness:
- Low (e.g. 0.1) → near-deterministic; the most likely token almost always wins, which can turn repetitive
- High (e.g. 1.5) → more varied and creative, but increasingly incoherent
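Mechanically, temperature divides the logits before the softmax. A minimal sketch, using made-up logit values:

```python
import math

def softmax(logits):
    # Standard stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apply_temperature(logits, temperature):
    # Dividing by T < 1 sharpens the distribution (top token dominates);
    # T > 1 flattens it toward uniform, admitting unlikely tokens.
    return softmax([x / temperature for x in logits])

logits = [6.0, 2.0, 2.0, -3.0]  # illustrative, not from a real model
for t in (0.1, 1.0, 1.5):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 0.1 the top token's probability is essentially 1, which is why low temperature reads as near-deterministic; at T = 1.5 the tail tokens get real probability mass.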
**Context window** is how many tokens the model 'sees' at once. GPT-4 Turbo's window is about 128k tokens, on the order of 100,000 words of English text.
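One practical consequence: when a conversation grows past the window, the oldest tokens fall out of view. A simple sliding-window truncation sketch (one of several strategies; the window size here is tiny for illustration):

```python
def fit_to_context(token_ids, max_tokens=8):
    # Keep only the most recent tokens when the history exceeds the
    # window; anything earlier is invisible to the model.
    if len(token_ids) <= max_tokens:
        return token_ids
    return token_ids[-max_tokens:]

history = list(range(12))  # pretend token ids
print(fit_to_context(history))  # only the 8 most recent ids survive
```

Real chat applications often use smarter strategies, such as keeping the system prompt pinned and summarizing dropped turns, but the hard limit on visible tokens is the same.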
This simple mechanism, scaled to hundreds of billions of parameters, produces remarkably coherent and useful text.