Chain-of-Thought: Teaching AI to Show Its Work

Adding "Let's think step by step" to a math problem improved GPT-3's accuracy from 18% to 79%. One phrase. Same model. This is the power of chain-of-thought prompting.

Language models generate answers token by token, left to right. For simple questions, this works fine. For complex reasoning (math, logic, multi-step problems), the correct answer often depends on intermediate steps that a single next-token prediction can't capture.

Chain-of-thought (CoT) prompting forces the model to generate those intermediate steps explicitly. By 'thinking out loud,' the model can leverage each reasoning step as context for the next, building up correct solutions that would fail with direct prediction.
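
In its simplest zero-shot form, CoT is nothing more than an extra phrase appended to the prompt. A minimal sketch (the question is an illustrative example; no model is called here):

```python
question = (
    "A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many balls are blue?"
)

# Direct prompting: the model must jump straight to an answer.
direct_prompt = f"Q: {question}\nA: The answer is"

# Zero-shot CoT: the trigger phrase invites intermediate reasoning
# before the final answer.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

print(cot_prompt)
```

The only difference between the two prompts is the trailing trigger phrase; everything else about the request is identical.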

**The landmark finding (Kojima et al., 2022):**

On the MultiArith math word problem benchmark, adding "Let's think step by step" to prompts improved GPT-3's accuracy from 18% to 79%. The model didn't change; only the prompting strategy did.

**Why it works:**

At each token, the model only 'sees' the tokens before it. If it jumps straight to an answer, it has to compress all reasoning into the final output. If it reasons step by step, each step is in the context window and informs the next — far more reliable for multi-step problems.
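
That accumulation can be sketched as a loop in which each generated step is appended to the context before the next one is produced. Here `generate_step` is a hypothetical placeholder for a single model call; the demo substitutes canned strings for model output:

```python
def reason_step_by_step(question, generate_step, max_steps=8):
    """Build a solution one step at a time; each new step
    conditions on everything generated so far."""
    context = f"Q: {question}\nA: Let's think step by step.\n"
    for _ in range(max_steps):
        step = generate_step(context)   # the model sees all prior steps
        context += step + "\n"
        if step.startswith("Answer:"):  # stop once a final answer appears
            break
    return context

# Demo with canned steps standing in for model output:
canned = iter([
    "16 / 2 = 8 golf balls.",
    "8 / 2 = 4 blue golf balls.",
    "Answer: 4",
])
print(reason_step_by_step("How many blue golf balls?", lambda ctx: next(canned)))
```

By the time the final "Answer:" step is generated, both arithmetic steps are sitting in the context window, which is exactly what direct prompting denies the model.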

**Variants:**

- **Zero-shot CoT**: append "Let's think step by step" (no examples needed)

- **Few-shot CoT**: Provide 3-5 worked examples of step-by-step reasoning

- **Self-consistency**: Generate multiple reasoning chains, take majority vote

- **Tree of Thoughts (ToT)**: Explore multiple reasoning branches, evaluate, backtrack

- **Extended thinking** (Claude): Model allocates tokens to 'think' before responding
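
Of these, self-consistency is the easiest to sketch in code: sample several chains at nonzero temperature, extract each chain's final answer (assumed to have happened already below), and keep the majority:

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over the final answers of sampled reasoning chains."""
    return Counter(final_answers).most_common(1)[0][0]

# Five sampled chains, three of which landed on the same answer:
print(self_consistency(["4", "4", "5", "4", "3"]))  # prints "4"
```

The vote is over final answers only, not over the reasoning text, so chains that reach the right answer by different routes still reinforce each other.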

**Limits**: CoT helps with reasoning tasks, not factual recall. It can also produce elaborate but wrong reasoning chains: confident, structured hallucinations.

**Key takeaway:** Chain-of-thought prompting makes AI show its work. One phrase can boost reasoning accuracy by 4x — same model, different prompting.
