WeeBytes
Context Windows: AI's Working Memory
Beginner · AI Basics · LLM Fundamentals · Knowledge


Why does ChatGPT forget what you said 50 messages ago? Because AI models have a limited 'working memory' called the context window — and what happens at the edges is fascinating.

Every LLM has a context window — the maximum amount of text it can 'see' at once when generating a response. It's like short-term memory: everything in the window is available, everything outside it is gone.

**Sizes over time:**

- GPT-3 (2020): 2,048 tokens (~1,500 words)

- GPT-4 Turbo (2023): 128,000 tokens (~100,000 words)

- Claude 3.5 Sonnet: 200,000 tokens (~150,000 words)

- Gemini 1.5 Pro: 1,000,000 tokens (~750,000 words)

1 token ≈ 0.75 words. A typical novel is about 100,000 words — so Gemini 1.5 Pro can fit 7+ novels in its context window simultaneously.
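The arithmetic above is easy to sanity-check in code. This is a minimal sketch using the article's own rule of thumb (1 token ≈ 0.75 words); `estimate_tokens` is an illustrative helper, not a real tokenizer, so treat its numbers as ballpark figures only:

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb from above: 1 token ≈ 0.75 words.
    # Real tokenizers (BPE etc.) will differ, especially for code and non-English text.
    return round(len(text.split()) / 0.75)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # 9 words → 12 tokens

# How many ~100,000-word novels fit in a 1,000,000-token window?
novels_in_window = (1_000_000 * 0.75) / 100_000  # ≈ 7.5
```

For accurate counts in practice you would use the model provider's own tokenizer rather than a word-count heuristic.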

**The Lost in the Middle problem**: Research shows that LLMs are best at recalling information from the very beginning and very end of their context. Information in the middle gets 'lost.' For very long contexts, retrieval accuracy drops significantly for middle-of-document content.

**Computational cost**: Attention is O(n²) in the context length — doubling the context quadruples the computation. This is why larger context windows are expensive, and why there's active research into more efficient attention mechanisms (Flash Attention, Sliding Window Attention, etc.).
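To make the O(n²) term concrete, here is a minimal NumPy sketch of plain scaled dot-product attention (names and sizes are illustrative; production kernels like Flash Attention avoid materialising the full scores matrix):

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention — a sketch, not an optimised kernel."""
    d = q.shape[-1]
    # The scores matrix is (n, n): every token attends to every other token.
    # This is the O(n^2) term — double n and this matrix quadruples in size.
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 8, 4                      # 8 tokens, 4-dimensional embeddings
x = np.random.randn(n, d)
out = attention(x, x, x)         # the intermediate scores matrix was 8 x 8
```

At n = 8 the scores matrix has 64 entries; at n = 1,000,000 (Gemini-scale context) a naive version would need 10¹² entries, which is why efficient attention variants matter.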

**Practical tip**: For long documents, put the most important information either at the start or end of your prompt. Don't bury key instructions in the middle.
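One common way to apply this tip when managing chat history: keep the first message (usually the system prompt) and the most recent messages, dropping from the middle where recall is weakest. This is a hypothetical sketch using a crude word count as a token proxy; `trim_history` and the message format are illustrative, not any particular API:

```python
def trim_history(messages, max_tokens):
    """Keep the first message and as many recent messages as fit the budget."""
    def cost(m):
        return len(m["content"].split())  # crude word-count proxy for tokens
    if not messages:
        return []
    head = [messages[0]]                  # preserve the system prompt (start)
    budget = max_tokens - cost(messages[0])
    tail = []
    for m in reversed(messages[1:]):      # walk backwards from the newest message
        if cost(m) > budget:
            break                         # anything older gets dropped (the middle)
        tail.append(m)
        budget -= cost(m)
    return head + list(reversed(tail))
```

The same shape of logic applies to long documents: place the question or key instruction before and/or after the document text, not buried inside it.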

**Key takeaway:** Context window = AI's working memory. Bigger = better, but expensive. Key info should go at the start or end.

context-window · llm-memory · tokens · attention · supervised-learning-in-60-seconds

Want more like this?

WeeBytes delivers 25 cards like this every day — personalised to your interests.

Start learning for free