**The Chinchilla insight (2022):** Hoffmann et al. showed that most large models were under-trained relative to their size: for compute-optimal training, the ratio is roughly ~20 tokens of training data per model parameter.
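The ratio above can be turned into a back-of-the-envelope sizing rule. A minimal sketch, assuming the common approximation that training costs about 6 FLOPs per parameter per token (`flops_per_param_token` is an assumption, not from the text):

```python
import math

def chinchilla_optimal(flops_budget, tokens_per_param=20, flops_per_param_token=6):
    """Compute-optimal parameter and token counts for a FLOPs budget.

    Solves C = 6 * N * D with the Chinchilla heuristic D = 20 * N,
    giving N = sqrt(C / (6 * 20)).
    """
    n_params = math.sqrt(flops_budget / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a budget of ~5.76e23 FLOPs yields roughly 69B parameters
# and ~1.4T tokens, in the ballpark of the Chinchilla model itself.
n, d = chinchilla_optimal(5.76e23)
```

The same function also shows why doubling the budget should go partly into a bigger model and partly into more data rather than all into one.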
**What scales with power laws:**
- Loss decreases predictably, following a power law, as parameters, data, or compute increase
- For a fixed compute budget, it is best to grow parameters AND data together (not just one)
- Performance on downstream tasks follows similar trends
**The implication:** You can *predict* how a model will perform before training it, if you know the compute budget. This made AI research more systematic.
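The prediction works because loss follows a smooth power-law curve that can be extrapolated. A minimal sketch using the Kaplan-style form `L(N) = (Nc / N)^alpha`; the constants here are illustrative placeholders, not fitted values:

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted loss as a power law in parameter count.

    Illustrative constants: n_c and alpha stand in for values
    fitted to small-scale training runs.
    """
    return (n_c / n_params) ** alpha

# Doubling the parameter count shrinks predicted loss by a constant
# factor of 2 ** -alpha, regardless of the starting size.
small = power_law_loss(1e9)
large = power_law_loss(2e9)
```

In practice, a lab fits `n_c` and `alpha` on cheap small models, then reads the curve off at the target scale before committing the full budget.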
**Caveats:** Scaling laws have limits. Beyond a certain scale, emergent abilities appear unpredictably, and some tasks don't improve with scale at all.