**The problem:** LLMs have a knowledge cutoff and can hallucinate. You can't fit your entire company's docs in the context window.
**RAG architecture:**
1. **Embed** all your documents into a vector database
2. **Query**: user's question → embed → find top-K similar document chunks
3. **Augment**: inject retrieved chunks into the prompt: "Based on these docs: [...] answer: [question]"
4. **Generate**: LLM answers with grounded context
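The retrieve-and-augment steps above can be sketched end to end. This is a toy: the bag-of-words "embedding", the sample chunks, and the prompt template are all stand-ins (real systems use a trained embedding model and a vector DB), and the final LLM call is omitted.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" — a stand-in for a real embedding
    # model (e.g. sentence-transformers). Tokens are lowercase words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    # Step 2: embed the question, rank chunks by similarity, keep top-K.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    # Step 3: inject the retrieved chunks into the prompt.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Based on these docs:\n{context}\nAnswer: {question}"

# Hypothetical document chunks, already "embedded" on the fly above.
chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, email support@example.com.",
]

question = "How do I get a refund?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)  # Step 4 would send this prompt to the LLM.
```

Swapping `embed` for a real embedding model and storing vectors in a vector DB turns this sketch into the production shape without changing the control flow.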
**Result:** The model answers from real documents instead of relying solely on its parametric memory, and citations back to source chunks are possible.
**Key components:** vector DB (Pinecone, Weaviate, pgvector), embedding model, chunking strategy, reranker.
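Of these components, the chunking strategy is the easiest to sketch. A minimal version is a fixed-size sliding window with overlap; the sizes here are arbitrary assumptions, and real pipelines often split on sentence or paragraph boundaries instead.

```python
def chunk_text(text, size=200, overlap=50):
    # Fixed-size sliding window over characters with overlap, so that
    # a fact straddling a boundary still appears whole in some chunk.
    # Boundary-aware (sentence/paragraph) splitting is deliberately
    # left out of this sketch.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 500
parts = chunk_text(doc, size=200, overlap=50)
print(len(parts))
```

The overlap guarantees each chunk shares its last `overlap` characters with the start of the next, which costs some index storage but improves recall at chunk boundaries.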