# RAG (Retrieval-Augmented Generation)

> RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.

**RAG (retrieval-augmented generation) is the technique of fetching relevant documents from your own data and inserting them into a language model's prompt at query time, so the model answers from retrieved facts instead of training-data memory alone.**

The pipeline has two halves. Offline, your documents are split into chunks, converted to [embeddings](/glossary/embedding), and stored in a [vector database](/glossary/vector-database). Online, the user's question is embedded the same way, the most similar chunks are retrieved (often refined by [reranking](/glossary/reranking)), and those chunks are placed in the prompt alongside the question. The model then generates an answer grounded in what was retrieved.
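The two halves can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: word counts stand in for a real embedding model, a plain list stands in for the vector database, and every function name here (`chunk_text`, `embed`, `build_index`, `retrieve`, `build_prompt`) is hypothetical. It stops at building the prompt, since the model call depends on whichever LLM client you use.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts.
    A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], max_words: int = 40) -> list[tuple[str, Counter]]:
    """Offline half: chunk, embed, store.
    The returned list stands in for a vector database."""
    return [
        (chunk, embed(chunk))
        for doc in documents
        for chunk in chunk_text(doc, max_words)
    ]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Online half, step 1: embed the question the same way, return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Online half, step 2: place the retrieved chunks in the prompt alongside the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up example documents, for illustration only.
documents = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Canada takes five to seven business days.",
]

index = build_index(documents)
question = "How many days is the refund window?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The resulting `prompt` string is what would be sent to the model. Swapping the toy `embed` for a real embedding model and the list for a vector database changes the components but not the shape of the flow.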

RAG became the default architecture for "chat with your data" because it covers the two things a model can't do on its own: know **private** information and know **current** information, both without the cost of retraining. Retrieval quality sets the ceiling on answer quality: if the right chunk isn't fetched, even the best model answers wrong, which is why most RAG engineering effort goes into chunking, search, and reranking rather than the model call.
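Because retrieval bounds answer quality, it can help to score retrieval separately from generation. One simple way to do that is a hit rate at k: the share of test questions whose known-correct chunk shows up among the top k results. The sketch below assumes a hypothetical labelled set where each question has exactly one expected chunk id; the ids and the `hit_rate_at_k` helper are illustrative, not part of any particular library.

```python
def hit_rate_at_k(retrieved: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of questions whose expected chunk id appears in the top-k retrieved ids."""
    if not expected:
        return 0.0
    hits = sum(1 for ids, want in zip(retrieved, expected) if want in ids[:k])
    return hits / len(expected)


# Hypothetical retrieval results for three test questions, best match first.
retrieved_ids = [
    ["c1", "c7", "c3"],
    ["c4", "c2", "c9"],
    ["c5", "c6", "c8"],
]
# Hypothetical ground truth: the chunk that actually answers each question.
expected_ids = ["c1", "c2", "c0"]

print(hit_rate_at_k(retrieved_ids, expected_ids, k=1))
print(hit_rate_at_k(retrieved_ids, expected_ids, k=3))
```

In this made-up data the third question's chunk is never retrieved, so no choice of model or prompt could answer it correctly. That is the failure that chunking, search, and reranking work is meant to reduce.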

For the full pipeline, stage by stage, see [How RAG Actually Works](/guides/concepts/how-rag-works).

---

_Source: https://agentscamp.com/glossary/rag — Term on AgentsCamp._