Chunking
Chunking splits documents into retrievable pieces before embedding — the RAG design decision that quietly determines retrieval quality.
Chunking is splitting documents into pieces — the units that get embedded, indexed, and retrieved — and it's the most underestimated decision in any RAG pipeline.
The constraint is structural: retrieval returns chunks, so each one must stand alone as evidence. Split mid-thought and the answer exists in your corpus but in no retrievable unit (the classic silent failure on the debugging checklist); merge too much and the chunk's embedding averages across topics, matching everything weakly. The craft balances coherence (complete semantic units), context (overlap so boundary-spanning facts survive), and granularity (focused enough to embed sharply).
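Here is a toy sketch of why an over-merged chunk "matches everything weakly." It uses bag-of-words count vectors as a stand-in for a learned embedding model, which is an assumption. Real models are dense and learned, so the numbers only illustrate the idea. The sketch compares a query's cosine similarity against a focused chunk and against the same text padded with two unrelated topics. The names `embed` and `cosine` are hypothetical helpers.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words counts (an assumption).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = embed("refund window for damaged items")
focused = embed("damaged items get a full refund within the refund window")
mixed = embed(
    "damaged items get a full refund within the refund window "
    "shipping takes two business days to most regions "
    "accounts can be deleted from the privacy settings page"
)

print(round(cosine(query, focused), 3), round(cosine(query, mixed), 3))  # → 0.645 0.402
```

The words shared with the query produce the same dot product in both cases. The unrelated topics only enlarge the mixed chunk's vector norm, and that alone pulls its similarity down.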
The strategy ladder: fixed-size (baseline, structure-blind), recursive/structure-aware (split on headings → paragraphs → sentences; the sane default), semantic (start a new chunk where the embedding similarity between adjacent sentences drops; expensive, occasionally worth it), and document-aware (tables and code blocks kept intact, which is where parsers and libraries like Chonkie earn their keep). Whatever the choice, treat it empirically: chunking is a parameter your retrieval evals tune, which is exactly the experiment the chunking-strategy-optimizer skill runs. The full pipeline context lives in How RAG Actually Works.
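The recursive/structure-aware rung can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not Chonkie's or any other library's API. The boundary patterns (Markdown headings, blank lines, naive sentence ends) are assumptions, and so is the use of word count as a proxy for model tokens. The names `recursive_split`, `LEVELS`, and `max_words` are hypothetical. The splitter tries the coarsest boundary first and packs neighboring units greedily up to the limit. It descends to a finer boundary only when a single unit is too big.

```python
import re

# Boundaries from coarsest to finest, each with the string used to re-join
# pieces. Illustrative assumptions: Markdown headings, blank-line paragraphs,
# and naive sentence ends.
LEVELS = [
    (r"\n(?=#{1,6} )", "\n"),    # split before a Markdown heading line
    (r"\n\s*\n", "\n\n"),         # split on blank lines (paragraphs)
    (r"(?<=[.!?])\s+", " "),      # split after sentence-ending punctuation
]

def n_words(text: str) -> int:
    # Word count as a cheap proxy for the embedding model's token count.
    return len(text.split())

def recursive_split(text: str, max_words: int, level: int = 0) -> list[str]:
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    text = text.strip()
    if not text:
        return []
    if n_words(text) <= max_words:
        return [text]
    if level == len(LEVELS):
        # No structural boundary left: fall back to fixed-size word slices.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    pattern, joiner = LEVELS[level]
    chunks: list[str] = []
    buf: list[str] = []
    for piece in re.split(pattern, text):
        piece = piece.strip()
        if not piece:
            continue
        if n_words(piece) > max_words:
            # Oversized unit: flush the buffer, then split this unit more finely.
            if buf:
                chunks.append(joiner.join(buf))
                buf = []
            chunks.extend(recursive_split(piece, max_words, level + 1))
        elif buf and n_words(joiner.join(buf + [piece])) > max_words:
            # Adding this unit would overflow: close the current chunk.
            chunks.append(joiner.join(buf))
            buf = [piece]
        else:
            buf.append(piece)  # pack neighboring small units together
    if buf:
        chunks.append(joiner.join(buf))
    return chunks

doc = """# Returns
Items can be returned within thirty days of delivery. Damaged items are refunded in full.

Refunds go to the original payment method.

# Shipping
Orders ship within two business days. Express shipping costs extra."""

print(len(recursive_split(doc, max_words=25)))  # → 2 (one chunk per section)
print(len(recursive_split(doc, max_words=15)))  # → 4 (Returns section split further)
```

Greedy packing keeps chunks close to the limit. The trade-off shows up with `max_words=15`: the last two units of the Returns section come from different recursion branches, so they are not re-merged, even though together they would fit. The naive sentence pattern would also split on abbreviations such as "e.g.", which a production splitter would need to handle.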
Frequently asked questions
- What chunk size should I use?
- Start around 300–800 tokens with 10–15% overlap and measure — but the real answer is shape, not size: chunks should be self-contained units of meaning (a section, a complete thought), which is why structure-aware splitting (headings, paragraphs) beats fixed-size slicing on most corpora. The right size is whatever your retrieval evals say it is.
- Why does chunking matter so much?
- Because chunks are what gets embedded and retrieved — a fact severed mid-thought across two chunks is invisible to retrieval, and a chunk stuffed with three topics embeds as mush. Bad chunking caps the whole pipeline: no embedding model or reranker can recover what the splitter destroyed.
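The "measure" advice and the severed-fact failure can both be made concrete with a small configuration sweep. This is a sketch built on several assumptions. The regex tokenizer, the word-overlap scorer standing in for a real retriever, and the one-item eval set are all hypothetical placeholders. A real sweep would plug in your embedding model, your index, and a much larger eval set. `sweep` and `recall_at_k` are illustrative names, not an existing API.

```python
import re
from itertools import product

def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a stand-in for the embedding model's tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk_fixed(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    # Sliding window: each chunk starts `size - overlap` tokens after the last.
    chunks, step = [], size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

def contains(chunk: list[str], phrase: list[str]) -> bool:
    # True if `phrase` appears as a contiguous run inside `chunk`.
    n = len(phrase)
    return any(chunk[i:i + n] == phrase for i in range(len(chunk) - n + 1))

def recall_at_k(chunks, evals, k):
    # Toy retriever (assumption): rank chunks by distinct question words shared.
    hits = 0
    for question, evidence in evals:
        q, e = set(tokenize(question)), tokenize(evidence)
        top = sorted(chunks, key=lambda c: len(q & set(c)), reverse=True)[:k]
        hits += any(contains(c, e) for c in top)
    return hits / len(evals)

def sweep(corpus, evals, sizes, overlaps, k=1):
    # Score every valid (size, overlap) pair against the same fixed eval set.
    tokens = tokenize(corpus)
    return {
        (size, overlap): recall_at_k(chunk_fixed(tokens, size, overlap), evals, k)
        for size, overlap in product(sizes, overlaps)
        if overlap < size
    }

corpus = "Returns accepted within thirty days of delivery unless the item is damaged."
evals = [("How many days do I have to return something?", "within thirty days")]
results = sweep(corpus, evals, sizes=[4, 8], overlaps=[0, 2])
for config, recall in sorted(results.items()):
    print(config, recall)
# (4, 0) 0.0
# (4, 2) 1.0
# (8, 0) 1.0
# (8, 2) 1.0
```

With no overlap, the 4-token window cuts "within thirty days" across two chunks. Recall is therefore 0.0 however good the ranking is. A 2-token overlap keeps the phrase whole. The toy scorer does not model embedding dilution, so larger chunks cost nothing here. With a real retriever, larger chunks would also have to win on ranking.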
Related
- RAG (Retrieval-Augmented Generation). RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Embedding. An embedding is a vector of numbers representing text's meaning, placed so similar texts land close together; it is the foundation of semantic search and RAG.
- How RAG Actually Works: Ingestion, Chunking, Retrieval & Reranking. A clear, practical walkthrough of the retrieval-augmented generation pipeline: what each stage does, where it fails, and how the pieces fit together.
- Chunking Strategy Optimizer. Find the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.
- Chonkie. A lightweight, fast chunking library for RAG with many splitting strategies in one API.
- Why RAG Fails: A Debugging Checklist. A diagnostic checklist for broken RAG: localize the failure to ingestion, retrieval, ranking, or generation, and apply the fix that matches, in order.