Chonkie
A lightweight, fast chunking library for RAG with many splitting strategies in one API.
Chonkie is a lightweight, no-nonsense chunking library for RAG. Chunking — splitting documents into the passages you embed and retrieve — is the step that quietly sets the ceiling on retrieval quality, and Chonkie packages the strategies that matter behind one small, fast API so you can swap approaches without rewriting your pipeline.
It is aimed at engineers building retrieval pipelines who want sensible chunking without hand-rolling splitters or pulling in a heavy framework. Chonkie is small, has minimal dependencies, and is designed to be fast on large corpora.
Highlights
- Many chunkers, one API — token, sentence, recursive, semantic, and code-aware splitting, swappable with a one-line change.
- Semantic chunking — group sentences by embedding similarity so chunks align with meaning, not just length.
- Overlap and size control — tune chunk size and overlap to match your embedding model's context and your retrieval granularity.
- Lightweight & fast — minimal dependencies and a small footprint, suitable for batch-processing large document sets.
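The semantic chunking idea above can be illustrated with a minimal, library-free sketch. This is not Chonkie's implementation or API: `embed`, `cosine`, `semantic_chunks`, and the `threshold` value are hypothetical names chosen for illustration, and a bag-of-words count vector stands in for a real embedding model. It only shows the general technique of starting a new chunk where adjacent sentences stop being similar.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[str]:
    """Start a new chunk whenever two adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) >= threshold:
            groups[-1].append(cur)   # same topic: extend the current chunk
        else:
            groups.append([cur])     # topic shift: open a new chunk
    return [" ".join(group) for group in groups]


# Hypothetical sample input: two topics, two sentences each.
sentences = [
    "Qdrant stores vectors for similarity search.",
    "Similarity search over vectors finds the nearest neighbours.",
    "Bread needs flour, water, and yeast.",
    "Knead the bread dough, then let the yeast work.",
]
result = semantic_chunks(sentences)
for chunk in result:
    print(chunk)
```

With a real embedding model the similarity scores would be far more meaningful than word overlap, and the threshold would need tuning per corpus; the boundary-detection loop is the part that carries over.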
In an AI-assisted workflow
Chunk at ingestion, then embed and store the chunks:
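from chonkie import RecursiveChunker

# document_text is assumed to hold the raw text of one document as a string
chunker = RecursiveChunker(chunk_size=512)  # upper bound on the size of each chunk
chunks = chunker(document_text)  # chunkers are callable and return the chunks
# embed each chunk and upsert into your vector DB (e.g. Qdrant)
TIP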
There is no universal best chunk size — it depends on your documents and embedding model. Try a few strategies and measure retrieval quality; the Chunking Strategy Optimizer skill automates that sweep.
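To make "try a few strategies and measure" concrete, here is a rough sketch of a chunk size and overlap sweep against a fixed eval set. Everything in it is an assumption for illustration: `chunk_words`, `retrieve`, and `hit_rate` are hypothetical helpers, the word-window chunker and word-overlap retriever are toy stand-ins for a real chunker and embedding search, and the document and eval set are made up. It is not how the Chunking Strategy Optimizer skill is implemented.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size sliding window over words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


def retrieve(chunks: list[str], query: str) -> str:
    """Toy retriever: return the chunk sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))


def hit_rate(chunks: list[str], eval_set: list[tuple[str, str]]) -> float:
    """Fraction of queries whose top chunk contains the expected answer phrase."""
    hits = sum(answer in retrieve(chunks, query) for query, answer in eval_set)
    return hits / len(eval_set)


# Hypothetical corpus and eval set: (query, phrase the retrieved chunk should contain).
document = (
    "Chonkie is released under the MIT license. It handles chunking only. "
    "You bring your own embedding model. "
    "Qdrant is a vector database written in Rust."
)
eval_set = [
    ("which license is chonkie released under", "MIT license"),
    ("what language is qdrant written in", "written in Rust"),
]

# Sweep every (chunk_size, overlap) pair and score each configuration.
results = {
    (size, overlap): hit_rate(chunk_words(document, size, overlap), eval_set)
    for size in (4, 8, 16)
    for overlap in (0, 2)
}
best = max(results, key=results.get)
print("best (chunk_size, overlap):", best, "hit rate:", results[best])
```

The important design choice is that the eval set stays fixed while only the chunking configuration changes, so differences in the score can be attributed to chunking rather than to the questions asked.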
Good to know
Chonkie is free and open source (MIT). It handles the chunking stage only — you bring your own embedding model and vector database for the rest of the pipeline (see How RAG Actually Works).
Related
- Chunking Strategy Optimizer
  Find the chunking strategy and size that maximizes retrieval quality for a specific corpus, by sweeping configurations against a fixed eval set instead of guessing. Use when RAG answers miss obvious content, when standing up a new corpus, or when picking chunk size/overlap.
- How RAG Actually Works: Ingestion, Chunking, Retrieval & Reranking
  A clear, practical walkthrough of the retrieval-augmented generation pipeline: what each stage does, where it fails, and how the pieces fit together.
- Rag Pipeline Engineer
  Use this agent to design, build, and harden a production retrieval-augmented generation (RAG) pipeline end to end (ingestion, chunking, embeddings, indexing, retrieval, reranking, and grounded generation), with evals that prove each stage works. Examples: "stand up RAG over our docs", "our RAG hallucinates and misses obvious answers, fix the pipeline", "take our prototype RAG to production with evals and citations".
- Qdrant
  An open-source vector database written in Rust, built for low-latency similarity search at scale.