Reranking
Reranking is a second-pass scoring step: a cross-encoder model re-orders the top results from fast retrieval so the truly relevant few rise to the top.
Reranking is the precision stage of retrieval: after a fast first pass fetches candidate documents, a reranker model scores each candidate against the query directly and re-orders them, so the few results that actually matter end up on top.
The two stages exist because of an accuracy/speed trade-off. First-pass retrieval (semantic or keyword) uses representations computed independently for the query and for each document: fast enough for millions of documents, but blind to fine query–document interaction. A reranker is a cross-encoder: it reads the query and candidate together, which is considerably more accurate but also far slower, because every query-candidate pair needs its own model pass at query time and nothing can be precomputed. That makes it viable only on a short list. The standard RAG pattern: retrieve top-50 cheaply, rerank to top-5 precisely, and put just those in the prompt, for better answers and fewer tokens.
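The retrieve-then-rerank pattern can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `rerank` and `toy_overlap_score` are hypothetical names, and the scoring function is a simple term-overlap stand-in for a real cross-encoder or hosted reranker, which you would plug in through the same `score_fn` parameter.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the best top_n."""
    # One score_fn call per (query, candidate) pair: this is the slow part,
    # which is why reranking only runs on a short candidate list.
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Stable sort: candidates with equal scores keep their first-pass order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


# Pretend these came back from the fast first pass, in first-pass order.
candidates = [
    "How to reset a forgotten password",
    "Billing and invoices overview",
    "Reset your password from the login page",
]

top = rerank("reset password", candidates, toy_overlap_score, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline the candidate list would hold the 50 or so passages from first-pass retrieval, and only the documents returned by `rerank` would go into the prompt.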
Hosted rerankers (Cohere Rerank, Voyage) make the step one API call. Whether it pays off in your pipeline is an empirical question: Hybrid Search & Reranking covers the architecture, and the benchmark-rerankers command measures the lift on your own queries.
Frequently asked questions
- Why rerank if retrieval already ranks by similarity?
- Because first-pass retrieval optimizes for speed over millions of documents, scoring query and document separately (a bi-encoder). A reranker is a cross-encoder: it reads the query and each candidate together, capturing interactions the fast pass can't — much more accurate, far too slow to run on everything. So you retrieve 50 fast, rerank to a precise top 5.
- When is a reranker worth adding?
- When your retriever's top-50 usually contains the right answer but the top-5 often doesn't — reranking converts recall you already have into precision. If the right document isn't in the candidate pool at all, fix retrieval first; a reranker can't promote what was never fetched. Measure before and after — that's what our benchmark-rerankers command automates.
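The "right answer is in the top-50 but not the top-5" check from the last answer can be approximated with a hit-rate comparison at two cutoffs. The sketch below is an assumption-laden illustration, not the benchmark-rerankers command itself: `hit_rate_at_k`, the query IDs, and the document IDs are all hypothetical, each query is assumed to have exactly one labeled relevant document, and small cutoffs (3 and 1) stand in for 50 and 5.

```python
from typing import Dict, List


def hit_rate_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, str],
    k: int,
) -> float:
    """Fraction of queries whose labeled relevant doc appears in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query_id, ranked in rankings.items()
        if relevant[query_id] in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical first-pass rankings (document IDs, best first) per query.
rankings = {
    "q1": ["d3", "d7", "d1"],  # relevant doc fetched, but ranked last
    "q2": ["d2", "d9", "d4"],  # relevant doc already on top
    "q3": ["d8", "d6", "d5"],  # relevant doc never fetched
}
relevant = {"q1": "d1", "q2": "d2", "q3": "d0"}

wide = hit_rate_at_k(rankings, relevant, k=3)    # stand-in for top-50
narrow = hit_rate_at_k(rankings, relevant, k=1)  # stand-in for top-5

print(f"wide hit rate:   {wide:.2f}")
print(f"narrow hit rate: {narrow:.2f}")
print(f"headroom for a reranker: {wide - narrow:.2f}")
```

The gap between the wide and narrow hit rates is the most a reranker could recover (here, q1). A query like q3, where the relevant document is missing from the pool entirely, is a retrieval problem that reranking cannot fix.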
Related
- Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision: How production RAG combines dense and sparse search, fuses with RRF, and reranks, turning a wide candidate set into the few passages that actually answer.
- Semantic Search: Semantic search retrieves results by meaning rather than keyword overlap, embedding queries and documents in one vector space and matching by similarity.
- RAG (Retrieval-Augmented Generation): RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Cohere Rerank: A hosted reranking API that reorders retrieved passages by true relevance to a query.
- Benchmark Rerankers: Measure whether adding a reranker actually improves retrieval, by scoring reranked vs. un-reranked results on a labeled query set.
- Voyage AI: Embedding and reranking models tuned for retrieval, now part of MongoDB.
- Cosine Similarity: Cosine similarity measures how alike two embeddings are by the angle between them; it is the standard relevance score behind semantic search and RAG retrieval.
- Hybrid Search: Hybrid search runs keyword (BM25) and semantic (vector) retrieval together and merges the results, catching both exact terms and paraphrases.