# Hybrid Search & Reranking: From Top-50 Recall to Top-5 Precision

> How production RAG combines dense and sparse search, fuses with RRF, and reranks — turning a wide candidate set into the few passages that actually answer.

Production retrieval rarely relies on vector search alone. The winning pattern is hybrid search — fuse dense (semantic) and sparse (keyword/BM25) results, usually with Reciprocal Rank Fusion — to get high recall, then rerank the wide candidate set with a cross-encoder down to the precise few passages you put in the prompt.

Pure vector search is where most RAG demos start and most RAG production systems get stuck. Vectors match on *meaning*, which is exactly what you want for "how do I cancel my plan?" → "subscription termination." But they're surprisingly bad at *exact* matches — an error code like `ERR_2043`, a product name, a function identifier — because nothing is semantically "close" to an opaque token; it has to match. Production retrieval fixes this with two moves: **hybrid search** for recall, and **reranking** for precision.

## Dense + sparse: two retrievers, different blind spots

- **Dense (vector) search** encodes meaning. It nails paraphrases, synonyms, and conceptual matches, and it's robust to wording. It misses exact strings and rare tokens.
- **Sparse (keyword / BM25) search** matches terms. It nails codes, IDs, names, and exact phrases, and it's transparent. It misses anything phrased differently from the document.

Real user queries contain both kinds of intent, often in the same sentence ("why does `ERR_2043` happen when I rotate credentials?"). **Hybrid search runs both retrievers and fuses the results**, so you don't have to choose which class of query to fail.
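To make the sparse side's strength and blind spot concrete, here is a minimal BM25 sketch. It assumes whitespace tokenization, a non-empty document list, and the commonly used parameters `k1=1.5` and `b=0.75`; the function name `bm25_scores` and the two sample documents are made up for illustration, and a real system would use its search engine's built-in BM25 instead.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    Illustrative only: lowercases and splits on whitespace, no stemming.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "err_2043 is raised when a credential token has expired",
    "to end your subscription open billing settings",
]
exact_match = bm25_scores("err_2043", docs)
paraphrase = bm25_scores("cancel my plan", docs)
```

The exact code scores above zero only for the first document, while the paraphrased query scores zero everywhere because it shares no terms with either document. That second case is the gap the dense retriever is there to cover.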

## Fusing with Reciprocal Rank Fusion

The catch with combining two retrievers is that their scores aren't comparable — a cosine similarity of 0.82 and a BM25 score of 14.7 live on different scales, and normalizing them is fiddly and brittle. **Reciprocal Rank Fusion (RRF)** sidesteps the whole problem by using *rank* instead of *score*:

```text
RRF(d) = Σ  1 / (k + rank_i(d))      # k ≈ 60, sum over each list i that contains d
```

A document that ranks high in either list gets a strong combined score; one that ranks high in *both* gets a stronger one. There's essentially one knob (`k`), the usual value of 60 works well, and you avoid score normalization entirely. That robustness is why RRF is the common default. Many vector databases, including [Qdrant](/tools/qdrant), support hybrid queries with fusion built in.
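The formula is short enough to sketch directly. The version below assumes each retriever returns a best-first list of document IDs and that ranks are 1-based; the function name `rrf_fuse` and the sample IDs are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse best-first lists of doc IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results
fused = rrf_fuse([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `doc_a` and `doc_c` appear in both lists and so outrank `doc_b` and `doc_d`, which each appear in only one. Raw scores never enter the calculation, so the two retrievers can use any scale.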

> [!NOTE]
> You can weight dense vs. sparse if your workload skews one way, but start with plain RRF. It's a strong baseline that needs no tuning.
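If you do end up weighting, one simple approach is to multiply each list's reciprocal-rank term by a per-retriever weight. This is a sketch of that idea under the same assumptions as plain RRF (best-first ID lists, 1-based ranks); the name `weighted_rrf` and the weight values are hypothetical, and vector databases that offer weighted fusion may define it differently.

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """RRF variant where each list's contribution is scaled by a weight."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense_hits = ["doc_a", "doc_b"]
sparse_hits = ["doc_b", "doc_a"]

plain = weighted_rrf([dense_hits, sparse_hits], weights=[1.0, 1.0])
sparse_heavy = weighted_rrf([dense_hits, sparse_hits], weights=[1.0, 2.0])
```

With equal weights the two documents tie, and the tie-break puts `doc_a` first. Doubling the sparse weight promotes `doc_b`, the sparse retriever's top hit.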

## Retrieve wide, rerank narrow

Hybrid search gets the right passage *into* the candidate set (recall). It doesn't guarantee it's at the *top* (precision). That's the reranker's job.

A **reranker** is a cross-encoder: it reads the query and a candidate passage together and scores their relevance directly. That joint reading is far more accurate than comparing two independently made vectors, but it's too slow to run over a whole corpus, so you only run it on the candidates the first stage already found. The pattern:

1. **Over-retrieve** a wide set (top-25–50) with hybrid search — optimize for recall here.
2. **Rerank** that set with a cross-encoder like [Cohere Rerank](/tools/cohere-rerank).
3. **Keep the top 3–5** — enough to answer, few enough to keep the prompt tight and the model grounded.

> [!TIP]
> The single most common mistake is reranking too few candidates. If you only retrieve 5 and rerank them, the reranker can only reorder 5 — it can't add the answer that retrieval missed. Retrieve wide first.
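The second stage can be sketched as a function that takes any pairwise scorer. In the sketch below, `toy_overlap_score` is a placeholder that counts shared tokens; it is not a cross-encoder and only stands in so the example runs without a model. In practice you would pass a function that calls your reranking model or API. All names and sample passages here are illustrative.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Score every (query, passage) pair and keep the top_n passages."""
    scored = [(score_fn(query, passage), passage) for passage in candidates]
    # Stable sort: passages with equal scores keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]

def toy_overlap_score(query, passage):
    """Placeholder scorer: fraction of query tokens found in the passage.

    NOT a real cross-encoder; swap in a model call for actual use.
    """
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & passage_tokens) / len(query_tokens)

# Pretend these came back from the wide hybrid retrieval stage.
candidates = [
    "Rotate credentials from the security settings page.",
    "ERR_2043 happens when cached credentials expire after you rotate them.",
    "Billing plans can be cancelled at any time.",
]
query = "why does ERR_2043 happen when I rotate credentials"
top = rerank(query, candidates, toy_overlap_score, top_n=2)
```

The reranker can only choose among the passages it is handed. That is why the first stage should return a wide candidate set and the `top_n` cut should happen after scoring.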

## Prove it pays for itself

Both hybrid search and reranking add latency and cost, so don't add them on faith — measure. On a labeled eval set, track **recall@k after fusion** (did hybrid search get the answer into the candidate set?) and **nDCG@k after reranking** (did reranking move it to the top?). The [Benchmark Rerankers](/commands/review/benchmark-rerankers) command runs exactly this comparison, and the [retrieval-engineer](/agents/data-ai/retrieval-engineer) agent owns tuning the whole retrieval stage against the numbers.
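Both metrics are small enough to compute by hand for a sanity check. This sketch assumes a labeled set where each query has known relevant document IDs, with graded gains for nDCG; the function names and sample IDs are illustrative, and an evaluation library may use a slightly different gain or discount convention.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain; relevance maps doc ID to a graded gain."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(position + 2)
        for position, gain in enumerate(ideal_gains)
    )
    return dcg / idcg if idcg > 0 else 0.0

relevance = {"doc_a": 1}                     # one labeled answer
after_fusion = ["doc_x", "doc_y", "doc_a"]   # answer retrieved, ranked third
after_rerank = ["doc_a", "doc_x", "doc_y"]   # answer moved to the top

recall = recall_at_k(after_fusion, relevance.keys(), k=3)
ndcg_before = ndcg_at_k(after_fusion, relevance, k=3)
ndcg_after = ndcg_at_k(after_rerank, relevance, k=3)
```

In this toy case recall@3 is already 1.0 after fusion, so the reranker's contribution shows up only in nDCG@3, which rises from 0.5 to 1.0. If recall after fusion is low, fix retrieval before judging the reranker.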

For where these stages sit in the full pipeline, see [How RAG Actually Works](/guides/concepts/how-rag-works).

---

_Source: https://agentscamp.com/guides/concepts/hybrid-search-reranking — Guide on AgentsCamp._