# Needle in a Haystack

> Needle in a haystack is a long-context eval that hides a fact in filler text and tests whether the model can retrieve it at varying depths and lengths.

**Needle in a haystack is a long-[context window](/glossary/context-window) evaluation that embeds a specific fact (the "needle") inside a large body of unrelated filler text (the "haystack") and tests whether the model can retrieve it.**

The test systematically varies two dimensions: the *depth* at which the needle sits within the input and the total *length* of the context. Running every combination produces a grid that reveals not just whether a model can use its full window, but where retrieval degrades. The best-known pattern is the "lost in the middle" weakness, where facts placed mid-context can be recalled far less reliably than those near the start or end.
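The depth-by-length sweep can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the filler sentence, the needle, the helper names (`build_haystack`, `run_grid`), and the `toy_model` stand-in are all hypothetical, and length is measured in characters for simplicity where a real harness would usually count tokens.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical filler and needle, chosen only for illustration.
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret passcode is 7421. "
QUESTION = "What is the secret passcode?"
EXPECTED = "7421"


def build_haystack(length_chars: int, depth: float) -> str:
    """Return roughly `length_chars` of filler with the needle inserted
    at a fractional depth (0.0 = very start, 1.0 = very end)."""
    sentences = [FILLER] * (length_chars // len(FILLER) + 1)
    # Insert on a sentence boundary so the needle is never split.
    index = round(depth * len(sentences))
    sentences.insert(index, NEEDLE)
    return "".join(sentences)


def run_grid(
    model: Callable[[str], str],
    lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], bool]:
    """Query the model once per (length, depth) cell and record whether
    the expected answer appears in its reply."""
    results: Dict[Tuple[int, float], bool] = {}
    for length in lengths:
        for depth in depths:
            prompt = build_haystack(length, depth) + "\n\n" + QUESTION
            results[(length, depth)] = EXPECTED in model(prompt)
    return results


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption): it only 'reads' the
    first and last 200 characters, mimicking a mid-context blind spot."""
    visible = prompt[:200] + prompt[-200:]
    return EXPECTED if EXPECTED in visible else "I don't know"


if __name__ == "__main__":
    grid = run_grid(toy_model, lengths=[2000, 8000], depths=[0.0, 0.5, 1.0])
    for (length, depth), found in sorted(grid.items()):
        print(f"length={length:>5} depth={depth:.1f} retrieved={found}")
```

To evaluate a real model, replace `toy_model` with a function that sends the prompt to that model and returns its text reply. The toy version is deliberately built to miss mid-context needles, so the grid it produces shows the shape of a "lost in the middle" failure rather than any measured behavior.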

It matters because a model advertising a huge context window may not use all of it equally well, and this eval turns that claim into a measurable result, scored against a purpose-built [eval dataset](/glossary/eval-dataset), rather than a marketing number. The caveat is that finding a single planted fact is an easy task: it tests recall of an exact string, not reasoning across scattered evidence or synthesizing many passages, so strong needle scores don't guarantee strong real-world long-context performance. For how this informs the retrieval-versus-long-context choice, see [RAG vs long context](/guides/concepts/rag-vs-long-context).

---

_Source: https://agentscamp.com/glossary/needle-in-a-haystack — Term on AgentsCamp._