# Top-k Sampling

> Top-k sampling restricts an LLM's next-token choice to the k most probable tokens before sampling; lower k is more deterministic, higher k more diverse.

**Top-k sampling is a decoding setting that limits the model's next-token choice to the k most probable candidates, then samples from that truncated set — so improbable tokens are excluded before any randomness is applied.**

At each step the model produces a probability distribution over its whole vocabulary. Top-k keeps only the k highest-ranked tokens, discards the long tail, and renormalizes the remaining probabilities so they sum to 1. A small k (say 5) makes generation more predictable by ruling out unlikely tokens, and k = 1 reduces to greedy decoding, which always picks the single most probable token; a large k admits more variety and surprise. It's one of the standard knobs alongside [temperature](/glossary/temperature), which reshapes the probabilities, and [top-p](/glossary/top-p) (nucleus sampling), which keeps a variable-size set instead of a fixed count.
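The truncate-and-renormalize step can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function names `top_k_probs` and `sample_top_k` and the six-token `logits` list are hypothetical, and the code assumes the model exposes raw scores (logits) for each token.

```python
import math
import random

def top_k_probs(logits, k):
    """Return {token_index: probability} for the k highest-scoring tokens."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Rank token indices by logit, highest first, and keep the first k.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only, which is the same as renormalizing
    # the full distribution over the kept tokens. Subtracting the max logit
    # avoids overflow in exp().
    top = max(logits[i] for i in kept)
    weights = {i: math.exp(logits[i] - top) for i in kept}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def sample_top_k(logits, k, rng=random):
    """Draw one token index from the renormalized top-k distribution."""
    probs = top_k_probs(logits, k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical scores for a 6-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0, -3.0, 0.0]
print(sorted(top_k_probs(logits, k=3)))  # token indices that survive truncation
print(sample_top_k(logits, k=3))         # one sampled token index
```

With `k=1` the sketch always returns the highest-scoring token, matching greedy decoding; raising `k` lets lower-ranked tokens back into the pool.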

In practice these combine: a pipeline typically uses top-k or top-p to truncate the candidate pool and temperature to control how sharply it samples from what remains, though the order of these steps varies by implementation. The caveat is that a fixed k ignores how confident the model is: it keeps k candidates whether the distribution is sharp or flat, which is why many setups favor top-p. Because these parameters are applied at every decoding step, they affect each token emitted during [streaming](/glossary/token-streaming).
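The fixed-k caveat is easier to see with numbers. The sketch below compares how many candidates top-k and top-p keep for a confident distribution and an uncertain one; the helper names `top_k_size` and `top_p_size` and the two example distributions are made up for illustration.

```python
def top_k_size(probs, k):
    # Top-k keeps a fixed count, capped only by the vocabulary size.
    return min(k, len(probs))

def top_p_size(probs, p):
    # Top-p keeps the smallest set of top-ranked tokens whose
    # cumulative probability reaches p.
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

sharp = [0.90, 0.05, 0.03, 0.01, 0.01]  # model is confident
flat = [0.20, 0.20, 0.20, 0.20, 0.20]   # model is uncertain

for name, dist in (("sharp", sharp), ("flat", flat)):
    print(name, "top-k keeps", top_k_size(dist, 3), "| top-p keeps", top_p_size(dist, 0.9))
# sharp top-k keeps 3 | top-p keeps 1
# flat top-k keeps 3 | top-p keeps 5
```

Top-k with `k=3` keeps three candidates in both cases, so it admits two very unlikely tokens when the model is confident and drops two equally plausible ones when it is not. Top-p with `p=0.9` adapts the pool size to the shape of the distribution.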

---

_Source: https://agentscamp.com/glossary/top-k — Term on AgentsCamp._
