# Test-Time Compute

> Test-time compute is spending more computation at inference — longer reasoning, sampling, or search — to improve answers without retraining the model.

**Test-time compute is the strategy of spending more computation at inference time — generating longer reasoning, sampling many candidate answers, or searching over solutions — to get better results from a fixed model without retraining it.**

It's the scaling axis behind [reasoning models](/glossary/reasoning-model). For years, gains came almost entirely from training larger models on more data; work on test-time compute showed that a fixed model can also improve simply by being given more room to work at the moment it answers. In practice that means extended [chain-of-thought](/glossary/chain-of-thought) reasoning (see [extended thinking](/glossary/extended-thinking)), drawing multiple samples and aggregating them, or running a search procedure over candidate steps.
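As a rough illustration of the "draw multiple samples and aggregate them" option, here is a minimal sketch of majority voting over sampled answers. The `sample_answer` callable is a hypothetical stand-in for one stochastic model call, not a real API, and the canned answers are made up for the demo.

```python
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples candidate answers and return the most frequent one.

    sample_answer is a hypothetical stand-in for a single sampled model
    call that returns a final answer as a string.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns a one-element list of (answer, count) pairs.
    return Counter(answers).most_common(1)[0][0]


# Demo with canned answers in place of a real model.
canned = iter(["12", "15", "12", "12", "9"])
print(majority_vote(lambda: next(canned), n_samples=5))  # prints 12
```

Raising `n_samples` is the compute knob here: each extra sample is one more inference call spent on the same question.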

This matters because it's tunable per query: hard problems get more compute, easy ones get less, and you can buy accuracy on demand instead of retraining. The tradeoff is cost and latency: every extra reasoning token or sampled candidate is paid for at inference, and returns diminish. Beyond some budget, more thinking stops helping and only slows the response, so test-time compute is a dial to set against task difficulty, not a constant to crank.
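One way to picture the per-query dial is adaptive sampling: keep drawing answers until they agree strongly, then stop, up to a fixed cap. The sketch below assumes a hypothetical `sample_answer` callable standing in for one sampled model call; the names `min_samples`, `max_samples`, and `agree_frac` and their default values are illustrative assumptions, not recommended settings.

```python
from collections import Counter
from typing import Callable, Tuple


def adaptive_vote(
    sample_answer: Callable[[], str],
    min_samples: int = 3,
    max_samples: int = 16,
    agree_frac: float = 0.8,
) -> Tuple[str, int]:
    """Sample until the leading answer holds agree_frac of the votes.

    Returns (answer, samples_used). Easy queries stop near min_samples;
    hard ones run up to the max_samples budget.
    """
    if not 1 <= min_samples <= max_samples:
        raise ValueError("need 1 <= min_samples <= max_samples")
    counts: Counter = Counter()
    for used in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        top, top_count = counts.most_common(1)[0]
        if used >= min_samples and top_count / used >= agree_frac:
            return top, used  # early stop: answers agree, save the budget
    return top, max_samples  # budget exhausted: return the current leader


# A sampler that always agrees with itself stops at the minimum.
print(adaptive_vote(lambda: "4"))  # prints ('4', 3)
```

The cap reflects the diminishing-returns point in the text: past `max_samples`, the sketch returns its current best guess instead of spending more.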

---

_Source: https://agentscamp.com/glossary/test-time-compute — Term on AgentsCamp._