# Prompt Caching

> Prompt caching reuses the computed state of a repeated prompt prefix across requests — dramatically cutting cost and time-to-first-token for stable context.

**Prompt caching is reusing the computation for a repeated prompt prefix across API requests: the provider stores the model's processed state for the stable beginning of your prompt, so subsequent requests pay full price only for what's new.**

It exploits how [inference](/glossary/inference) works — processing a prompt builds an internal [KV cache](/glossary/kv-cache); if the next request begins with an identical prefix, that state is reusable. Providers expose this with large discounts on cached input tokens and sharply reduced time-to-first-token. For applications with heavy stable context — long [system prompts](/glossary/system-prompt), tool schemas, agent scaffolding, documents queried repeatedly — it's routinely the single biggest cost lever available, which is why agentic tools like Claude Code lean on it constantly.

The engineering is all *prefix discipline*: stable content first, volatile content last, byte-exact consistency (no timestamps, no reordered JSON keys upstream of the cache point), and TTL awareness so steady traffic keeps caches warm. Restructuring a call for maximum hit rate is precisely what the [prompt-cache-optimizer](/skills/performance/prompt-cache-optimizer) skill does, inside the broader [cost and latency playbook](/guides/advanced/llm-cost-latency-engineering).

---

_Source: https://agentscamp.com/glossary/prompt-caching — Term on AgentsCamp._
