Context Engineering
Treating the context window as a finite budget — what to load, what to leave out, and when to reset.
Every token an agent reads competes for the same finite window. Fill it with the right three files and Claude reasons sharply; fill it with a git ls-files dump and the signal you actually care about gets buried under noise the model still has to weigh. Context engineering is the discipline of spending that budget deliberately — deciding what loads, what stays out, and when to clear the table and start fresh.
This isn't prompt wording. A perfectly phrased request still fails if the surrounding context is full of stale diffs, irrelevant files, and a 40-turn transcript the model now has to reconcile. The window is the workspace; this guide is about keeping it clean.
The window is a budget, not a backpack
Long context does not mean infinite attention. As a window fills, two things degrade together: the model's ability to find the relevant fact among everything else, and its tendency to lose facts buried in the middle of a long window while over-weighting whatever appears at the very start and end. A 200K window holds a lot, but the useful question is never "does it fit" — it's "does adding this make the next answer better or worse."
Treat every addition as a withdrawal from a budget:
| Spend on | Don't spend on |
|---|---|
| The 2–4 files the task actually touches | The whole repo "for context" |
| Exact function signatures and types in play | Generated code, lockfiles, dist/ |
| The one error message you're debugging | Twenty pages of passing test logs |
Durable conventions (in CLAUDE.md) | Conventions re-pasted every prompt |
NOTE
Signal-to-noise is the metric that matters, not raw token count. A focused 8K-token context routinely outperforms a 120K-token context that happens to contain the same answer somewhere inside it.
Signal vs. noise
Signal is anything the model needs to reason about this task and couldn't reliably infer. Noise is everything else that happens to be in the window — and noise is not neutral. It costs attention, invites the model to "fix" things you didn't ask about, and dilutes the instructions that matter.
Common noise that sneaks in:
- Whole-file reads when you needed one function. Reading a 1,200-line module to discuss one method drags 1,150 irrelevant lines along.
- Stale transcript. A finished migration discussion three tasks ago is still being re-read on every turn.
- Build and test output. Most of a CI log is reassurance you didn't need; the three failing lines are the signal.
- Defensive over-pasting. Dumping a schema, a config, and two utils "just in case" when the task touches one of them.
The fix is to point precisely. Reference exact paths and symbols — src/billing/invoice.ts, computeTax() — so the agent reads the slice that matters instead of grepping the tree and pulling in everything adjacent.
CLAUDE.md vs. the prompt
The most common context mistake is putting durable facts in prompts and ephemeral facts in CLAUDE.md. It should be the reverse.
CLAUDE.md loads on every session, so it's the right home for things that are true across all tasks and expensive to re-explain: the package manager, the test command, directory layout, naming conventions, the "never do X" rules. Earn its place — every line here is paid for on every single turn, forever.
The prompt is for what's true about this task only: which files, which bug, which acceptance criteria.
# CLAUDE.md — durable, every session
- Package manager: pnpm (never npm)
- Tests: `pnpm test`; typecheck: `pnpm typecheck`
- Routes are Server Components by default; mark client with "use client"
- Never edit files in `generated/` — they're rebuilt from schema.prisma# Prompt — ephemeral, this task only
Fix the off-by-one in pagination in src/api/list.ts (getPage).
Reproduce with `pnpm test list.test.ts`, then make it pass.TIP
A good test for any CLAUDE.md line: would you be annoyed to re-type it for the tenth time this week? If yes, it belongs there. If it's specific to today's task, it belongs in the prompt and should leave the window when the task ends.
WARNING
A bloated CLAUDE.md is a permanent tax. A 600-line one that documents every edge case burns budget on every turn and buries the ten rules that matter. Keep it tight; link out to docs the agent can read on demand instead of inlining them.
Scope the task so it loads only what it needs
How you frame a task determines what the agent pulls into the window. A vague ask forces exploration — grepping, listing, reading whole files to orient — and every byte of that exploration stays resident. A scoped ask lets it go straight to the relevant slice.
Compare:
# Unscoped — agent reads half the repo to figure out where things are
"The checkout total is sometimes wrong, can you fix it?"
# Scoped — agent loads exactly what it needs
"In src/cart/total.ts, applyDiscount() double-counts percentage coupons
when two stack. Read that function and its test, fix the math, keep the
signature."Both can land the fix. The first does it after dragging routing, components, and three unrelated utils into context; the second never loads them. Scoping is the cheapest context optimization you have — it costs one extra sentence and saves thousands of tokens of noise.
When you genuinely don't know where the problem lives, make discovery the explicit first step and keep it cheap: "List the files that touch checkout totals; don't read them yet." Then scope the real task to the answer.
Clear and compact: resetting the window
Context accumulates whether or not it's still useful. The two levers for resetting it are /clear and /compact, and they're for different situations.
/clearwipes the conversation and starts fresh. Reach for it at task boundaries — you finished the migration, now you're fixing an unrelated bug. The old transcript has zero value to the new task and is pure noise; clearing it is the single highest-leverage habit in this guide./compactsummarizes the conversation so far into a compact form and continues, preserving the thread. Use it mid-task when a long, relevant session is getting heavy but you still need its conclusions — a multi-step refactor where the decisions made early still matter.
TIP
Default to /clear between unrelated tasks rather than letting one mega-session run all day. A fresh window with a sharp prompt beats a stale window that technically "remembers everything" but spends attention reconciling ten tasks' worth of history.
A useful tell: if you find yourself re-explaining what you're doing because the agent seems to be acting on something from an hour ago, the window is overloaded. Clear it and restate the current task cleanly.
Delegate to subagents to isolate context
Subagents are a context tool as much as a delegation tool. Each one runs in its own separate window and returns only its final summary to your main thread. That means a noisy investigation — reading twenty files, running the suite, trawling logs — happens somewhere else, and your main conversation receives the distilled answer instead of the mess that produced it.
---
name: failure-investigator
description: Reproduces a failing test, finds the root cause, reports back. Use when a test fails and you need the cause, not a fix.
model: sonnet
tools: Read, Grep, Glob, Bash
---
Reproduce the failing test, trace the root cause, and report:
the failing assertion, the responsible file and line, and the most
likely cause in 2–3 sentences. Do not edit code.The subagent might read fifteen files and a hundred lines of stack trace to do its job. Your main window never sees any of it — only the three-sentence verdict comes back. This is how you investigate a hard bug without poisoning the context you'll use to actually fix it. (For picking the right model per delegate, see choosing-the-right-model.)
Why dumping the whole codebase backfires
The instinct that "more context can't hurt" is wrong, and it fails in three concrete ways.
- Attention dilutes. The model weighs everything in the window. Hand it 80 files when 3 are relevant and the signal-to-noise ratio collapses; the right answer is in there, but so is everything that distracts from it.
- It invites scope creep. An agent that can see your entire repo will helpfully "improve" files you never mentioned — touching code you didn't want changed because it was simply there.
- It crowds out the work. Tokens spent on irrelevant files are tokens not available for reasoning, the diff, and your follow-ups. You hit the ceiling faster and get shorter, shallower answers.
A whole-repo dump feels thorough but trades a small, sharp working set for a large, blurry one. The skill is curation, not accumulation.
Concrete dos and don'ts
Do
- Name exact files and symbols; let the agent read the slice, not the tree.
- Put durable, cross-task facts in
CLAUDE.md; keep it lean. /clearat every task boundary.- Push noisy investigations into subagents and keep only their summaries.
- Scope the request so discovery is cheap or unnecessary.
Don't
- Paste an entire file to discuss one function.
- Leave a finished task's transcript sitting in the window for the next one.
- Re-explain conventions every prompt instead of writing them down once.
- Dump the repo "for context" and hope the model sorts it out.
- Let one session run all day across five unrelated tasks.
Context is the one resource you control completely on every turn. Spend it on signal, evict noise the moment it stops paying rent, and reset without hesitation when the task changes — and the model will reward you with sharper, more reliable work on a smaller, cleaner window.
Related
- Prompt Patterns for Coding AgentsPractical prompting patterns: chaining, few-shot, context management, tool use, and output structuring.
- Choosing the Right Model: Haiku vs Sonnet vs OpusHow to pick the right Claude model tier for an agent or task.
- Agent Memory Architecture: Short-Term, Long-Term, and When to Use EachHow AI agents remember — working memory vs. persistent long-term memory, what to store, how to retrieve it, and how to keep context small.
- CLAUDE.md Best PracticesHow to write a CLAUDE.md that actually helps — what to include, what to leave out, and how to keep it current.
- Few-Shot vs Chain-of-Thought vs Structured Prompting: What to Use When (2026)When to reach for few-shot examples, chain-of-thought reasoning, or structured/output-constrained prompting — a 2026 decision guide to the core techniques.