Context Engineering
Context engineering is the discipline of curating exactly what enters an LLM's context window so it has the right information and nothing else.
Context engineering is the practice of deliberately curating what goes into an LLM's context window — instructions, retrieved data, tool results, and history — so the model has exactly the information it needs and nothing extraneous.
It has become the central discipline for building agents, largely superseding "prompt engineering" as the unit of work. A long-running agent's context is assembled dynamically across many turns from system instructions, RAG results, tool outputs, and prior conversation. Deciding what to include, what to summarize, and what to drop is what separates a reliable agent from one that drifts or stalls.
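As a rough illustration of that include-or-drop decision, the sketch below assembles a window from prioritized pieces under a token budget. Everything in it is an assumption made for illustration: the `ContextItem` type, the priority scheme, and the four-characters-per-token estimate are hypothetical stand-ins, not any framework's API, and a real system would use the model's own tokenizer.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token in English."""
    return max(1, len(text) // 4)


@dataclass
class ContextItem:
    kind: str      # e.g. "system", "retrieved", "tool", "history"
    text: str
    priority: int  # lower number = more important (hypothetical scheme)


def assemble_context(items: list[ContextItem], budget_tokens: int) -> list[ContextItem]:
    """Keep the highest-priority items that fit the budget, preserving input order."""
    chosen: set[int] = set()
    used = 0
    # Visit items from most to least important; sorted() is stable, so ties keep input order.
    for idx in sorted(range(len(items)), key=lambda i: items[i].priority):
        cost = estimate_tokens(items[idx].text)
        if used + cost <= budget_tokens:
            chosen.add(idx)
            used += cost
    return [item for i, item in enumerate(items) if i in chosen]


items = [
    ContextItem("system", "You are a support agent. Answer from the provided docs only.", 0),
    ContextItem("history", "User asked about refund policy two turns ago. " * 20, 3),
    ContextItem("retrieved", "Refunds are available within 30 days of purchase.", 1),
    ContextItem("tool", "order_status: shipped", 2),
]
window = assemble_context(items, budget_tokens=50)
print([item.kind for item in window])  # ['system', 'retrieved', 'tool']
```

The long, low-priority history entry is the one that gets dropped here. In practice it would more likely be summarized than discarded outright.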
It matters because context is a budget, not a free pile: every token costs latency and money, and model attention dilutes over long inputs so buried facts go unused. The practical craft is loading the relevant slice rather than everything — retrieving instead of dumping, compacting old turns, and trimming tool output to the essentials. The tradeoff is engineering effort: good context assembly takes work, but a focused window consistently outperforms a stuffed one. For the full discipline, see the context engineering guide.
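Two of the techniques named above, compacting old turns and trimming tool output, can be sketched in a few lines. This is a minimal sketch under stated assumptions: `compact_history`, `trim_tool_output`, and `placeholder_summary` are hypothetical helper names, and the summary function is a stand-in for what would normally be an LLM summarization call.

```python
from typing import Callable


def trim_tool_output(output: str, max_lines: int = 5) -> str:
    """Keep the first lines of a tool result and note how many were dropped."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output
    dropped = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"[{dropped} more lines omitted]"])


def compact_history(
    turns: list[str],
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Replace all but the last `keep_recent` turns with one summary entry."""
    if len(turns) <= keep_recent:
        return list(turns)
    split = len(turns) - keep_recent
    old, recent = turns[:split], turns[split:]
    return [summarize(old)] + recent


def placeholder_summary(old_turns: list[str]) -> str:
    # Stand-in (assumption): a real agent would ask a model to summarize these turns.
    return f"[summary of {len(old_turns)} earlier turns]"


turns = [f"turn {i}" for i in range(1, 7)]
compacted = compact_history(turns, keep_recent=2, summarize=placeholder_summary)
print(compacted)  # ['[summary of 4 earlier turns]', 'turn 5', 'turn 6']

log = "\n".join(f"row {i}" for i in range(1, 11))
trimmed = trim_tool_output(log, max_lines=3)
print(trimmed)
```

Keeping the most recent turns verbatim and summarizing only the older ones is one common choice. Which lines of a tool result count as "the essentials" depends on the tool, so taking the first few lines is only the simplest possible policy.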
Frequently asked questions
- How is context engineering different from prompt engineering?
  Prompt engineering focuses on wording a single instruction well. Context engineering is broader: it covers everything that goes into the window (instructions, retrieved data, tool outputs, and conversation history) and what to leave out. As applications became multi-step agents, what is in context started to matter more than how the prompt is phrased, so context engineering largely supersedes prompt engineering for agentic work.
- Why not just put everything in the context window?
  Because more context isn't better context. Every token costs money and latency, and models attend less reliably to information buried in long inputs, so relevant facts get lost amid noise. Loading only the right slice, through retrieval, summarization, and selective tool results, reliably beats stuffing the window full.
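To make "loading only the right slice" concrete, here is a toy retrieval sketch that picks the few chunks most related to a query instead of passing every document. It scores by plain keyword overlap, which is an assumption chosen to keep the example self-contained; production retrieval typically uses embeddings or a search index. The function names and sample chunks are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), i) for i, chunk in enumerate(chunks)]
    # Drop chunks with no overlap; rank by score, breaking ties by original position.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [chunks[i] for _, i in ranked[:k]]


chunks = [
    "A refund is issued within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "To request a refund, open a ticket with your order number.",
    "Shipping takes 3 to 5 business days.",
]
selected = top_k_chunks("How do I request a refund for my order?", chunks, k=2)
for chunk in selected:
    print(chunk)
```

Only the two refund-related chunks reach the window, and the unrelated ones never cost a token.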
Related
- Context Window
  The context window is the maximum text, measured in tokens, an LLM can consider at once: prompt, conversation, documents, and its own output combined.
- RAG (Retrieval-Augmented Generation)
  RAG retrieves relevant documents from your own data and injects them into an LLM's prompt at query time, grounding answers in facts the model wasn't trained on.
- Token (LLM)
  A token is the unit LLMs read and write: a word fragment of roughly 3–4 characters in English. Models are priced, limited, and measured in tokens, not words.
- Extended Thinking
  Extended thinking is the reasoning tokens a model generates before its final answer, trading latency and cost for higher accuracy on hard problems.
- LLM Context Windows Compared (2026)
  Context windows and max output tokens across Claude, GPT, Gemini, DeepSeek, and Grok: the million-token era, what it costs, and what fits in practice.
- Agent Memory Architecture: Short-Term, Long-Term, and When to Use Each
  How AI agents remember: working memory vs. persistent long-term memory, what to store, how to retrieve it, and how to keep context small.
- RAG vs Long Context: Do Million-Token Windows Kill Retrieval?
  Million-token context windows promised the end of RAG. The honest 2026 answer: long context changed where retrieval starts paying, not whether it does.
- Managing Claude Code Memory & Context: CLAUDE.md, /compact, and Auto-Memory
  How Claude Code remembers: every CLAUDE.md scope and load order, path-scoped rules, the auto-memory system, and the context commands that keep sessions sharp.
- CLAUDE.md Best Practices
  How to write a CLAUDE.md that actually helps: what to include, what to leave out, and how to keep it current.
- The AI Engineer Roadmap for 2026
  A staged path from API calls to production agents: the skills that matter in 2026, what to skip, and the guides and tools for each stage, in order.
- Designing System Prompts for LLM Apps and Agents
  How to write system prompts that hold up in production: what belongs there vs. the user turn, structure that survives long context, and format/refusal rules.
- Prompt Patterns for Coding Agents
  Practical prompting patterns: chaining, few-shot, context management, tool use, and output structuring.
- Few-Shot vs Chain-of-Thought vs Structured Prompting: What to Use When (2026)
  When to reach for few-shot examples, chain-of-thought reasoning, or structured/output-constrained prompting: a 2026 decision guide to the core techniques.
- Prompt Engineering
  Prompt engineering is the practice of designing an LLM's inputs (instructions, context, examples, and format) to reliably get the output you want.