Spec-Driven Development with AI Agents
Write the spec, let the agent implement against it — the SDD workflow (spec → plan → tasks → implement), when it beats prompt-and-iterate, and the tooling.
Spec-driven development inverts vibe coding: instead of steering an agent with corrective prompts, you invest in a written specification — requirements, constraints, acceptance criteria — and the agent implements against it, typically through a spec → plan → tasks → implement pipeline. It pays on substantial features and team settings; it's overhead for true exploration.
Key takeaways
- The core inversion: corrections move upstream — fixing a sentence in a spec is cheaper than redirecting an agent mid-implementation, and infinitely cheaper than refactoring accepted code.
- The canonical pipeline is spec → plan → tasks → implement, each an artifact you review — small, readable checkpoints instead of one giant diff.
- Specs make agent work parallelizable and team-legible: tasks with acceptance criteria can fan out to multiple agents/sessions and survive context resets.
- Use SDD for features with real surface area, integrations, and anything multiple sessions or people will touch; skip it for spikes and exploration where the spec would be fiction.
- The spec is living documentation: when behavior questions arise later, the answer exists in prose — the artifact vibe coding never leaves behind.
The second generation of agentic-coding wisdom is converging on something almost embarrassingly traditional: write down what you want before building it. Spec-driven development (SDD) is that discipline rebuilt for agents — where the spec isn't bureaucracy, it's the program you write in English, and the agent is its compiler.
The inversion
Vibe coding steers downstream: prompt, inspect behavior, correct, repeat — every correction spent after implementation exists. SDD moves the steering upstream, where it's cheap. The workflow that emerged as canonical — popularized by GitHub's Spec Kit — runs in reviewed stages:
- Specify. Author the spec: the problem, requirements, constraints, acceptance criteria. This is the thinking; expect it to be the hard part.
- Plan. The agent derives a technical plan — architecture, components, trade-offs — and you review that, killing wrong directions while they're still paragraphs.
- Tasks. The plan decomposes into discrete tasks with checkable outcomes — each sized for a focused session, each independently verifiable.
- Implement. Agents execute tasks against the spec, with tests embodying the acceptance criteria. Review arrives as small, purposeful diffs traceable to requirements.
Each stage gate is the same trick: review text before code, small before large. A flawed assumption caught in the plan costs a sentence; caught in the diff, a day.
Why agents specifically reward this
- Context is fragile; files aren't. Sessions compact, reset, and end — a spec on disk re-anchors any session, any agent, any teammate. It's the antidote to "the agent forgot what we agreed" (the memory mechanics).
- Parallelism needs contracts. Fan tasks out to parallel sessions or multiple agents and the spec is what keeps their outputs composable — shared intent, explicit interfaces.
- Acceptance criteria become tests. The spec's "done means…" lines translate directly into the verification that makes accepting agent code safe.
- The artifact outlives the change. Six months later, "why does it behave this way?" has a written answer. Vibe coding leaves behavior; SDD leaves intent.
When to spec — and when not to
SDD earns its overhead on work that outlives the session: features with real surface area, integrations with contracts, team codebases, anything multiple agents will touch. It's the wrong tool for exploration — spikes, prototypes, "what would this feel like" — where the honest spec doesn't exist yet and writing one is fiction. The mature loop uses both: vibe to discover, spec to build. Prototype freely; when the throwaway proves the idea, write the spec the prototype taught you, and let agents build the real one against it.
TIP
Already inside Claude Code, the lightweight on-ramp is built in: plan mode plus the plan-feature and breakdown-task commands give you spec→plan→tasks ergonomics for medium work without adopting a full toolkit; Spec Kit adds the standardized pipeline and constitution when the team wants the whole discipline.
The deeper point survives any particular tool: as implementation gets cheaper, specification becomes the engineering. The teams getting the most from agents in 2026 aren't the best prompters — they're the best at saying precisely what they want and proving they got it.
Frequently asked questions
- What is spec-driven development for AI agents?
- A workflow where the primary artifact you author is a specification — what to build, constraints, acceptance criteria — and the agent derives the implementation from it, usually via reviewed intermediate stages (technical plan, then task list, then code). You review small textual artifacts at each gate instead of steering live and reviewing one massive diff.
- How is this different from just writing a detailed prompt?
- Persistence, structure, and gates. A prompt is consumed once and lost to the scrollback; a spec is a file — versioned, refined, re-fed to fresh sessions, readable by teammates. And the staged pipeline inserts review where it's cheap: you approve the plan before code exists, so wrong directions die as paragraphs.
- What is Spec Kit?
- GitHub's open-source toolkit (September 2025) that productized the workflow: a CLI plus slash commands — specify, plan, tasks, implement — that work inside Claude Code, Copilot, Cursor, and other agents, with a 'constitution' file carrying project principles into every stage. It standardized the spec → plan → tasks → implement shape most teams now use.
- Does SDD slow you down?
- On small or exploratory work, yes — writing a spec for a spike is theater. On substantial features it's usually net-faster: the hour of spec writing replaces the day of corrective prompting and rework, parallelizes across agents, and prevents the architecture drift that costs the most later. The honest rule: spec when the work outlives the session.
Related
- Spec KitGitHub's open-source toolkit for spec-driven development — the specify CLI and /speckit slash commands that walk any coding agent from constitution to implementation.
- Vibe Coding in 2026: What It Is, When It Works, When It BitesAn honest guide to vibe coding — where prompt-and-accept development genuinely pays, where it accumulates risk, and the guardrails that make it professional.
- Plan FeatureExplore the codebase and produce an implementation plan for a feature.
- Breakdown TaskDecompose a task into an ordered checklist of small, verifiable steps.
- Building Multi-Step Agent WorkflowsPatterns for decomposing big tasks and coordinating multiple agents.
- Managing Claude Code Memory & Context: CLAUDE.md, /compact, and Auto-MemoryHow Claude Code remembers — every CLAUDE.md scope and load order, path-scoped rules, the auto-memory system, and the context commands that keep sessions sharp.
- Adr WriterWrite an Architecture Decision Record capturing a decision the user describes, in Michael Nygard ADR format (Status, Context, Decision, Consequences) with an added Considered Alternatives section. Use when recording a significant architectural or technology choice.
- TDD with AI Agents: Red-Green as an Agent LoopTest-driven development found its killer app: agents. How write-the-test-first turns AI coding into a verifiable loop, and the workflow that makes it stick.