# How to Test AI-Generated Code

> AI writes the code; tests decide whether to trust it. The verification stack for agent-written changes — contracts, generated tests, and the review that's left.

When AI writes the code, tests stop being quality assurance and become the acceptance contract — the thing that makes accepting a diff safe. The working stack: define done as a test before the agent starts, let agents generate broad coverage but review the assertions, keep mutation-level skepticism for critical paths, and reserve humans for what tests can't see — intent, security, design.

The uncomfortable math of 2026: AI writes a huge share of new code, and nobody — not even the diligent — reads all of it the old way. That isn't a scandal; it's a redefinition. **Verification, not authorship, is now the engineering**, and tests are its primary instrument. Here's how testing changes when the code under test came from an agent.

## Tests become the contract, not the afterthought

With human code, tests trail implementation and catch slips. With agent code, the high-leverage move is inversion: **define "done" as an executable test before the agent starts.** "Implement rate limiting — done means `rate-limit.test.ts` passes, including the burst and clock-skew cases" turns acceptance from vibes into a checkable artifact — and you review the *test* (twenty readable lines of intent) instead of pretending to review three hundred lines of diff. This is the practical core of making [vibe-speed development](/guides/prompting/vibe-coding-guide) safe, and it's the agent-era version of [TDD](/guides/testing/tdd-with-ai-agents).
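A minimal sketch of what such a defining test could look like, assuming a token-bucket design with an injected clock. `RateLimiter`, `LimiterOptions`, `checkContract`, and `makeTokenBucket` are hypothetical names, not part of any library, and "clock skew" is modeled here as the clock jumping backwards. A real `rate-limit.test.ts` would express the same checks in your test runner; `makeTokenBucket` only stands in for whatever the agent eventually produces.

```typescript
// Hypothetical contract for the rate-limiting task. Every name below is an
// illustrative assumption, not an existing API.
type Clock = () => number; // current time in milliseconds

interface LimiterOptions {
  capacity: number;        // maximum burst size
  refillPerSecond: number; // tokens restored per second
  clock: Clock;            // injected so tests control time
}

interface RateLimiter {
  tryAcquire(): boolean; // true if the request is allowed right now
}

type LimiterFactory = (opts: LimiterOptions) => RateLimiter;

function must(cond: boolean, message: string): void {
  if (!cond) throw new Error(`contract violated: ${message}`);
}

// The "definition of done": written from the requirement, before any implementation.
function checkContract(make: LimiterFactory): void {
  let now = 0;
  const limiter = make({ capacity: 3, refillPerSecond: 1, clock: () => now });

  // Burst case: exactly `capacity` requests pass, the next one is rejected.
  must(limiter.tryAcquire(), "1st request in burst allowed");
  must(limiter.tryAcquire(), "2nd request in burst allowed");
  must(limiter.tryAcquire(), "3rd request in burst allowed");
  must(!limiter.tryAcquire(), "4th request in burst rejected");

  // Refill: one second later, exactly one more request passes.
  now = 1000;
  must(limiter.tryAcquire(), "one token available after 1s");
  must(!limiter.tryAcquire(), "only one token available after 1s");

  // Clock-skew case: a backward jump must not mint tokens.
  now = 500;
  must(!limiter.tryAcquire(), "backward clock jump grants nothing");
}

// Stand-in candidate implementation (what the agent would be asked to write).
function makeTokenBucket(opts: LimiterOptions): RateLimiter {
  let tokens = opts.capacity;
  let last = opts.clock();
  return {
    tryAcquire(): boolean {
      const now = opts.clock();
      const elapsedMs = Math.max(0, now - last); // clamp backward jumps to zero
      last = now;
      tokens = Math.min(opts.capacity, tokens + (elapsedMs / 1000) * opts.refillPerSecond);
      if (tokens >= 1) {
        tokens -= 1;
        return true;
      }
      return false;
    },
  };
}

checkContract(makeTokenBucket); // throws if the candidate misses the contract
```

The injected clock is the design choice that keeps the burst and skew cases deterministic: the contract controls time itself instead of sleeping.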

## The self-grading trap

The signature failure mode: one context writes both implementation and tests, so a misunderstanding lands in both, and the suite turns green around the wrong behavior. Defenses, in increasing strength:

- **Read the assertions.** Always. They're small, and they're where misunderstanding shows.
- **Anchor with your own defining test** — even one — written from the requirement, not the diff.
- **Blind-test the diff**: a separate session (or the [test-engineer](/agents/quality-security/test-engineer) agent) gets the requirements and the code, *not* the implementer's reasoning, and writes tests from spec. Disagreement between suites is signal, exactly like a [fresh-eyes critic](/guides/advanced/multi-agent-orchestration).
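
To make the trap concrete, here is a small sketch under an invented requirement ("orders of 100 or more get 10% off"). `applyDiscount`, both suites, and the helper functions are hypothetical. The implementer's context misread "or more" as "more than", so its own suite is green, while a suite written blind from the requirement disagrees at the boundary.

```typescript
// Illustrative only: the requirement, function, and suites are assumptions.
type Case = { input: number; expected: number };

// Implementation under review: misreads "100 or more" as "more than 100".
function applyDiscount(total: number): number {
  return total > 100 ? total - total / 10 : total;
}

// Written in the same context as the implementation: same misreading.
const implementerSuite: Case[] = [
  { input: 50, expected: 50 },
  { input: 100, expected: 100 },
  { input: 200, expected: 180 },
];

// Written in a separate session from the requirement text alone.
const blindSuite: Case[] = [
  { input: 50, expected: 50 },
  { input: 100, expected: 90 },
  { input: 200, expected: 180 },
];

// Cases in a suite that the implementation does not satisfy.
function failures(fn: (n: number) => number, suite: Case[]): Case[] {
  return suite.filter((c) => fn(c.input) !== c.expected);
}

interface Disagreement {
  input: number;
  implementerExpects: number;
  blindExpects: number;
}

// Inputs both suites cover but for which they expect different results.
function disagreements(a: Case[], b: Case[]): Disagreement[] {
  const out: Disagreement[] = [];
  for (const ca of a) {
    const cb = b.find((c) => c.input === ca.input);
    if (cb !== undefined && cb.expected !== ca.expected) {
      out.push({ input: ca.input, implementerExpects: ca.expected, blindExpects: cb.expected });
    }
  }
  return out;
}

const selfGraded = failures(applyDiscount, implementerSuite); // empty: green around the wrong behavior
const blindFailures = failures(applyDiscount, blindSuite);    // flags the boundary input 100
const conflicts = disagreements(implementerSuite, blindSuite); // the signal worth a human look
```

Neither suite is automatically right; the disagreement only tells a human where to go back to the requirement.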

## What agents test well — and what you must add

Let agents do what they're excellent at: **breadth.** That means edge cases humans skip (empty inputs, Unicode, boundary values), table-driven case generation, and regression scaffolds around legacy code ([test-scaffolder](/skills/testing/test-scaffolder) and [write-tests](/commands/testing/write-tests) package this). What they don't know is **what matters**: which behaviors carry the business, which invariants are sacred, which failure would page someone. That's the human contribution: a handful of assertions encoding intent, effort ranked by [coverage-gap-finder](/skills/testing/coverage-gap-finder) toward the paths that count, and **mutation-level skepticism** on critical code, meaning you break the implementation deliberately and confirm the suite notices. A suite that can't fail is documentation cosplay.
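
A hand-rolled sketch of that mutation check, assuming a toy `isAdult`-style predicate. The mutants, both suites, and `survivingMutants` are hypothetical; mutation-testing tools such as Stryker automate the same idea across a real codebase.

```typescript
// Illustrative only: a manual mutation check on a toy predicate.
type Impl = (age: number) => boolean;
type Suite = (impl: Impl) => boolean; // true when every assertion holds

const original: Impl = (age) => age >= 18;

// Deliberately broken variants of the implementation.
const mutants: { name: string; impl: Impl }[] = [
  { name: "boundary: > instead of >=", impl: (age) => age > 18 },
  { name: "constant: always true", impl: () => true },
  { name: "negated condition", impl: (age) => age < 18 },
];

// Broad but shallow: never touches the boundary.
const weakSuite: Suite = (impl) => impl(30) === true && impl(10) === false;

// Adds the assertions that encode what actually matters: the cutoff at 18.
const strictSuite: Suite = (impl) =>
  weakSuite(impl) && impl(18) === true && impl(17) === false;

// Names of mutants the suite fails to detect. An empty list is the goal.
function survivingMutants(suite: Suite): string[] {
  if (!suite(original)) {
    throw new Error("suite must pass on the unmodified code first");
  }
  return mutants.filter((m) => suite(m.impl)).map((m) => m.name);
}

const weakSurvivors = survivingMutants(weakSuite);     // the boundary mutant survives
const strictSurvivors = survivingMutants(strictSuite); // every mutant is caught
```

A surviving mutant marks a behavior the suite does not pin down. On critical paths, each survivor deserves either a new assertion or a conscious decision that the behavior does not matter.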

## The residue humans still own

Tests verify the contract; they're blind to whole categories an agent can get wrong while staying green: **security that functions** (injection with correct output; run [security review](/commands/review/security-scan) as its own pass), **performance under load**, **architecture** (extensibility, coupling, the month-six bill), and **quiet scope creep**, meaning code that does more than asked. That's the rubric for the human pass in your [review workflow](/guides/workflow/ai-code-review-workflow): skip re-deriving what tests already prove, and spend the time entirely on what they can't see.
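
As an illustration of "security that functions", here is a sketch using HTML output rather than SQL, chosen only because it stays self-contained. `renderGreetingUnsafe`, `renderGreetingSafe`, `escapeHtml`, and both checks are hypothetical. The functional test cannot tell the two renderers apart; only a probe written with an attacker in mind can.

```typescript
// Illustrative only: two renderers that are identical on well-behaved input.
type Render = (name: string) => string;

// Interpolates user input directly into markup.
const renderGreetingUnsafe: Render = (name) => `<p>Hello, ${name}!</p>`;

// Minimal escaping for text placed inside an HTML element body.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const renderGreetingSafe: Render = (name) => `<p>Hello, ${escapeHtml(name)}!</p>`;

// The contract test: correct output for ordinary input.
const functionalTest = (render: Render): boolean =>
  render("Ada") === "<p>Hello, Ada!</p>";

// The security probe: hostile input must not survive as live markup.
const injectionProbe = (render: Render): boolean =>
  render("<script>alert(1)</script>").indexOf("<script") === -1;

const results = {
  unsafe: { functional: functionalTest(renderGreetingUnsafe), secure: injectionProbe(renderGreetingUnsafe) },
  safe: { functional: functionalTest(renderGreetingSafe), secure: injectionProbe(renderGreetingSafe) },
};
```

Both renderers pass `functionalTest`, which is exactly why a green suite says nothing here. A single probe is also not a security review: it catches this one payload, not the class of bug.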

The summary discipline fits on a sticky note: **before** — a test defines done; **during** — the agent iterates against it; **after** — read assertions, scan security, judge design. Code volume scaled with AI; this is how confidence scales with it.

---

_Source: https://agentscamp.com/guides/testing/testing-ai-generated-code — Guide on AgentsCamp._