# Why Your Agent Loops: Debugging AI Agents

> The recurring agent failure modes — loops, premature victory, tool misuse, context poisoning, scope creep — diagnosed by their signatures, with fixes.

Agent failures are systematic, not random. Loops mean the agent can't perceive progress — fix the feedback. Premature 'done' means no verifiable success signal (make completion checkable). Tool misuse means routing-by-description failed (sharpen names and descriptions). Context poisoning means an early wrong fact keeps steering (checkpoint and restart clean). Diagnose by signature; fix the class.

Agent failures look chaotic — forty turns of confident wrongness — but they're systematic underneath. A handful of failure modes account for nearly everything, each with a recognizable **signature in the trace** and a fix at the *class* level. Debugging agents is mostly learning to read those signatures. (This page covers agents you're *building*; for Claude Code product issues, see [its troubleshooting guide](/guides/troubleshooting/claude-code-troubleshooting).)

## The loop

**Signature:** the same tool call (or trivial variations) repeating; token spend climbing; no state change. **Cause:** broken feedback — the agent either can't tell the action failed (uninformative errors) or can't tell it succeeded (no confirmation in the observation), so its policy never updates. **Fix, in order:** make tools return *informative, differentiated* errors and *changed-state confirmation* ([the tool-calling discipline](/guides/concepts/production-tool-calling)); add loop detection (identical call twice → inject "this approach is failing, change strategy"); cap turns as the backstop, treating the cap as a failed run to diagnose, not retry blindly.

## Premature victory

**Signature:** "I've successfully…" over work that doesn't compile, a test never run, a file never written. **Cause:** no verifiable completion condition — optimism fills the vacuum. **Fix:** define done as something executable and *require the agent to run it* before concluding ("done = `npm test` exits 0"). For tasks without a natural check, bolt one on: a [fresh-context critic](/guides/advanced/multi-agent-orchestration) judging output against the original ask catches the self-grading bias.

## Tool misuse

**Signature:** right intention, wrong tool — or tools ignored in favor of training-data guesses. **Cause:** routing happens by matching intent against tool names/descriptions; overlap and vagueness break it. **Fix:** [disjoint, use-when/not-when descriptions](/guides/prompting/effective-tool-use), verb-object names, fewer overlapping tools. The test: read only your tool list — if *you* couldn't route correctly from it, the model can't.

## Context poisoning

**Signature:** an early wrong fact ("the DB is MySQL") restated turn after turn, surviving corrections, steering everything. **Cause:** transformer attention treats repeated context as established truth; corrections compete with N restatements. **Fix:** don't argue — **restart**. Checkpoint before long runs, and on poisoning, clear and relaunch with the correction stated *first*. Prevention: pass constraints explicitly at task start (subagents inherit nothing), and persist ground truth to files the agent re-reads rather than trusting conversational memory.

## Scope creep and the silent extras

**Signature:** the task done — plus a refactor nobody asked for, three "improved" files, a new dependency. **Cause:** helpfulness bias plus visible-but-irrelevant context. **Fix:** explicit boundaries in the task ("change only X; do not touch Y"), [permission rules](/guides/configuration/claude-code-settings-permissions) that make out-of-scope edits impossible, and diff review that treats unrequested changes as defects regardless of quality.

## Make it stay fixed

Single-run fixes decay; the durable loop is: **trace → classify → fix the class → encode the lesson** — the tool description sharpened, the constraint added to the standing prompt, the verification step made mandatory, the [eval case](/guides/evaluation/write-llm-evals) added so the regression is caught next time. That discipline — failure modes as a checklist applied *before* production — is exactly what the [agent-reliability-reviewer](/agents/meta-orchestration/agent-reliability-reviewer) automates.

---

_Source: https://agentscamp.com/guides/troubleshooting/debugging-ai-agents — Guide on AgentsCamp._
