Agent Harness

An agent harness is everything around the model that turns it into a working agent: the execution loop, tool definitions, context management, permissions, error handling, and recovery — the machinery that converts model decisions into safe, observed actions.

The term sharpened as the industry learned that model quality and agent quality are different axes. A harness determines what the model sees each turn (context assembly, compaction, memory), what it can do (tools and their schemas), what it may do (permissions and gates), and how failures feed back (errors as observations, retries, loop detection). Identical models in different harnesses produce visibly different agents — which is why coding-agent comparisons increasingly evaluate model+harness pairs, and why Claude Code's edge is co-tuning both sides.

The word now anchors real decisions: adopt a harness (Claude Code, OpenCode, Letta Code — the comparison axis), build on one (the Claude Agent SDK is "the harness as a library"), or assemble your own from frameworks. And it names the discipline around it: agent engineering is, in one phrase, harness engineering.

Frequently asked questions

What's the difference between the model and the harness?

The model decides; the harness is everything that lets deciding become doing — the execution loop, tool definitions and dispatch, context assembly and compaction, permissions, error feedback, retries, and state. Two products on the same model can perform wildly differently: that gap is harness quality.

Why did 'harness' become a 2026 term of art?

Because benchmarks and practice both started isolating it: the same model scores differently across coding harnesses, agent products compete on harness engineering (Terminal-Bench explicitly ranks model+harness pairs), and labs now co-train models against their own harnesses. Once the model is a commodity choice, the harness is the product.

Frequently asked questions

Related