Harness engineering builds trust loops for coding agents
A Thoughtworks Distinguished Engineer proposes a mental model called "harness engineering" to help coding agent users increase confidence in AI-generated code through feedforward guides and feedback sensors.
The framework gives coding agent users a structured vocabulary and design approach for reducing review toil and improving output quality without relying solely on the agent's built-in tooling.
Birgitta Böckeler, a Distinguished Engineer and AI-assisted delivery expert at Thoughtworks, published this article on April 2, 2026, as an update to an earlier memo on harness engineering. She defines "harness" in the bounded context of coding agents as everything in an agent except the model itself — captured in the formula `Agent = Model + Harness` — and then narrows that definition further to focus on the outer harness that users construct around an existing coding agent tool.
The article's central argument is that a well-built outer harness serves two goals: increasing the probability that the agent produces correct results on the first attempt, and providing a self-correcting feedback loop that resolves as many issues as possible before a human reviewer sees them. To achieve this, Böckeler distinguishes two control types: feedforward "guides" that anticipate and steer agent behaviour before it acts, and feedback "sensors" that observe outputs afterward and signal the agent to self-correct. She notes that sensors are especially powerful when their output is optimised for LLM consumption — for example, custom linter messages that include self-correction instructions, which she describes as "a positive kind of prompt injection."
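To make the idea of an LLM-oriented sensor message concrete, here is a minimal sketch in Python. It is not taken from the article: the rule (flagging `print()` calls), the `Finding` type, and the wording of the fix instruction are all hypothetical, chosen only to show a lint message that tells the agent how to correct itself rather than merely reporting a violation.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    line: int
    message: str


def check_no_print(source: str) -> list[Finding]:
    """Hypothetical custom lint rule: flag print() calls.

    The message carries an explicit repair instruction so that an agent
    reading the linter output knows what to change, not just what is wrong.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        is_print_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        )
        if is_print_call:
            findings.append(
                Finding(
                    line=node.lineno,
                    message=(
                        "print() call found. To fix: replace it with "
                        "logging.getLogger(__name__).info(...) and add "
                        "'import logging' at the top of the module if missing."
                    ),
                )
            )
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(findings, key=lambda f: f.line)
```

Calling `check_no_print("x = 1\nprint(x)\n")` yields one finding on line 2 whose message includes the repair step. In a real harness the same pattern would more likely live in the project's existing linter as a custom rule.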
Guides and sensors each fall into one of two execution categories. Computational controls are deterministic and fast: tests, linters, type checkers, and structural analysis that run in milliseconds to seconds with reliable results. Inferential controls involve semantic analysis or "LLM as judge" approaches and run on a GPU or NPU; they are slower, more expensive, and less deterministic. The article further organises harness components into regulation categories, including a maintainability harness, an architecture fitness harness, and a behaviour harness, and introduces the concept of "harnessability": how amenable a given codebase or system is to being harnessed. The source text is truncated before the full treatment of those later sections.
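The split between computational and inferential sensors can be sketched as a small feedback loop. Everything here is an assumption for illustration: the `Sensor` type, the `run_sensors` function, and in particular the policy of running cheap computational checks first and skipping inferential ones until those pass, which the summary does not attribute to the article. The "judge" is a stub; no real model is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sensor:
    name: str
    kind: str  # "computational" or "inferential"
    check: Callable[[str], list[str]]  # feedback messages; empty list means pass


def run_sensors(code: str, sensors: list[Sensor]) -> list[str]:
    """Collect feedback for the agent, cheapest category first."""
    feedback: list[str] = []
    for kind in ("computational", "inferential"):
        for sensor in sensors:
            if sensor.kind == kind:
                feedback += [f"[{sensor.name}] {m}" for m in sensor.check(code)]
        if feedback:
            break  # assumed policy: fix cheap findings before paying for slow checks
    return feedback


def line_length_check(code: str) -> list[str]:
    # Deterministic check: same input always gives the same result.
    return [
        f"Line {i} exceeds 100 characters; wrap it."
        for i, line in enumerate(code.splitlines(), 1)
        if len(line) > 100
    ]


def stub_llm_judge(code: str) -> list[str]:
    # Placeholder standing in for an "LLM as judge" call.
    if '"""' in code:
        return []
    return ["Add a docstring explaining the function's intent."]
```

With both sensors registered, an over-long line produces only the `[length]` message; the stub judge runs only once the computational check is clean.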
Key facts
- 01 Author Birgitta Böckeler is a Distinguished Engineer and AI-assisted delivery expert at Thoughtworks with over 20 years of software experience.
- 02 The article defines the agent formula as: Agent = Model + Harness.
- 03 Coding agents have a built-in harness (system prompt, code retrieval, orchestration) plus an outer harness users can build for their specific system.
- 04 Feedforward "guides" steer agent behaviour before it acts; feedback "sensors" observe outputs and help the agent self-correct.
- 05 Computational controls (tests, linters, type checkers) run in milliseconds to seconds and produce reliable, deterministic results.
- 06 Inferential controls (semantic analysis, AI code review, "LLM as judge") run on GPU/NPU and are slower, more expensive, and non-deterministic.
- 07 Custom linter messages that include self-correction instructions are described as "a positive kind of prompt injection."
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 13, 2026 · 08:58 UTC.