Paper proposes formal definition of "agent harness" for coding AI
A paper by Sanderson Oliveira de Macedo proposes a constitutive definition — necessary and sufficient conditions — for the term "agent harness," distinguishing it from related concepts like agent frameworks, SDKs, IDE plugins, eval harnesses, and orchestrators.
The paper provides the first operational definition of "agent harness" with a shared vocabulary, enabling consistent engineering practice and scientific comparison of agentic coding systems.
The term "agent harness" has proliferated in software engineering with generative AI, but its usage is loose and polysemous. It can refer to an entire product (e.g., Claude Code, Codex CLI), an evaluation scaffold that runs an agent against tasks (e.g., the SWE-bench harness), or be conflated with agent frameworks, SDKs, IDE plugins, and orchestrators. Sanderson Oliveira de Macedo's paper argues that what is missing is a reference definition that includes and excludes cases consistently, and sets out to build one.
The paper constructs its definition through conceptual analysis drawing on works with persistent identifiers and primary grey-literature sources such as official documentation, glossaries, and engineering reports. It first reconstructs the genealogy of the term — tracing it from horse tack, to the classic software test harness, to the machine-learning evaluation harness, and finally to the agent harness. From this foundation, the paper proposes a constitutive definition stating the necessary and sufficient conditions for a system to qualify as an agent harness, and operationalizes it as an explicit inclusion and exclusion test.
The definition is then used to draw clear conceptual boundaries against five adjacent concepts: agent framework, agent SDK, IDE plugin, eval harness, and orchestrator. To validate the definition, the paper applies the inclusion/exclusion test to six real-world systems — Claude Code, Codex CLI, Aider, Cline, OpenHands, and SWE-agent — as well as deliberate edge cases, demonstrating consistent results. The paper concludes with a research agenda organized by design tension axes, positioning the contribution as a shared vocabulary to guide both engineering practice and the scientific comparison of agentic systems.
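As a rough illustration of how a constitutive definition can be turned into an inclusion and exclusion test, the sketch below treats each necessary condition as a predicate and counts a system as included only when all of them hold together. The summary does not list the paper's actual conditions, so `condition_a`, `condition_b`, `condition_c`, the `feature_*` keys, and the `classify` helper are hypothetical placeholders, and the two example systems are invented rather than taken from the paper's six cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Tuple

# A system is described by a plain mapping of feature name -> bool.
System = Mapping[str, bool]


@dataclass(frozen=True)
class Condition:
    """One necessary condition of a constitutive definition."""
    name: str
    holds: Callable[[System], bool]


def classify(system: System, conditions: List[Condition]) -> Tuple[bool, List[str]]:
    """Return (included, failed_condition_names).

    Inclusion: every condition holds (jointly sufficient).
    Exclusion: at least one condition fails (each is necessary),
    and the failed names explain why the system is excluded.
    """
    failed = [c.name for c in conditions if not c.holds(system)]
    return (not failed, failed)


# HYPOTHETICAL placeholder conditions: these are NOT the paper's conditions,
# which the summary does not state.
PLACEHOLDER_CONDITIONS = [
    Condition("condition_a", lambda s: s.get("feature_a", False)),
    Condition("condition_b", lambda s: s.get("feature_b", False)),
    Condition("condition_c", lambda s: s.get("feature_c", False)),
]

# Two invented systems, used only to exercise the test.
included = classify(
    {"feature_a": True, "feature_b": True, "feature_c": True},
    PLACEHOLDER_CONDITIONS,
)
excluded = classify(
    {"feature_a": True, "feature_b": False},
    PLACEHOLDER_CONDITIONS,
)

print(included)  # (True, [])
print(excluded)  # (False, ['condition_b', 'condition_c'])
```

Returning the names of the failed conditions, rather than a bare boolean, is one way to make an exclusion explainable: it shows which necessary condition a borderline system does not satisfy.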
Key facts
- 01 The term "agent harness" is used inconsistently: sometimes denoting a whole product, sometimes an eval scaffold, sometimes conflated with frameworks, SDKs, or orchestrators.
- 02 The paper proposes a constitutive definition stating the necessary and sufficient conditions for a system to be an agent harness.
- 03 The definition is operationalized as an inclusion and exclusion test.
- 04 The genealogy of the term is traced from horse tack → classic test harness → ML evaluation harness → agent harness.
- 05 The definition is validated against six real systems: Claude Code, Codex CLI, Aider, Cline, OpenHands, and SWE-agent.
- 06 Conceptual boundaries are drawn against five adjacent concepts: agent framework, agent SDK, IDE plugin, eval harness, and orchestrator.
- 07 The paper closes with a research agenda organized by design tension axes.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 10, 2026 · 15:34 UTC.