HarnessX foundry evolves agent scaffolding from execution traces
HarnessX is a new agent harness foundry that automatically composes and evolves prompts, tools, memory, and control flow from execution traces, yielding an average +14.5% gain across five benchmarks without scaling the underlying model.
HarnessX demonstrates that evolving the runtime scaffolding around a model — rather than scaling the model itself — can deliver substantial benchmark gains, offering a complementary path to agent improvement that does not require larger or more expensive models.
Tingyang Chen, Shuo Lu, and Kang Zhao argue that AI agent performance depends critically on the runtime harness — the prompts, tools, memory, and control flow that shape how a model observes, reasons, and acts — yet today's harnesses remain largely hand-crafted and static. Each new model or task still demands bespoke scaffolding, and the rich execution traces produced at runtime are rarely fed back into systematic improvement. HarnessX is their proposed solution: a foundry that treats harness construction as a principled engineering problem rather than an artisanal one.
HarnessX works by assembling typed harness primitives through a substitution algebra, then adapting them via AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning. Crucially, the system closes the harness-model loop by converting execution trajectories into both harness updates and model training signal, meaning improvements compound over time rather than remaining one-shot.
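The summary does not spell out the operators of the substitution algebra, so the following is only a minimal sketch under an assumed model. Each primitive carries a type tag naming the slot it fills (prompt, tools, memory, or control flow). Substitution swaps a primitive into a harness only at the slot of matching type and returns a new harness rather than mutating the old one. The names `Primitive`, `Harness`, `substitute`, and `apply_all`, and the slot strings, are hypothetical and not taken from HarnessX.

```python
from dataclasses import dataclass, replace

# Hypothetical slot types; the real HarnessX type system is not described in the summary.
SLOTS = ("prompt", "tools", "memory", "control")


@dataclass(frozen=True)
class Primitive:
    """A typed harness component: `slot` is its type tag, `name` identifies the variant."""
    slot: str
    name: str

    def __post_init__(self) -> None:
        if self.slot not in SLOTS:
            raise ValueError(f"unknown slot type: {self.slot!r}")


@dataclass(frozen=True)
class Harness:
    """An immutable harness holding exactly one primitive per slot."""
    prompt: Primitive
    tools: Primitive
    memory: Primitive
    control: Primitive

    def __post_init__(self) -> None:
        # Type check: each field must hold a primitive whose tag matches the field name.
        for slot in SLOTS:
            prim = getattr(self, slot)
            if prim.slot != slot:
                raise TypeError(f"{prim.name!r} has type {prim.slot!r}, expected {slot!r}")

    def substitute(self, prim: Primitive) -> "Harness":
        """Return a new harness with `prim` placed in the slot matching its type."""
        return replace(self, **{prim.slot: prim})


def apply_all(harness: Harness, subs: list[Primitive]) -> Harness:
    """Apply substitutions left to right; a later one for the same slot wins."""
    for prim in subs:
        harness = harness.substitute(prim)
    return harness


base = Harness(
    prompt=Primitive("prompt", "react_prompt"),
    tools=Primitive("tools", "web_search"),
    memory=Primitive("memory", "no_memory"),
    control=Primitive("control", "single_loop"),
)
evolved = base.substitute(Primitive("memory", "episodic_buffer"))
print(evolved.memory.name)  # episodic_buffer
print(base.memory.name)     # no_memory
```

Because each substitution in this sketch touches only its own typed slot, substitutions on different slots commute, which keeps a set of candidate edits easy to search over. Whether HarnessX's algebra has this or any further laws is not stated in the summary.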
Evaluated across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX delivers an average gain of +14.5% and a peak gain of +44.0%, and the largest improvements occur where baseline performance is lowest. The authors frame this as evidence that agent progress need not come from model scaling alone, positioning harness composition and evolution from execution feedback as a complementary lever. The complete codebase is planned for open-source release in a future update.
Key facts
- 01 HarnessX is a foundry for composable, adaptive, and evolvable agent harnesses, introduced by Tingyang Chen, Shuo Lu, and Kang Zhao.
- 02 It assembles typed harness primitives (prompts, tools, memory, control flow) via a substitution algebra.
- 03 Adaptation is handled by AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning.
- 04 HarnessX closes the harness-model loop by turning execution trajectories into both harness updates and model training signal.
- 05 Evaluated on five benchmarks: ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified.
- 06 Average performance gain is +14.5%, with a peak gain of +44.0%; gains are largest where baselines are lowest.
- 07 The complete codebase will be open-sourced in a future release.
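Facts 03 and 04 describe AEGIS and the closed harness-model loop only at a high level. As a minimal sketch, assume each trace records which primitive filled a slot, the steps taken, and a scalar task reward. Under that assumption, one loop iteration could emit both outputs from the same batch: a harness update that keeps, per slot, the primitive with the highest mean reward, and a training signal that pairs each trajectory with its advantage (reward minus the batch-mean baseline, as in REINFORCE with a baseline). The greedy per-slot selection and the advantage baseline are stand-ins chosen for illustration, not the mechanism AEGIS actually uses; `Trace`, `harness_update`, and `training_signal` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trace:
    """Hypothetical execution record: the primitive used in one slot, the steps, and a reward."""
    slot: str
    primitive: str
    steps: tuple[str, ...]
    reward: float


def harness_update(harness: dict[str, str], traces: list[Trace]) -> dict[str, str]:
    """Per slot, adopt the primitive with the highest mean reward; ties keep the current one."""
    rewards: dict[tuple[str, str], list[float]] = defaultdict(list)
    for t in traces:
        rewards[(t.slot, t.primitive)].append(t.reward)
    updated = dict(harness)
    for slot, current in harness.items():
        scores = {p: mean(r) for (s, p), r in rewards.items() if s == slot}
        if scores:
            # Key prefers higher mean reward, then the incumbent primitive on ties.
            updated[slot] = max(scores, key=lambda p: (scores[p], p == current))
    return updated


def training_signal(traces: list[Trace]) -> list[tuple[tuple[str, ...], float]]:
    """Pair each trajectory with its advantage: reward minus the batch-mean baseline."""
    baseline = mean(t.reward for t in traces)
    return [(t.steps, t.reward - baseline) for t in traces]


traces = [
    Trace("memory", "no_memory", ("search", "answer"), 0.0),
    Trace("memory", "episodic_buffer", ("recall", "search", "answer"), 1.0),
    Trace("memory", "episodic_buffer", ("recall", "answer"), 0.5),
    Trace("tools", "web_search", ("search", "answer"), 0.5),
]
harness = {"memory": "no_memory", "tools": "web_search"}
print(harness_update(harness, traces))  # {'memory': 'episodic_buffer', 'tools': 'web_search'}
print([adv for _, adv in training_signal(traces)])  # [-0.5, 0.5, 0.0, 0.0]
```

Both outputs come from one trace batch, which is the sense in which the loop is closed in this sketch: the harness changes for the next run, and the advantages could weight a policy-gradient or filtered fine-tuning update of the model.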
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 15, 2026 · 11:57 UTC.