AgentBuild constructs scientific agents from scientist-authored contracts
Researchers Woong Shin, Craig A. Bridges, and Marshall T. McDonnell propose AgentBuild, a framework that constructs LLM-based scientific agents from a version-controlled contract authored by the scientist, demonstrated on Rietveld refinement of X-ray diffraction data.
AgentBuild shifts the durable artifact of scientific agent development from model-specific tuning to a scientist-authored contract. Workflow-scope failures then surface as explicit contract failures, and agent behavior can be re-tuned across model generations without a full rebuild.
Woong Shin, Craig A. Bridges, and Marshall T. McDonnell argue that prevailing agent development practices — fine-tuning, reinforcement learning, and prompt-and-go — tend to obscure the scientist's judgment in the resulting agent. Their proposed remedy, AgentBuild, elevates agent construction to an explicit workflow stage. The scientist authors a "contract" consisting of three components: a version-controlled rubric, a difficulty-graded curriculum, and a curated external knowledge base. A rubric-driven judge then gates a meta-optimizer coding agent that edits the scientific agent within boundaries declared in the contract, so the build process encodes the scientist's judgment rather than bypassing it.
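The judge-gated build loop described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Contract` fields, the `Edit` type, the path-prefix notion of "declared boundaries", the numeric `pass_score`, and the `build_agent` function are all hypothetical names chosen for illustration, and the judge and meta-optimizer are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical stand-in for the scientist-authored contract."""
    rubric_version: str                 # version tag of the rubric
    curriculum: Tuple[str, ...]         # tasks ordered from easy to hard
    knowledge_base: Tuple[str, ...]     # curated reference documents
    editable_prefixes: Tuple[str, ...]  # assumed form of the declared boundaries
    pass_score: float                   # rubric score needed to pass a task


@dataclass(frozen=True)
class Edit:
    """A proposed change to the scientific agent (hypothetical shape)."""
    path: str
    note: str


Judge = Callable[[str, List[Edit]], float]    # rubric-driven score for a task
Proposer = Callable[[str, List[Edit]], Edit]  # meta-optimizer suggestion


def in_bounds(contract: Contract, edit: Edit) -> bool:
    """True if the edit touches only paths the contract allows."""
    return any(edit.path.startswith(p) for p in contract.editable_prefixes)


def build_agent(
    contract: Contract,
    propose: Proposer,
    judge: Judge,
    max_rounds: int = 3,
) -> Tuple[List[Edit], Optional[str]]:
    """Walk the curriculum; return accepted edits and the first unpassed task."""
    accepted: List[Edit] = []
    for task in contract.curriculum:
        for _ in range(max_rounds):
            score = judge(task, accepted)
            if score >= contract.pass_score:
                break
            edit = propose(task, accepted)
            if not in_bounds(contract, edit):
                continue  # out-of-bounds edits are never applied
            if judge(task, accepted + [edit]) > score:
                accepted.append(edit)  # the judge gates every accepted edit
        if judge(task, accepted) < contract.pass_score:
            return accepted, task  # frontier: first task the build cannot pass
    return accepted, None
```

In this sketch the contract object is the only input that persists between runs, which mirrors the article's point that the contract, rather than any model-specific tuning, is the artifact worth versioning.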
The paper instantiates AgentBuild for Rietveld refinement of X-ray diffraction data, using GSAS-II exposed through MCP and A2A protocols. A blank-harness construction run progresses through a lithium lanthanum zirconium oxide (LLZO) signal-to-noise ladder, reaching a 4-hour scan as a frontier case and surfacing the workflow-scope limits that remain. Crucially, the rubric rewards credible fits and also scores trajectory scope, so reaching the frontier manifests as a contract failure rather than a pattern-fitting failure — making the boundary explicit and auditable.
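To make the distinction between a fit failure and a contract failure concrete, here is a small Python sketch of a rubric check that scores both fit quality and trajectory scope. The summary does not describe the paper's actual rubric, so everything here is an assumption: the use of the standard Rietveld weighted-profile residual R_wp as the fit metric, the 0.10 threshold, the action names, and the `score_run` function are illustrative only.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def weighted_profile_r(
    y_obs: Sequence[float],
    y_calc: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Standard Rietveld R_wp: sqrt(sum w (y_obs - y_calc)^2 / sum w y_obs^2)."""
    num = sum(w * (o - c) ** 2 for o, c, w in zip(y_obs, y_calc, weights))
    den = sum(w * o ** 2 for o, w in zip(y_obs, weights))
    return math.sqrt(num / den)


def score_run(
    rwp: float,
    actions_taken: Iterable[str],
    allowed_actions: Iterable[str],
    rwp_limit: float = 0.10,  # hypothetical threshold, not from the paper
) -> Tuple[str, List[str]]:
    """Return a verdict plus any actions that fell outside the declared scope."""
    out_of_scope = sorted(set(actions_taken) - set(allowed_actions))
    if out_of_scope:
        # Scope is checked first, so leaving the declared workflow is
        # reported as a contract failure even if the fit looks good.
        return "contract_failure", out_of_scope
    if rwp > rwp_limit:
        return "fit_failure", []
    return "pass", []
```

Checking scope before fit quality is the design choice that matters in this sketch: a run that reaches a good residual by stepping outside the declared workflow is still flagged, and the flag names the offending actions so the boundary can be audited.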
The authors highlight a key durability property: as base models evolve, re-running AgentBuild constitutes a re-tune rather than a full rebuild, with the scientist's authored contract remaining the stable, long-lived asset across model generations.
Key facts
- Authors: Woong Shin, Craig A. Bridges, and Marshall T. McDonnell (published 2026-06-11 on arXiv).
- AgentBuild treats agent construction as a workflow stage, driven by a scientist-authored contract.
- The contract comprises a version-controlled rubric, a difficulty-graded curriculum, and a curated external knowledge base.
- A rubric-driven judge gates a meta-optimizer coding agent that edits the scientific agent within declared boundaries.
- The framework is demonstrated on Rietveld refinement of X-ray diffraction data using GSAS-II, exposed via MCP and A2A protocols.
- A blank-harness construction run progresses through an LLZO signal-to-noise ladder, reaching a 4-hour scan as a frontier case.
- When base models evolve, re-running AgentBuild is described as a re-tune rather than a rebuild, with the contract as the durable asset.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.