SIGA adapter turns coding agents into scientific simulator operators
Researchers introduce SIGA, a Simulator-Interface Grounding Adapter that equips off-the-shelf coding agents with the vocabulary, structural constraints, and validation rules needed to autonomously configure complex scientific simulators like GEOS, OpenFOAM, and LAMMPS.
The paper demonstrates that a lightweight, self-improvable grounding layer — rather than full retraining — is sufficient to turn a general coding agent into a practical operator of real scientific simulators, reducing a multi-hour human setup task to minutes.
Matthew Ho, Brian Liu, and Jixuan Chen frame scientific simulator setup as an "agent-tool interface grounding" problem: coding agents already possess general skills like file navigation, code editing, command execution, and output repair, but they lack the simulator-specific knowledge needed to produce valid configurations. SIGA addresses this by supplying a Simulator-Interface Grounding Adapter that encodes a simulator's executable contract through four mechanisms: retrieval, procedural memory, in-trajectory validation, and validation-enforced termination.
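The last two mechanisms, in-trajectory validation and validation-enforced termination, can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: the required section names, the `validate_deck` and `run_until_valid` helpers, and the toy agent step are illustrative assumptions, not SIGA's actual interface or a real GEOS schema. The only point is that the loop may not stop early until the validator returns no errors.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Hypothetical required top-level sections; a real simulator schema would differ.
REQUIRED_SECTIONS = ("Solvers", "Mesh", "Events")


def validate_deck(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the deck passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    present = {child.tag for child in root}
    return [f"missing section <{name}>" for name in REQUIRED_SECTIONS
            if name not in present]


def run_until_valid(propose: Callable[[str, List[str]], str],
                    max_steps: int = 5) -> Tuple[str, List[str], int]:
    """Call the agent step until validation passes or the budget runs out.

    `propose(previous_deck, errors)` stands in for one agent edit step.
    Returns (deck, errors, steps_used); stopping early is only allowed
    when `errors` is empty.
    """
    deck, errors = "", ["no deck yet"]
    for step in range(1, max_steps + 1):
        deck = propose(deck, errors)
        errors = validate_deck(deck)  # in-trajectory validation
        if not errors:                # validation-enforced termination
            return deck, errors, step
    return deck, errors, max_steps


def toy_agent(previous: str, errors: List[str]) -> str:
    """Toy stand-in for an agent: adds one missing section per step."""
    root = ET.fromstring(previous) if previous else ET.Element("Problem")
    for name in REQUIRED_SECTIONS:
        if root.find(name) is None:
            ET.SubElement(root, name)
            break
    return ET.tostring(root, encoding="unicode")


deck, errors, steps = run_until_valid(toy_agent)
print(steps, errors)  # 3 []
```

In this toy run the agent needs three steps because it repairs one missing section per step; a real agent would receive the error list as feedback and could fix several problems at once.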
The primary evaluation target is GEOS, an open-source multiphysics simulator used in subsurface science. On the main benchmark, SIGA produces a complete GEOS deck in approximately five minutes with a TreeSim score above 0.90, matching an extended-budget human expert who required about three hours, a roughly 36x wall-clock speedup. On a harder held-out set, grounding raises TreeSim from 0.720 to 0.789, a roughly 10% relative gain over the bare agent, and cuts the across-seed standard deviation by a factor of 16. A self-evolution component, which rewrites adapter contents from prior agent trajectories, yields the highest held-out GEOS mean and matches or outperforms the strongest hand-designed configuration.
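The headline ratios follow directly from the reported figures. The short check below takes "about three hours" as 180 minutes and "approximately five minutes" as 5 minutes, which is an assumption about rounding; the TreeSim values are those quoted above.

```python
# Figures as quoted in the summary; exact timings are assumed round numbers.
human_minutes = 3 * 60   # "about three hours"
agent_minutes = 5        # "approximately five minutes"
speedup = human_minutes / agent_minutes

bare_treesim = 0.720      # bare agent, held-out set
grounded_treesim = 0.789  # with grounding, held-out set
absolute_gain = grounded_treesim - bare_treesim
relative_gain = absolute_gain / bare_treesim

print(f"speedup: {speedup:.0f}x")             # speedup: 36x
print(f"absolute gain: {absolute_gain:.3f}")  # absolute gain: 0.069
print(f"relative gain: {relative_gain:.1%}")  # relative gain: 9.6%
```

The relative gain works out to about 9.6%, which the summary rounds to "roughly 10%".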
Transfers to OpenFOAM and LAMMPS reveal that the dominant mechanism shifts depending on the interface: validation matters most when structural completeness is the bottleneck, while memory and retrieval matter most when domain correctness is the bottleneck. The authors conclude that lightweight, self-improvable grounding layers can turn general coding agents into practical operators of scientific software.
Key facts
- SIGA stands for Simulator-Interface Grounding Adapter and supplies a simulator's vocabulary, structural constraints, validation rules, and termination conditions to a coding agent.
- SIGA is evaluated primarily on GEOS, an open-source multiphysics simulator used in subsurface science.
- SIGA produces a complete GEOS deck in about five minutes with a TreeSim score above 0.90, versus roughly three hours for a human expert (a ~36x wall-clock speedup).
- On a harder held-out set, grounding raises TreeSim from 0.720 to 0.789, a roughly 10% relative gain over the bare agent.
- Grounding cuts the across-seed standard deviation by a factor of 16 on the held-out set.
- A self-evolution mechanism rewrites adapter contents from prior trajectories, achieving the highest held-out GEOS mean and matching or outperforming the strongest hand-designed configuration.
- Transfers to OpenFOAM and LAMMPS show the dominant mechanism shifts by interface: validation matters most for structural completeness bottlenecks; memory and retrieval matter most for domain correctness bottlenecks.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.