Self-evolving LLM agent discovers interpretable fluid control policy
Researchers present a self-evolving scientific-agent workflow driven by large language models and iterative code generation that autonomously constructs interpretable controllers for complex physical systems, demonstrated on a nonlinear fluid-structure interaction problem.
The work demonstrates that an autonomous LLM-driven agent can produce physically interpretable, generalizable control policies through a fully auditable discovery process — without the black-box weight optimization that typically makes deep reinforcement learning opaque in scientific contexts.
Boai Sun, Wenjin Guo, and Zongmin Yu present a self-evolving scientific-agent framework that addresses a core tension in physical-system control: deep reinforcement learning can optimize complex policies but sacrifices interpretability, while scientific discovery demands a traceable chain of reasoning from physical evidence to structured control architectures. Their agent, driven by large language models and iterative code generation, sidesteps weight adjustment entirely — instead deploying candidate strategies into physical simulations, diagnosing emergent dynamic behaviors from multimodal evidence, and converting those observations into progressive source-code refinements. An auditable evolve log records the full discovery process, making the agent's reasoning transparent and reproducible.
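The deploy, diagnose, refine, and log cycle described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `toy_simulation`, `diagnose`, `propose_revision`, `evolve`, and `LogEntry` are hypothetical, the "simulation" is a one-dimensional point mass rather than a fluid solver, and `propose_revision` is a deterministic placeholder standing in for the LLM call that would rewrite the policy source.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List, Tuple

# Candidate policies are kept as source code, not as network weights.
POLICY_TEMPLATE = "def policy(error):\n    return {gain} * error\n"


@dataclass
class LogEntry:
    """One record of the (hypothetical) evolve log."""
    iteration: int
    source: str
    score: float
    diagnosis: str
    accepted: bool


def compile_policy(source: str) -> Callable[[float], float]:
    """Turn candidate source code into a callable policy."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["policy"]


def toy_simulation(policy: Callable[[float], float],
                   steps: int = 200, dt: float = 0.05) -> float:
    """Stand-in for the physical simulation: a damped 1-D point mass
    driven toward x = 1. Returns the final distance to the target."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        a = policy(1.0 - x)          # the policy only sees the target error
        v += (a - 0.5 * v) * dt      # linear drag
        x += v * dt
    return abs(1.0 - x)


def diagnose(score: float) -> str:
    """Stand-in for diagnosing behavior from simulation evidence."""
    return "target reached" if score < 0.05 else "residual offset from target"


def propose_revision(iteration: int, diagnosis: str) -> str:
    """Placeholder for the LLM step: a real agent would read the diagnosis
    and rewrite the source; here the gain is simply stepped upward."""
    return POLICY_TEMPLATE.format(gain=0.5 * (iteration + 1))


def evolve(seed_source: str, iterations: int = 5) -> Tuple[str, float, List[LogEntry]]:
    """Run the deploy -> diagnose -> refine loop and record every attempt."""
    best_source = seed_source
    best_score = toy_simulation(compile_policy(seed_source))
    log = [LogEntry(0, seed_source, best_score, diagnose(best_score), True)]
    for i in range(1, iterations + 1):
        candidate = propose_revision(i, log[-1].diagnosis)
        score = toy_simulation(compile_policy(candidate))
        accepted = score < best_score      # keep only strict improvements
        if accepted:
            best_source, best_score = candidate, score
        log.append(LogEntry(i, candidate, score, diagnose(score), accepted))
    return best_source, best_score, log


def dump_log(log: List[LogEntry]) -> str:
    """Serialize the log so the whole search can be audited later."""
    return json.dumps([asdict(entry) for entry in log], indent=2)
```

Because rejected candidates are logged alongside accepted ones, the record shows why each revision was kept or discarded. That is the auditability property the summary attributes to the evolve log.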
The framework is validated on a highly nonlinear fluid-structure interaction problem: an underactuated, two-joint dogfish swimmer that must reach spatial targets using only joint angular accelerations. Beginning from a propulsive seed policy that exhibits a one-sided steering bias, the agent autonomously discovers and refines a unified controller. The resulting control architecture incorporates traveling-wave propulsion, body-frame target guidance, yaw-rate feedback, signed mean-tail curvature, and adaptive cadence relief. The synthesized policy generalizes to unseen static targets and dynamically curved pursuit trajectories without any retraining or target-specific branching. According to the authors, this shows that the agent can turn accumulated physical evidence into a robust, mathematically readable control policy through a fully traceable discovery process.
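To make the five named ingredients concrete, here is a hedged Python sketch of how such terms might be combined into one readable control law for a two-joint swimmer. It is not the authors' controller. The function names, gains, sign conventions, and PD tracking step are all assumptions, and "adaptive cadence relief" is interpreted here, without support from the summary, as lowering the tail-beat frequency when the turn demand is large.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class State:
    x: float                    # world-frame position
    y: float
    heading: float              # world-frame heading (rad)
    yaw_rate: float             # rad/s
    q: Tuple[float, float]      # joint angles (rad)
    dq: Tuple[float, float]     # joint angular velocities (rad/s)


def body_frame_bearing(state: State, target: Tuple[float, float]) -> float:
    """Body-frame target guidance: angle to the target measured from the
    swimmer's own heading (positive = target to the left, by assumption)."""
    dx, dy = target[0] - state.x, target[1] - state.y
    c, s = math.cos(state.heading), math.sin(state.heading)
    bx = c * dx + s * dy        # rotate world offset into the body frame
    by = -s * dx + c * dy
    return math.atan2(by, bx)


def turn_bias(state: State, target: Tuple[float, float],
              k_bearing: float = 0.5, k_yaw: float = 0.1,
              max_bias: float = 0.3) -> float:
    """Signed mean curvature offset: steer toward the target, damped by
    yaw-rate feedback, and clamped to +/- max_bias (all gains assumed)."""
    demand = k_bearing * body_frame_bearing(state, target) - k_yaw * state.yaw_rate
    return max(-max_bias, min(max_bias, demand))


def step(state: State, target: Tuple[float, float], phase: float, dt: float,
         amp: float = 0.4, base_freq: float = 1.5,
         phase_lag: float = math.pi / 2, relief: float = 0.5,
         max_bias: float = 0.3, kp: float = 40.0, kd: float = 8.0):
    """Return (joint angular accelerations, next wave phase)."""
    bias = turn_bias(state, target, max_bias=max_bias)
    # Assumed form of cadence relief: beat more slowly while turning hard.
    freq = base_freq * (1.0 - relief * abs(bias) / max_bias)
    accels = []
    for i, (q, dq) in enumerate(zip(state.q, state.dq)):
        # Traveling wave: each joint lags the previous one in phase,
        # oscillating about the signed curvature offset.
        q_des = bias + amp * math.sin(phase - i * phase_lag)
        accels.append(kp * (q_des - q) - kd * dq)   # PD tracking
    next_phase = phase + 2.0 * math.pi * freq * dt  # integrate the phase
    return tuple(accels), next_phase
```

The target enters only through the body-frame bearing, so the same expression applies to any target position without per-target branches. This is the kind of structure that would let a policy generalize to unseen targets in the way the summary describes.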
Key facts
- Authors: Boai Sun, Wenjin Guo, and Zongmin Yu; published on arXiv on 2026-06-07.
- The agent uses large language models and iterative code generation to build controllers, not neural-network weight adjustment.
- Candidate strategies are deployed into physical simulations; the agent diagnoses behaviors from multimodal evidence and refines source code progressively.
- The test case is an underactuated, two-joint dogfish swimmer reaching spatial targets using only joint angular accelerations.
- Starting from a seed policy with a one-sided steering bias, the agent discovers a unified controller covering all canonical targets.
- The synthesized policy generalizes to unseen static targets and dynamically curved pursuit trajectories without retraining or target-specific branching.
- The emergent control architecture includes traveling-wave propulsion, body-frame target guidance, yaw-rate feedback, signed mean-tail curvature, and adaptive cadence relief.