Lean4Agent brings formal verification to LLM agent workflows
Lean4Agent is a new framework that uses Lean4, a dependent-type formal language, to model, verify, and debug LLM agent workflows and execution trajectories, with verification-guided workflow revision improving SWE performance by an average of 7.47%.
Lean4Agent introduces formal verification — previously absent from most agent systems — as a mechanism for specifying, debugging, and improving LLM agent workflows, with measured performance gains on established benchmarks.
Ruida Wang, Jerry Huang, and Pengcheng Wang introduce Lean4Agent, which they describe as the first framework to use Lean4 — a dependent-type formal language — to formally model and verify the behavior of LLM-based agents. The motivation draws an analogy to mathematics, where the ambiguity of natural language drove the development of formal languages; the authors apply the same logic to agent workflows, where the absence of formal specification methods makes it difficult to verify correctness or debug failures.
The framework has two main components. FormalAgentLib is an extensible Lean4 library that checks the semantic consistency of agent workflows under explicit assumptions and enables localization of execution-time failures revealed by agent trajectories. LeanEvolve builds on FormalAgentLib by taking its verification results and using them to revise workflows, aiming to improve overall agent capability.
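To make the two FormalAgentLib tasks more concrete, here is a minimal Python sketch. It is not Lean4 and not the FormalAgentLib or LeanEvolve API. It assumes a workflow is an ordered list of steps, each declaring hypothetical preconditions and postconditions as sets of fact names. `check_consistency` stands in for checking semantic consistency under explicit assumptions. `localize_failure` flags the first step in a recorded trajectory that did not deliver its declared postconditions. `evolve`, paired with the hypothetical `add_setup_step`, mimics a verify-then-revise loop in the spirit of LeanEvolve. All names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical workflow step: facts it requires and facts it promises to establish."""
    name: str
    requires: frozenset = frozenset()
    ensures: frozenset = frozenset()


def check_consistency(steps, assumptions):
    """Walk the steps in order and report (step name, missing facts) for every step
    whose preconditions are not covered by the explicit assumptions plus the
    postconditions of earlier steps."""
    known = set(assumptions)
    problems = []
    for step in steps:
        missing = step.requires - known
        if missing:
            problems.append((step.name, sorted(missing)))
        known |= step.ensures
    return problems


def localize_failure(steps, trajectory):
    """Given the facts observed after each executed step, return the name of the first
    step that did not establish its declared postconditions, or None.
    Steps with no recorded observation (trajectory too short) are not judged."""
    for step, observed in zip(steps, trajectory):
        if not step.ensures <= set(observed):
            return step.name
    return None


def add_setup_step(steps, problems):
    """Hypothetical revision rule: prepend a step that supplies every missing fact."""
    missing = frozenset(fact for _, facts in problems for fact in facts)
    return [Step("setup", ensures=missing)] + list(steps)


def evolve(steps, assumptions, revise, max_rounds=3):
    """Hypothetical verify-then-revise loop: re-check after each revision and stop once
    the workflow is consistent or the round budget is spent (returning the last revision)."""
    for _ in range(max_rounds):
        problems = check_consistency(steps, assumptions)
        if not problems:
            return steps
        steps = revise(steps, problems)
    return steps


# Illustrative bug-fixing workflow (all fact names are made up).
workflow = [
    Step("reproduce", requires=frozenset({"repo_checked_out"}), ensures=frozenset({"failing_test"})),
    Step("patch", requires=frozenset({"failing_test"}), ensures=frozenset({"patch_applied"})),
    Step("verify", requires=frozenset({"patch_applied"}), ensures=frozenset({"tests_pass"})),
]

print(check_consistency(workflow, {"repo_checked_out"}))  # → []
print(check_consistency(workflow, set()))  # → [('reproduce', ['repo_checked_out'])]
print(localize_failure(workflow, [{"failing_test"}, {"patch_applied"}, set()]))  # → verify
print([s.name for s in evolve(workflow, set(), add_setup_step)])  # → ['setup', 'reproduce', 'patch', 'verify']
```

Modeling facts as plain sets keeps the check trivial to run. A Lean4 encoding could instead state these conditions as propositions, so that consistency is established by a machine-checked proof rather than by a runtime check.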
Experiments on a subset of hard problems from SWE-Bench-Verified and a subset of ELAIP-Bench, spanning 5 leading LLMs, show that workflows that pass formal verification outperform those that fail by an average of 11.94%. LeanEvolve's workflow revision step adds a further average improvement of 7.47% in SWE performance. The authors position Lean4Agent as the foundation for a new research direction that applies expressive dependent-type formal languages to specifying and verifying agent behavior.
Key facts
- 01 Lean4Agent is described by its authors as the first framework to use Lean4, a dependent-type formal language, to model and verify LLM agent workflows.
- 02 FormalAgentLib is an extensible Lean4 library for checking semantic consistency of agent workflows and localizing execution-time failures from trajectories.
- 03 LeanEvolve uses FormalAgentLib's verification results to revise agent workflows and enhance their capability.
- 04 Experiments span a hard subset of SWE-Bench-Verified and a subset of ELAIP-Bench across 5 leading LLMs.
- 05 Verification-passing workflows outperform failing ones by an average of 11.94%.
- 06 LeanEvolve further improves SWE performance by an average of 7.47%.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.