Code-Augur pairs LLM agents with fuzzing to expose hidden code vulnerabilities
Code-Augur is a new agentic vulnerability detection system that makes an LLM agent's security assumptions explicit as in-source assertions, then uses a guided fuzzer to continuously test and refine those assumptions.
Score breakdown
By forcing LLM agents to commit their security assumptions as falsifiable assertions and immediately stress-testing them with a fuzzer, Code-Augur replaces opaque agent reasoning with a verifiable, self-correcting audit loop. This directly addresses the missed-vulnerability risk that the paper identifies as the central weakness of current agentic security analysis.
Zhengxiong Luo, Mehtab Zafar, and Dylan Wolff present Code-Augur, a system built around a "security-specification-first" paradigm designed to make agentic vulnerability detection more transparent and reliable. The core problem the paper addresses is that autonomous LLM agents conducting security audits produce opaque reasoning: when an agent declares a function secure, the assumptions behind that judgment are implicit and unverified. Incorrect assumptions can mask real vulnerabilities and erode trust in agentic analysis.
Code-Augur tackles this by requiring the agent to externalize its tacit assumptions as explicit security specifications, committed directly into the source as assertions whenever a component is deemed secure. In parallel, a guided fuzzer continuously attempts to falsify those assertions at runtime. When the fuzzer triggers an assertion, the outcome is either the discovery of a genuine vulnerability or the identification of a flawed specification that must be refined. In both cases, the agent's model of code behavior is grounded against how the code actually executes.
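The assertion-plus-fuzzer idea can be illustrated with a small sketch. This is not Code-Augur's implementation: the functions read_field, parse_record, and fuzz are hypothetical names invented for this example, and the fuzzer is a plain random-input loop rather than the guided fuzzer the paper describes. The sketch only shows how an assumption written as an in-source assertion becomes something a fuzzer can falsify.

```python
import random


def read_field(buf: bytes, offset: int, length: int) -> bytes:
    # Hypothetical agent-written specification: the component was judged
    # secure on the assumption that every read stays inside the buffer.
    assert 0 <= offset and 0 <= length and offset + length <= len(buf), \
        "spec: read stays within buffer"
    return buf[offset:offset + length]


def parse_record(packet: bytes) -> bytes:
    """Toy parser: byte 0 declares the payload length, the payload follows."""
    if not packet:
        return b""
    declared = packet[0]
    # Deliberate bug for illustration: the declared length is trusted
    # without being checked against the real packet size.
    return read_field(packet, 1, declared)


def fuzz(target, trials=10_000, seed=0):
    """Unguided random fuzzer: returns the first input that falsifies a spec."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except AssertionError as exc:
            return data, str(exc)
    return None


if __name__ == "__main__":
    print(fuzz(parse_record))
```

Calling fuzz(parse_record) returns a packet whose declared length exceeds its actual payload, together with the message of the assertion it violated. In this toy case the assertion is correct and the caller is at fault, so the finding would count as a genuine bug rather than a flawed specification. Note that Python skips assert statements when run with the -O flag, so a sketch like this only works with assertions enabled.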
On real-world subjects, Code-Augur outperformed other state-of-the-art agents in vulnerability detection and uncovered 22 new vulnerabilities in key open-source projects. The paper also highlights that Code-Augur achieves this using widely available LLMs such as Sonnet and DeepSeek, rather than relying on curated specialized models like Claude Mythos.
Key facts
- 01Code-Augur was written by Zhengxiong Luo, Mehtab Zafar, and Dylan Wolff and published on arXiv on 2026-06-17.
- 02The system introduces a security-specification-first paradigm that exposes an LLM agent's tacit assumptions as explicit in-source assertions.
- 03A guided fuzzer runs in parallel to attempt to falsify those assertions at runtime, either revealing a real vulnerability or a flawed specification.
- 04When the fuzzer triggers an assertion, the agent refines its specification, aligning its understanding of code intent with actual runtime behavior.
- 05Code-Augur detected more vulnerabilities than other state-of-the-art agents on real-world subjects.
- 06Code-Augur found 22 new vulnerabilities in key open-source projects.
- 07Code-Augur runs on widely available LLMs like Sonnet and DeepSeek, rather than requiring curated specialized models like Claude Mythos.
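The other outcome of a triggered assertion, a flawed specification that needs refining, can be sketched in the same spirit. Everything here is an assumption made for illustration: specifications are modelled as plain Python predicates, chunk, spec_v1, spec_v2, and find_violation are hypothetical names, and the summary does not describe how Code-Augur itself decides whether a counterexample indicates a vulnerability or a bad specification.

```python
import random


def chunk(items, size):
    """Split items into consecutive chunks; an empty list yields no chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def spec_v1(items, size):
    # First attempt: the agent assumes callers never pass an empty list.
    return size > 0 and len(items) > 0


def spec_v2(items, size):
    # Refined after a counterexample: empty lists are legal inputs,
    # only a non-positive chunk size would be unsafe.
    return size > 0


def find_violation(spec, trials=2000, seed=1):
    """Random search for an input on which the specification does not hold."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randrange(10) for _ in range(rng.randrange(0, 4))]
        size = rng.randrange(1, 4)  # this harness only produces positive sizes
        if not spec(items, size):
            return items, size
    return None


if __name__ == "__main__":
    counterexample = find_violation(spec_v1)
    print("violates spec_v1:", counterexample)
    # The code handles the counterexample safely, so the spec was too strict.
    print("chunk result:", chunk(*counterexample))
    print("violates spec_v2:", find_violation(spec_v2))
```

Here the counterexample to spec_v1 is an empty list, which chunk handles without any unsafe behavior, so the specification rather than the code is what gets corrected. The refined spec_v2 survives the same search, which is the sense in which the specification is brought in line with runtime behavior.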
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.