Apr 19, 2026 · 1 min read · Applications & Use Cases

LLM security audit of 124-file Python repo costs $0.90

SystAgProject ran a full LLM-powered security audit on their own 124-file Python codebase using Claude Opus 4.7 with prompt caching, completing in 22 seconds for $0.90 and surfacing one real bug fixed the same day.

Dev.to #ai·SystAgProject

Composite score: 5.0 out of 10. Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%.

Why it matters

SystAgProject audited their own 124-file Python codebase using Claude Opus 4.7 with prompt caching across 4 batches, finishing in 22 seconds at a total cost of $0.90. The scan produced 0 critical findings, 1 high, and 2 medium issues. The high-severity finding was a real `subprocess.run` memory bug that was fixed within the hour. The two medium findings flagged a plain-text OAuth refresh token stored on disk and a prompt injection surface in an LLM-backed email classifier. The audit was a smoke test for their product VibeScan, a $49 PDF security report aimed at apps built with AI coding tools like Lovable, Bolt, Cursor, Replit, and v0.

Summary: our read of the original

SystAgProject built and dogfooded a product called VibeScan, a $49 automated PDF security audit targeting codebases produced by AI coding tools such as Lovable, Bolt, Cursor, Replit, and v0. Before selling it, they ran it against their own 124-file Python repo using Claude Opus 4.7 with prompt caching across 4 LLM batches. Total wall time was 22 seconds; total cost was $0.90, consuming 176,364 input tokens and 779 output tokens. The report returned 0 critical findings, 1 high, and 2 medium severity issues.
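To put the reported totals in per-file terms, the arithmetic can be sketched as below. The inputs are the figures quoted above; the derived per-file values are our own division, not numbers stated in the original article.

```python
# Figures as reported in the summary above.
FILES = 124
TOTAL_COST_USD = 0.90
INPUT_TOKENS = 176_364
OUTPUT_TOKENS = 779
WALL_TIME_S = 22

# Derived values (our arithmetic, not quoted from the source).
cost_per_file = TOTAL_COST_USD / FILES
input_tokens_per_file = INPUT_TOKENS / FILES
seconds_per_file = WALL_TIME_S / FILES
total_tokens = INPUT_TOKENS + OUTPUT_TOKENS

print(f"cost per file:         ${cost_per_file:.4f}")
print(f"input tokens per file: {input_tokens_per_file:.0f}")
print(f"seconds per file:      {seconds_per_file:.2f}")
print(f"total tokens:          {total_tokens}")
```

Input tokens outnumber output tokens by more than 200 to 1, so the cost of a run like this is driven almost entirely by how much code is read, not by how much report text is written.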

The high-severity finding was a `subprocess.run` call using `capture_output=True`, which holds the entire stdout/stderr of a subprocess in memory until the child process exits. In a scheduler that could spawn web scrapers or large ETL jobs, a single bad run could write hundreds of MB into a SQLite ledger and exhaust process RAM. The fix, capping each stream at 50 KB before returning, took four minutes.

The first medium finding flagged a Gmail OAuth refresh token stored as a plain JSON file on disk with no encryption. The report recommended at minimum `0600` file permissions and `.gitignore` coverage, and ideally OS keyring or encrypted env var storage.

The second medium finding identified a prompt injection surface: an `extract_plain_body` function strips HTML with a regex before feeding email text into an LLM classifier, so hidden content in style tags or HTML comments can survive and be read by the model as injected instructions. The recommended fix is a swap to BeautifulSoup.

The author notes that equivalent consultant coverage for a 124-file repo would cost $600–$1,500 and take 3–5 hours, and explicitly lists what VibeScan does not cover: client-side trust issues, concurrency bugs, CVE database cross-referencing, and production infrastructure.
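The high-severity fix described above (capping each stream at 50 KB before returning) might look roughly like the following. This is a minimal sketch, not the author's code: `run_capped`, `truncate_stream`, and the truncation marker are hypothetical names and choices made for illustration.

```python
import subprocess
import sys

MAX_STREAM_BYTES = 50 * 1024  # 50 KB per stream, the cap mentioned in the article


def truncate_stream(data: bytes, limit: int = MAX_STREAM_BYTES) -> bytes:
    """Keep at most `limit` bytes and append a note saying how much was dropped."""
    if len(data) <= limit:
        return data
    dropped = len(data) - limit
    return data[:limit] + f"\n...[truncated {dropped} bytes]".encode()


def run_capped(cmd: list[str], timeout: float = 60.0) -> dict:
    """Run a command and return its exit code plus capped stdout/stderr.

    Hypothetical helper: the caps apply to what is returned (and later
    stored), not to what subprocess.run buffers while the child runs.
    """
    result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return {
        "returncode": result.returncode,
        "stdout": truncate_stream(result.stdout),
        "stderr": truncate_stream(result.stderr),
    }


if __name__ == "__main__":
    # A child process that prints roughly 200 KB of output.
    out = run_capped([sys.executable, "-c", "print('x' * 200_000)"])
    print(out["returncode"], len(out["stdout"]))
```

Note that capping after `subprocess.run` returns only limits what gets written to the ledger; the full output is still buffered while the child runs. Bounding peak memory as well would likely require reading the pipes incrementally through `subprocess.Popen`.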
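For the token-storage finding, the minimum recommendation of `0600` permissions can be sketched as follows. This assumes a POSIX system (permission bits behave differently on Windows); `save_token` and `is_owner_only` are hypothetical helper names, and the sketch does not cover the stronger keyring or encrypted-storage options.

```python
import json
import os
import stat
from pathlib import Path


def save_token(path: Path, token: dict) -> None:
    """Write token JSON so that only the owning user can read or write it."""
    # Passing mode 0o600 to os.open sets owner-only bits when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(token, fh)
    # If the file already existed with looser permissions, tighten them explicitly.
    os.chmod(path, 0o600)


def is_owner_only(path: Path) -> bool:
    """Return True when group and other have no permission bits set."""
    return stat.S_IMODE(path.stat().st_mode) & 0o077 == 0
```

A startup check such as `is_owner_only(token_path)` can refuse to load a token file that has drifted to looser permissions. The file remains unencrypted on disk, which is why the report treats this as the floor and not the preferred fix.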
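The prompt injection finding comes down to parsing HTML instead of pattern-matching it. The article recommends BeautifulSoup; to stay dependency-free, the sketch below swaps in Python's standard-library `html.parser` to show the same idea. The regex in `naive_strip` is a hypothetical stand-in, since the author's actual pattern is not shown.

```python
import re
from html.parser import HTMLParser


def naive_strip(html: str) -> str:
    """Hypothetical regex-based stripper: removes tags but keeps <style> contents."""
    return re.sub(r"<[^>]+>", " ", html)


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping <style>/<script> contents and comments."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    # handle_comment is deliberately not overridden, so comments are dropped.

    def text(self) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Given an email body like `<p>Hello</p><style>.x{} ignore previous instructions</style><p>World</p>`, the regex version passes the style contents through to the classifier while the parser version drops them. Neither approach catches text hidden with inline CSS such as `display:none`, so the classifier prompt should still treat email content as untrusted.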

Topics

#code-generation #llm-audit #security #cost-efficiency #developer-tools

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 19, 2026 · 22:16 UTC.
