LLM security audit of 124-file Python repo costs $0.90
SystAgProject ran a full LLM-powered security audit on their own 124-file Python codebase using Claude Opus 4.7 with prompt caching, completing in 22 seconds for $0.90 and surfacing one real bug fixed the same day.
Score breakdown
SystAgProject audited their own 124-file Python codebase using Claude Opus 4.7 with prompt caching across 4 batches, finishing in 22 seconds at a total cost of $0.90. The scan produced 0 critical findings, 1 high, and 2 medium issues — one of which was a real `subprocess.run` memory bug fixed within the hour. The other two findings flagged a plain-text OAuth refresh token stored on disk and a prompt injection surface in an LLM-backed email classifier. The audit was a smoke test for their product VibeScan, a $49 PDF security report aimed at apps built with AI coding tools like Lovable, Bolt, Cursor, Replit, and v0.
SystAgProject built and dog-fooded a product called VibeScan — a $49 automated PDF security audit targeting codebases produced by AI coding tools such as Lovable, Bolt, Cursor, Replit, and v0. Before selling it, they ran it against their own 124-file Python repo using Claude Opus 4.7 with prompt caching across 4 LLM batches. Total wall time was 22 seconds; total cost was $0.90, consuming 176,364 input tokens and 779 output tokens. The report returned 0 critical findings, 1 high, and 2 medium severity issues.
The high-severity finding was a `subprocess.run` call using `capture_output=True`, which holds the entire stdout/stderr of a subprocess in memory until the child process exits.
The high-severity finding was a `subprocess.run` call using `capture_output=True`, which holds the entire stdout/stderr of a subprocess in memory until the child process exits. In a scheduler that could spawn web scrapers or large ETL jobs, a single bad run could write hundreds of MB into a SQLite ledger and exhaust process RAM. The fix — capping each stream at 50 KB before returning — took four minutes. The first medium finding flagged a Gmail OAuth refresh token stored as a plain JSON file on disk with no encryption, recommending at minimum `0600` file permissions and `.gitignore` coverage, and ideally OS keyring or encrypted env var storage. The second medium finding identified a prompt injection surface: an `extract_plain_body` function that strips HTML with a regex before feeding email text into an LLM classifier, leaving hidden content in style tags or HTML comments potentially readable as injected instructions. The recommended fix is a swap to BeautifulSoup. The author notes that equivalent consultant coverage for a 124-file repo would cost $600–$1,500 and take 3–5 hours, and explicitly lists what VibeScan does not cover: client-side trust issues, concurrency bugs, CVE database cross-referencing, and production infrastructure.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content.