MCP server doubles as a QA engineer for production data
Eugen's team built 118 MCP tools for their SaaS, ran 722 manual acceptance test scenarios through Claude, and discovered that an AI with read access across domains surfaces data inconsistencies, cross-entity bugs, and security vulnerabilities that unit tests never catch.
Treat your MCP tools as raw public API endpoints: audit them with cross-domain queries and explicit ownership checks. Security that is only implicit in the web UI, and test suites that call handlers directly with native types, will not catch the transport-layer bugs or IDOR vulnerabilities that surface when Claude calls the tools against production data.
Eugen's team built 118 MCP tools across 19 files for their SaaS application — primarily for data-entry operations like recording expenses and creating invoices. After running 722 manual acceptance test scenarios through Claude, they concluded that an AI with read access to multiple domains functions as a production data auditor, not just a code executor. Cross-domain queries like "which invoices have been in Sent status for more than 30 days?" or "compare total invoiced revenue with total recorded income in accounting" exposed inconsistencies that live between systems, where no single unit test has jurisdiction. Both systems could pass their own tests while a silent sync failure or missed transaction created a real discrepancy.
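The two cross-domain questions quoted above can be sketched as plain functions over data from both domains. This is a minimal illustration only: the `Invoice` and `IncomeEntry` shapes, the field names, the sample rows, and the choice to compare only `Paid` invoices against recorded income are all assumptions, not the team's actual schema or logic.

```typescript
// Hypothetical record shapes; amounts are integer cents to avoid float drift.
type Invoice = {
  id: string;
  status: "Draft" | "Sent" | "Paid";
  sentAt: string; // ISO 8601 timestamp
  total: number;
};
type IncomeEntry = { invoiceId: string; amount: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// "Which invoices have been in Sent status for more than 30 days?"
function staleSentInvoices(invoices: Invoice[], now: Date, maxDays = 30): Invoice[] {
  const cutoff = now.getTime() - maxDays * DAY_MS;
  return invoices.filter(
    (inv) => inv.status === "Sent" && new Date(inv.sentAt).getTime() < cutoff,
  );
}

// "Compare total invoiced revenue with total recorded income in accounting."
// Assumption: only Paid invoices are expected to appear as income.
function revenueGap(invoices: Invoice[], income: IncomeEntry[]): number {
  const invoiced = invoices
    .filter((inv) => inv.status === "Paid")
    .reduce((sum, inv) => sum + inv.total, 0);
  const recorded = income.reduce((sum, entry) => sum + entry.amount, 0);
  return invoiced - recorded;
}

// Made-up sample data for illustration.
const invoices: Invoice[] = [
  { id: "inv_1", status: "Sent", sentAt: "2024-01-01T00:00:00Z", total: 10000 },
  { id: "inv_2", status: "Paid", sentAt: "2024-02-01T00:00:00Z", total: 25000 },
  { id: "inv_3", status: "Sent", sentAt: "2024-03-01T00:00:00Z", total: 5000 },
];
const income: IncomeEntry[] = [{ invoiceId: "inv_2", amount: 20000 }];

const now = new Date("2024-03-10T00:00:00Z");
const stale = staleSentInvoices(invoices, now);
const gap = revenueGap(invoices, income);
```

Neither function belongs to the invoicing domain or the accounting domain alone, which is the point of the passage: each side can pass its own tests while the gap between them is non-zero.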
The structured test run, which categorized each result as PASS, SKIP, BUG-FIXED, or KNOWN-LIMITATION, surfaced several concrete bugs. Six list tools used `z.number()` for pagination parameters, but according to the post Claude sent those values as strings over JSON-RPC, so Zod rejected them; the fix was switching to `z.coerce.number()`. Three update tools (`update-company`, `update-client`, `update-product`) declared `name: z.string()` as a required field, which broke partial updates where only a phone number needed changing; the fix was marking every non-mandatory field `.optional()`. Two IDOR (insecure direct object reference) vulnerabilities were found in `update-document-status` and `delete-document-status`, whose repository queries lacked `teamId` checks. The web UI never exposed the gap because it only showed users their own data, but the raw MCP API had no such guard. Runtime crashes were also caught: `update-invoice` called `service.update()` when the actual method was `service.updateDocument()`, a mismatch TypeScript missed because an `as never` cast bypassed a complex generic, and `convert-estimate-to-invoice` used `require()` in an ESM context, which hid a type mismatch until runtime.
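The two Zod fixes described above can be sketched as follows. This assumes the `zod` package is installed; the schema names, the field names other than `name`, and the sample arguments are hypothetical and are not taken from the team's code.

```typescript
import { z } from "zod";

// Hypothetical pagination schema as a list tool might have declared it.
const strictPagination = z.object({
  page: z.number().int().min(1),
  limit: z.number().int().min(1).max(100),
});

// Same shape with coercion: string inputs are converted before validation.
const coercedPagination = z.object({
  page: z.coerce.number().int().min(1).default(1),
  limit: z.coerce.number().int().min(1).max(100).default(20),
});

// Arguments as they might arrive when a client serializes numbers as strings.
const wireArgs = { page: "2", limit: "50" };

const strictAccepted = strictPagination.safeParse(wireArgs).success;
const coercedResult = coercedPagination.safeParse(wireArgs);
const coercedPage = coercedResult.success ? coercedResult.data.page : undefined;
const coercedLimit = coercedResult.success ? coercedResult.data.limit : undefined;

// Hypothetical update schema with `name` required, which blocks partial updates.
const updateClientStrict = z.object({
  id: z.string(),
  name: z.string(),
  phone: z.string().optional(),
});

// Every non-mandatory field optional, so a phone-only update validates.
const updateClientPartial = z.object({
  id: z.string(),
  name: z.string().optional(),
  phone: z.string().optional(),
});

const phoneOnly = { id: "client_1", phone: "+1 555 0100" };
const strictPartialAccepted = updateClientStrict.safeParse(phoneOnly).success;
const partialAccepted = updateClientPartial.safeParse(phoneOnly).success;
```

One design caveat: `z.coerce.number()` converts input with `Number(...)`, so an empty string becomes `0` and `true` becomes `1`. Keeping range constraints such as `.min(1)` after the coercion helps reject such values.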
The post's central argument is that MCP tools are public API endpoints, and any implicit security baked into a web UI must be made explicit in every MCP tool query. The acceptance test process treated Claude as both a QA auditor for production data and a real-world transport-layer client, catching bugs that only appear when code runs through the actual JSON-RPC path rather than a native test suite.
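As an illustration of making the ownership check explicit, here is a minimal in-memory sketch. The `DocumentStatus` shape, the function names, and the `Map`-backed store are hypothetical stand-ins for a real repository query; only the idea of a `teamId` check comes from the post.

```typescript
// Hypothetical row shape; a real repository would query a database instead.
type DocumentStatus = { id: string; teamId: string; label: string };

const statuses = new Map<string, DocumentStatus>([
  ["st_1", { id: "st_1", teamId: "team_a", label: "Draft" }],
  ["st_2", { id: "st_2", teamId: "team_b", label: "Sent" }],
]);

// Vulnerable pattern: looks up by id only, so any caller who knows or
// guesses an id can modify another team's row. Shown for contrast only.
function updateStatusUnscoped(id: string, label: string): boolean {
  const row = statuses.get(id);
  if (!row) return false;
  row.label = label;
  return true;
}

// Scoped pattern: the caller's team is part of the lookup condition.
// A row owned by another team is treated the same as a missing row.
function updateStatusForTeam(callerTeamId: string, id: string, label: string): boolean {
  const row = statuses.get(id);
  if (!row || row.teamId !== callerTeamId) return false;
  row.label = label;
  return true;
}

// team_a tries to change a status owned by team_b, then one of its own.
const crossTeamAttempt = updateStatusForTeam("team_a", "st_2", "Tampered");
const ownTeamAttempt = updateStatusForTeam("team_a", "st_1", "Approved");
```

In a SQL-backed repository the equivalent would be adding the team condition to the `WHERE` clause of every update and delete, rather than relying on the UI to list only the caller's own ids.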
Key facts
- The team built 118 MCP tools organized into 19 files for their SaaS product.
- 722 acceptance test scenarios were run through Claude, with results categorized as PASS, SKIP, BUG-FIXED, or KNOWN-LIMITATION.
- Six list tools used `z.number()` for pagination params; Claude sends these as strings over JSON-RPC, causing Zod rejections. The fix was `z.coerce.number()`.