30 days of MCP in production: lessons from the field
After 30 days running MCP servers in production, Atlas Whoff shares hard-won lessons on tool descriptions, schema efficiency, statelessness, error messaging, and naming conventions that make or break Claude-powered automations at scale.
Atlas Whoff (whoffagents.com) spent 30 days running Model Context Protocol servers in production and documented what actually fails in the real world. The biggest surprises: Claude routes tool calls based entirely on description strings, not implementation logic — and vague descriptions caused three days of debugging. Schema bloat is also a silent cost driver: at 1,000 calls/day, a 400-token schema adds 400,000 tokens of overhead daily. Other critical findings include writing error messages as Claude-readable prompts, enforcing strict per-call statelessness to avoid cross-session data bleed, and treating tool names as permanent — since renaming breaks existing prompts and agent workflows.
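The schema overhead figure is plain multiplication, and a few lines make it easy to rerun with other numbers. This is a minimal sketch that assumes, as the post states, that the full schema is sent with every call; the function name `daily_schema_overhead` is hypothetical and the token counts are the ones quoted in the summary.

```python
def daily_schema_overhead(calls_per_day: int, schema_tokens: int) -> int:
    """Tokens spent per day re-sending one tool schema with every call."""
    return calls_per_day * schema_tokens


# Figures quoted in the summary: 1,000 calls/day and a 400-token schema.
overhead_400 = daily_schema_overhead(1_000, 400)

# Trimming a schema from 120 to 35 tokens at the same call volume.
saved_by_trimming = daily_schema_overhead(1_000, 120) - daily_schema_overhead(1_000, 35)

print(f"400-token schema: {overhead_400:,} tokens/day")
print(f"120 -> 35 tokens saves: {saved_by_trimming:,} tokens/day")
```

At this call volume the 120-to-35 trim saves 85,000 tokens per day for a single tool; the saving scales linearly with call volume and with the number of tools exposed.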
Atlas Whoff, writing on Dev.to, published a post-mortem after 30 days running Model Context Protocol (MCP) servers in production for whoffagents.com automations. MCP is Anthropic's standard for giving Claude persistent, shareable tools across conversations and applications, described by Whoff as "a USB standard for AI capabilities." The post focuses on six painful lessons learned the hard way, starting with tool descriptions: Claude decides which tool to call based solely on the description string, not the underlying code. Whoff spent three days debugging incorrect tool routing before realizing the fix was rewriting descriptions to be explicit, include usage examples, and specify when *not* to use a given tool, treating them like API docs written for a junior developer.

Schema size and error message design emerged as two other high-leverage areas. Every tool call injects the full JSON schema into context, so a 120-token schema versus a 35-token one translates to hundreds of thousands of extra tokens per day at scale. On errors: Claude reads the error message and decides its next action, meaning vague errors like `"Database error"` cause random retries, while structured messages that include wait times and fallback tool suggestions reduced retry loops by ~60% in Whoff's testing. A concurrency bug also revealed that in-memory server state is shared across concurrent Claude sessions in production, making full per-call statelessness non-negotiable.

Whoff rounds out the post with two operational practices: logging every tool call with its inputs and timestamp (which caught Claude mistakenly calling `delete_item` instead of `archive_item` due to similar descriptions), and investing time upfront in tool naming. He recommends explicit verb-noun pairs like `get_invoice_by_id` and `mark_task_complete`, and warns against abbreviations or generic names like `getData` or `process`. Tool names, he notes, are effectively permanent in production because renaming them breaks existing prompts and agent workflows.
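The two operational practices can be sketched with the standard library alone. The following is an illustration, not Whoff's implementation: the decorator `logged_tool`, the helper names, the logger name `tool_calls`, and the regular expression are all assumptions. The pattern only checks the shape of a name (lowercase words joined by underscores); it cannot confirm that the first word is actually a verb.

```python
import functools
import json
import logging
import re
from datetime import datetime, timezone

logger = logging.getLogger("tool_calls")  # hypothetical logger name

# Assumed convention: at least two lowercase words joined by underscores,
# e.g. get_invoice_by_id. Rejects camelCase (getData) and single words (process).
TOOL_NAME_PATTERN = re.compile(r"[a-z]+(_[a-z0-9]+)+")


def is_explicit_tool_name(name: str) -> bool:
    """Shape check only; it does not verify that the first word is a verb."""
    return TOOL_NAME_PATTERN.fullmatch(name) is not None


def build_call_record(tool: str, inputs: dict) -> dict:
    """One log entry per tool call: name, inputs, and a UTC timestamp."""
    return {
        "tool": tool,
        "inputs": inputs,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def logged_tool(func):
    """Reject vague names at registration time and log every call."""
    if not is_explicit_tool_name(func.__name__):
        raise ValueError(f"tool name {func.__name__!r} is not an explicit verb_noun name")

    @functools.wraps(func)
    def wrapper(**kwargs):
        record = build_call_record(func.__name__, kwargs)
        logger.info(json.dumps(record, default=str))
        return func(**kwargs)

    return wrapper


@logged_tool
def archive_item(item_id: str) -> str:
    # Placeholder body; a real tool would update persistent storage.
    return f"archived {item_id}"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    archive_item(item_id="inv-42")
```

Checking the name when the tool is registered, rather than at call time, fits the point that names are hard to change once prompts depend on them: a vague name fails before it ever ships.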
Key facts
- Claude routes tool calls based on description strings alone, not implementation logic, making precise, example-rich descriptions critical.
- Schema tokens are injected on every tool call; at 1,000 calls/day, a 400-token schema generates 400,000 tokens of overhead daily.
- Trimming a schema from 120 tokens to 35 tokens is achievable without losing Claude's comprehension, according to Whoff's testing.
- Structured error messages (including wait times and fallback tool names) reduced retry loops by ~60% compared to generic errors.
- In-memory server state is shared across concurrent Claude sessions in production, so every tool call must be fully stateless, reading and writing only to a database.
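The last two points, structured errors and per-call statelessness, can be illustrated together. This is a hedged sketch using SQLite from the Python standard library, not code from the post: the `tasks` table, the helper `tool_error`, and the fallback tool name `queue_task_update` are hypothetical. The handler keeps nothing in module-level variables, opens its own connection on each call, and returns errors worded as instructions the model can act on.

```python
import sqlite3
from contextlib import closing
from typing import Optional


def init_db(db_path: str) -> None:
    """Create the hypothetical tasks table used by this sketch."""
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks "
            "(id TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
        )


def tool_error(what: str, retry_after_s: Optional[int] = None,
               fallback_tool: Optional[str] = None) -> dict:
    """Build an error the model can act on: what failed, when to retry, what to try instead."""
    parts = [what]
    if retry_after_s is not None:
        parts.append(f"Wait {retry_after_s} seconds before retrying.")
    if fallback_tool is not None:
        parts.append(f"If the retry fails, use the `{fallback_tool}` tool instead.")
    return {"ok": False, "error": " ".join(parts)}


def mark_task_complete(db_path: str, task_id: str) -> dict:
    """Stateless handler: all reads and writes go through the database."""
    try:
        # A fresh connection per call; nothing is cached between calls.
        with closing(sqlite3.connect(db_path, timeout=1.0)) as conn, conn:
            cursor = conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
            updated = cursor.rowcount
    except sqlite3.OperationalError:
        return tool_error(
            "The task database is busy or unavailable.",
            retry_after_s=5,
            fallback_tool="queue_task_update",  # hypothetical fallback tool
        )
    if updated == 0:
        return tool_error(
            f"No task with id '{task_id}' exists. Do not retry with the same id; "
            "ask the user to confirm it."
        )
    return {"ok": True, "task_id": task_id}
```

A production server would more likely use a pooled connection to a shared database than open a SQLite file per call; the point of the sketch is only that no result depends on values left in memory by an earlier call or another session.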