Apr 19, 2026 · 1 min read · Agentic Coding

30 days of MCP in production: lessons from the field

After 30 days running MCP servers in production, Atlas Whoff shares hard-won lessons on tool descriptions, schema efficiency, statelessness, error messaging, and naming conventions that make or break Claude-powered automations at scale.

Dev.to #mcp · Atlas Whoff

Composite score: 6.6 out of 10

Weights applied: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Atlas Whoff (whoffagents.com) spent 30 days running Model Context Protocol servers in production and documented what actually fails in the real world. The biggest surprises: Claude routes tool calls based entirely on description strings, not implementation logic — and vague descriptions caused three days of debugging. Schema bloat is also a silent cost driver: at 1,000 calls/day, a 400-token schema adds 400,000 tokens of overhead daily. Other critical findings include writing error messages as Claude-readable prompts, enforcing strict per-call statelessness to avoid cross-session data bleed, and treating tool names as permanent — since renaming breaks existing prompts and agent workflows.
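
The overhead figure quoted above is plain multiplication, and the same arithmetic shows what trimming a schema would save. The sketch below uses only numbers taken from this summary (400 tokens at 1,000 calls/day, and the 120-token versus 35-token schema sizes). The function name `daily_schema_overhead` is hypothetical, and the model assumes one copy of the schema is sent per call, as the post describes.

```python
def daily_schema_overhead(schema_tokens: int, calls_per_day: int) -> int:
    """Tokens spent per day on schema text alone, assuming one copy per call."""
    return schema_tokens * calls_per_day


# Figure quoted from the post: a 400-token schema at 1,000 calls/day.
print(daily_schema_overhead(400, 1_000))  # 400000

# Saving from trimming a 120-token schema to 35 tokens at the same volume.
saved = daily_schema_overhead(120, 1_000) - daily_schema_overhead(35, 1_000)
print(saved)  # 85000
```

At 1,000 calls/day the 120-to-35 trim saves 85,000 tokens daily, so the saving only reaches hundreds of thousands of tokens at several thousand calls per day.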

Summary: our read of the original

Atlas Whoff, writing on Dev.to, published a post-mortem after 30 days running Model Context Protocol (MCP) servers in production for whoffagents.com automations. MCP is Anthropic's standard for giving Claude persistent, shareable tools across conversations and applications, described by Whoff as "a USB standard for AI capabilities." The post covers six lessons learned the hard way, starting with tool descriptions: Claude decides which tool to call based solely on the description string, not the underlying code. Whoff spent three days debugging incorrect tool routing before realizing the fix was to rewrite descriptions so that they are explicit, include usage examples, and specify when *not* to use a given tool, treating them like API docs written for a junior developer.

Schema size and error message design emerged as two other high-leverage areas. According to Whoff, every tool call injects the full JSON schema into context, so a 120-token schema versus a 35-token one translates to hundreds of thousands of extra tokens per day at scale. On errors: Claude reads the error message and decides its next action, so vague errors like `"Database error"` cause random retries, while structured messages that include wait times and fallback tool suggestions reduced retry loops by ~60% in Whoff's testing. A concurrency bug also revealed that in-memory server state is shared across concurrent Claude sessions in production, making full per-call statelessness non-negotiable.

Whoff rounds out the post with two operational practices: logging every tool call with its inputs and timestamp (which caught Claude mistakenly calling `delete_item` instead of `archive_item` due to similar descriptions), and investing time upfront in tool naming. He recommends explicit verb-noun pairs like `get_invoice_by_id` and `mark_task_complete`, and warns against abbreviations or generic names like `getData` or `process`. Tool names, he notes, are effectively permanent in production because renaming them breaks existing prompts and agent workflows.
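
The logging practice described in the summary (every tool call recorded with its inputs and a timestamp) could be sketched as a decorator. This is an illustration under stated assumptions, not Whoff's implementation: `log_tool_call` and `CALL_LOG` are hypothetical names, the body of `archive_item` is invented for the example, and the in-memory list stands in for the durable store a real server would need.

```python
import functools
import json
from datetime import datetime, timezone

# Hypothetical in-memory sink, used only to keep the sketch self-contained.
# A production server would write to durable storage instead.
CALL_LOG: list[dict] = []


def log_tool_call(func):
    """Record the tool name, inputs, and a UTC timestamp before running the tool."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        CALL_LOG.append({
            "tool": func.__name__,
            "inputs": json.dumps(kwargs, sort_keys=True, default=str),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return func(**kwargs)
    return wrapper


@log_tool_call
def archive_item(item_id: str) -> str:
    # Hypothetical tool body; the post does not show its implementation.
    return f"archived {item_id}"


archive_item(item_id="inv-42")
print(CALL_LOG[-1]["tool"], CALL_LOG[-1]["inputs"])
```

Because the tool name is logged next to the inputs, a mix-up such as `delete_item` being called where `archive_item` was intended shows up directly in the log.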

Key facts

01 Claude routes tool calls based on description strings alone, not implementation logic, making precise, example-rich descriptions critical.
02 Schema tokens are injected on every tool call; at 1,000 calls/day, a 400-token schema generates 400,000 tokens of overhead daily.
03 Trimming a schema from 120 tokens to 35 tokens is achievable without losing Claude's comprehension, according to Whoff's testing.
04 Structured error messages (including wait times and fallback tool names) reduced retry loops by ~60% compared to generic errors.
05 In-memory server state is shared across concurrent Claude sessions in production, so every tool call must be fully stateless, reading and writing only to a database.
06 Logging every tool input caught Claude calling `delete_item` instead of `archive_item` due to overly similar descriptions, before it reached production data.
07 Tool names should be treated as permanent: renaming breaks existing prompts and agent workflows, so verb-noun pairs like `get_invoice_by_id` should be chosen carefully upfront.
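
Fact 04 describes error messages that tell the model what happened and what to do next. A minimal sketch of that idea follows. The helper `tool_error`, its parameters, and the fallback tool name `get_cached_invoice` are all hypothetical, and the exact wording Whoff used is not given in the source.

```python
from typing import Optional


def tool_error(what: str,
               retry_after_s: Optional[int] = None,
               fallback_tool: Optional[str] = None) -> str:
    """Build an error string that states the failure and the suggested next step."""
    parts = [f"Error: {what}."]
    if retry_after_s is not None:
        # A concrete wait time gives the model a reason not to retry at once.
        parts.append(f"Wait {retry_after_s} seconds before retrying.")
    if fallback_tool is not None:
        # Naming an alternative tool gives the model a path other than retrying.
        parts.append(f"If you need a result now, call `{fallback_tool}` instead.")
    return " ".join(parts)


# Hypothetical failure; the fallback tool name is illustrative only.
print(tool_error("database connection pool exhausted", 30, "get_cached_invoice"))
```

Compared with a bare `"Database error"`, the returned string carries both a wait time and a fallback, which are the two elements the post credits with reducing retry loops.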

Topics

#mcp #tool-use #production-patterns #debugging #developer-tools

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 19, 2026 · 11:33 UTC.
