Running 27 MCP tools in production: naming grammar beats low count
u/Specialist_Cow24 runs 27 MCP tools in production for edgar.tools and argues that selection accuracy degrades with ambiguity, not tool count — and shares six concrete design patterns that keep routing reliable at scale.
The post provides production evidence that the widely cited ~15-tool MCP limit is a proxy for ambiguity rather than a hard count ceiling, and demonstrates that naming grammar, description-level routing instructions, and selection-focused evals can keep a 27-tool server accurate.
u/Specialist_Cow24 maintains the MCP server for edgar.tools (SEC filing data, backed by the `edgartools` Python library) and has operated 27 tools in production since February, serving real traffic from Claude Desktop, Claude.ai, and Cursor. The post challenges the widely repeated "~15 tools max" rule, arguing that selection accuracy degrades with ambiguity rather than raw count. The author presents six design principles derived from production telemetry and eval failures.
First, tool names should form a grammar: parallel naming within a category (e.g., `search_companies` / `search_funds` / `search_advisers`, `fund_profile` / `adviser_profile`) lets the model predict the third name once it has seen two. The post cites a concrete experiment: consolidating three search tools into a single `search_entities` tool with a type parameter drew 15 calls from 6 users in a given window, while the concrete `search_companies` drew 173 calls from 37 users — models match the user's noun, not a type hierarchy. Second, descriptions function as routing instructions, and the most valuable part is the negative space — explicit redirects to sibling tools. A single sentence added to the financial-statement tools warning that XBRL data is unavailable on earnings day and redirecting "latest earnings" intent to the 8-K press release path eliminated an entire class of hallucinations that no schema change could fix.
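The first two principles can be illustrated with a small lint over a tool registry. This is a minimal sketch, not the author's code: the tool names come from the post, but the description wording and the helpers `naming_families` and `missing_redirects` are hypothetical, written only to show how parallel names and sibling redirects could be checked mechanically.

```python
from collections import defaultdict

# Tool names are from the post; every description below is illustrative wording.
TOOLS = {
    "search_companies": "Search public companies by name or ticker. "
                        "For funds use search_funds; for advisers use search_advisers.",
    "search_funds": "Search funds by name. For operating companies use "
                    "search_companies; for advisers use search_advisers.",
    "search_advisers": "Search investment advisers by name. For funds use search_funds.",
    "fund_profile": "Profile of a single fund. For advisers use adviser_profile.",
    "adviser_profile": "Profile of a single adviser. For funds use fund_profile.",
}

def naming_families(names):
    """Group names that share a first token (search_*) or a last token (*_profile)."""
    by_pattern = defaultdict(list)
    for name in names:
        tokens = name.split("_")
        by_pattern[f"{tokens[0]}_*"].append(name)
        by_pattern[f"*_{tokens[-1]}"].append(name)
    # A pattern only counts as a family when at least two tools follow it.
    return {p: sorted(members) for p, members in by_pattern.items() if len(members) > 1}

def missing_redirects(tools, family):
    """For each tool in a family, list siblings its description never mentions."""
    gaps = {}
    for name in family:
        absent = [s for s in family if s != name and s not in tools[name]]
        if absent:
            gaps[name] = absent
    return gaps

families = naming_families(TOOLS)
search_gaps = missing_redirects(TOOLS, families["search_*"])
profile_gaps = missing_redirects(TOOLS, families["*_profile"])
```

In this sketch the lint flags `search_advisers`, whose description redirects to `search_funds` but never mentions `search_companies`. That is the kind of gap in the "negative space" the post describes closing by editing descriptions rather than schemas.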
The remaining four principles cover organization, aggregation, prompts, and evaluation. Third, tools are grouped into category-based noun-spaces (research, signals, monitoring, funds, advisers, aggregation), with a clear test for whether a new tool belongs in a category. Fourth, aggregation tools such as `peer_facts` and `portfolio_events` encode multi-step call sequences that telemetry showed models repeating. Fifth, server-defined `prompts/list` templates (e.g., `/edgar:filing_red_flags`, `/edgar:earnings_postmortem`) teach clients multi-tool workflows and surface natively in Claude Desktop and Cursor. Sixth, an eval suite scores tool selection directly; each misroute is treated as a description bug to be fixed and re-evaluated.
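A selection-focused eval of the kind described can be sketched in a few lines. Everything here is an assumption for illustration: the queries, the expected tools, and `keyword_selector`, which stands in for a real model call so the example runs on its own. The post does not describe how its suite is implemented.

```python
from collections import Counter

# Hypothetical eval cases: (user query, tool the router is expected to pick).
CASES = [
    ("find Apple's CIK", "search_companies"),
    ("look up Vanguard index funds", "search_funds"),
    ("who advises this fund family", "search_advisers"),
    ("compare margins across peers", "peer_facts"),
]

def keyword_selector(query: str) -> str:
    """Stand-in for the model: picks a tool by naive keyword match."""
    q = query.lower()
    if "fund" in q:
        return "search_funds"
    if "advis" in q:
        return "search_advisers"
    if "peer" in q:
        return "peer_facts"
    return "search_companies"

def score_selection(cases, select):
    """Score tool choice only; answer quality is deliberately out of scope."""
    misroutes = Counter()
    for query, expected in cases:
        chosen = select(query)
        if chosen != expected:
            # Record (expected, chosen) pairs so each one points at a description to fix.
            misroutes[(expected, chosen)] += 1
    accuracy = 1 - sum(misroutes.values()) / len(cases)
    return accuracy, misroutes

accuracy, misroutes = score_selection(CASES, keyword_selector)
```

With the stand-in selector, the adviser query is routed to `search_funds` because it also mentions a fund. Recording the `(expected, chosen)` pair shows which two descriptions overlap, which is the signal the post uses to decide what to rewrite before re-running the suite.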
Key facts
- 01 The edgar.tools MCP server runs 27 tools in production since February, with traffic from Claude Desktop, Claude.ai, and Cursor.
- 02 The post argues selection accuracy degrades with ambiguity, not tool count: 8 ambiguous tools can route worse than 27 well-defined ones.
- 03 A consolidated `search_entities` tool drew 15 calls from 6 users vs. 173 calls from 37 users for the concrete `search_companies` in the same window.
- 04 Descriptions should include explicit negative redirects to sibling tools; one sentence redirecting 'latest earnings' queries eliminated a hallucination class no schema change could fix.
- 05 Aggregation tools (`peer_facts`, `portfolio_events`) were added after telemetry showed models repeatedly hand-orchestrating the same 3–4 call sequences.
- 06 Server-defined `prompts/list` templates teach clients multi-tool workflows and surface natively in Claude Desktop and Cursor.
- 07 An eval suite scores tool selection, not just answer quality, and treats misroutes as description bugs to fix and re-run.
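The telemetry signal behind the aggregation tools, the same short call sequence recurring across sessions, could be surfaced with a simple n-gram count. This is a sketch under assumptions: the session logs are made up, `company_facts` is a hypothetical tool name, and `repeated_sequences` is not taken from the post.

```python
from collections import Counter

def repeated_sequences(sessions, n=3, min_sessions=2):
    """Return n-call sequences that occur in at least min_sessions sessions."""
    counts = Counter()
    for calls in sessions:
        # Use a set so a sequence is counted once per session, however often it repeats.
        grams = {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}
        counts.update(grams)
    return {seq: c for seq, c in counts.items() if c >= min_sessions}

# Made-up session logs; company_facts is a hypothetical tool name.
SESSIONS = [
    ["search_companies", "company_facts", "company_facts", "company_facts"],
    ["search_companies", "company_facts", "company_facts", "fund_profile"],
    ["search_funds", "fund_profile"],
]

candidates = repeated_sequences(SESSIONS)
```

Any sequence that survives the threshold is a candidate for a single aggregation tool that performs the whole chain server-side.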
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 11, 2026 · 08:34 UTC.