Fewer MCP tools means better AI agent reliability
Alex Standiford shares how cutting his MCP server from 30–40 tools down to three dramatically improved agent reliability, arguing that a large tool surface overwhelms models the same way a cluttered UI overwhelms new users.
Developers building MCP servers should design around a small number of parameterized verbs rather than mirroring their REST API surface, because a larger tool count degrades model reliability and inflates token costs.
- Beacon initially shipped with 30–40 tools, one per REST endpoint, causing constant hallucination and wrong tool selection.
- MCP tool descriptions are re-read by the model every turn, making a large tool surface expensive in tokens and error-prone.
- Standiford rebuilt Beacon down to three tools (or four, depending on how you count).
Alex Standiford describes building the first version of Beacon, the MCP server for his product Siren, in a single evening and shipping somewhere between 30 and 40 tools: one per REST endpoint, covering every CRUD operation, specialized lookup, and listing variation. The result was an agent that hallucinated constantly, confidently picked the wrong tool, and called tools with parameters that didn't match anything real. The core mistake, he explains, was treating an MCP tool surface like a REST API. A REST API is consumed by developers who can hold a data model in their heads, whereas an MCP tool list is re-read by the model on every single turn. More tools mean more tokens burned on descriptions, more near-duplicate options, and more chances for the model to grab the almost-right one. He compares the experience to opening Blender for the first time as a CAD user and being overwhelmed by a wall of unfamiliar, subtly different controls.
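To make the per-turn cost concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which every tool description is resent in full on every turn, with no prompt caching. The function name and all of the numbers (description lengths, turn count) are hypothetical and chosen only for illustration; they do not come from Standiford's article.

```python
def description_tokens(tool_count: int, tokens_per_description: int, turns: int) -> int:
    """Estimate tokens spent on tool descriptions across a whole conversation.

    Simplifying assumption: the full tool list is re-read on every turn.
    """
    return tool_count * tokens_per_description * turns


# Hypothetical wide surface: 35 tools with short descriptions.
wide = description_tokens(tool_count=35, tokens_per_description=150, turns=20)

# Hypothetical narrow surface: 3 tools with longer descriptions,
# since each parameterized tool has more to explain.
narrow = description_tokens(tool_count=3, tokens_per_description=400, turns=20)

print(f"wide surface:   {wide} tokens")
print(f"narrow surface: {narrow} tokens")
```

Even when each parameterized tool is given a much longer description, the overhead under these assumed numbers is several times smaller, and the gap widens as the conversation gets longer.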
After researching production MCP servers, Standiford found a consistent pattern: the ones people actually use have shockingly small tool counts. He rebuilt Beacon down to three tools (or four, depending on how you count) by applying aggressive parameterization. Instead of six search tools for each content type, he uses one search tool with a `contentType` parameter. Instead of one fetch tool per resource type, he uses one fetch tool that accepts a list of IDs. He also draws a parallel to GraphQL: by letting the agent specify which fields it wants in the response, request and response cycles become token-efficient, which saves money and improves reliability in long conversations where context window pressure compounds.
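The parameterization described above might look something like the following sketch. It is not Beacon's actual code: the in-memory `STORE`, the field names, and the function signatures are all hypothetical, and the functions are plain Python rather than calls into any MCP SDK. The `content_type` argument stands in for the `contentType` parameter mentioned in the article, and `fields` illustrates the GraphQL-style field selection.

```python
from typing import Any, Optional

# Hypothetical data standing in for Siren's real content; invented for illustration.
STORE: dict[str, dict[str, Any]] = {
    "p1": {"id": "p1", "contentType": "post", "title": "Launch notes", "body": "Long text..."},
    "p2": {"id": "p2", "contentType": "page", "title": "Pricing", "body": "Long text..."},
    "p3": {"id": "p3", "contentType": "post", "title": "Launch recap", "body": "Long text..."},
}


def _project(record: dict[str, Any], fields: Optional[list[str]]) -> dict[str, Any]:
    """Return only the requested fields, or the whole record if none are named."""
    if fields is None:
        return dict(record)
    return {name: record[name] for name in fields if name in record}


def search(query: str, content_type: Optional[str] = None,
           fields: Optional[list[str]] = None) -> list[dict[str, Any]]:
    """One search tool for every content type, narrowed by a parameter."""
    needle = query.lower()
    return [
        _project(record, fields)
        for record in STORE.values()
        if (content_type is None or record["contentType"] == content_type)
        and needle in record["title"].lower()
    ]


def fetch(ids: list[str], fields: Optional[list[str]] = None) -> list[dict[str, Any]]:
    """One fetch tool for every resource type; unknown IDs are skipped."""
    return [_project(STORE[item_id], fields) for item_id in ids if item_id in STORE]


# The agent asks only for the fields it needs, keeping the response small.
print(search("launch", content_type="post", fields=["id", "title"]))
print(fetch(["p2", "missing"], fields=["title"]))
```

In this sketch, supporting a new content type means adding records with a new `contentType` value, not adding another tool, so the menu the model re-reads each turn stays the same size.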
His decision rule for new tools is now: "Could I get the same behavior by adding a parameter to an existing tool?" If yes, he adds the parameter. He argues that large tool surfaces emerge because each individual tool seems locally justified (different parameters, different response shapes, read vs. write), but the cumulative shape of the menu is something no one would have designed deliberately at a whiteboard. His recommended mental model: pick the verbs first, fit everything through them, and only split when splitting is genuinely unavoidable.