Token overhead in MCP tool calls traced across 400 calls
u/LorenzoNardi logged token usage across 400 MCP tool calls on a document-processing server and found that tool definition schemas, retry sequences, and verbose outputs are the dominant cost drivers.
Score breakdown
The analysis surfaces retry sequences and tool-definition schema bloat as significant but non-obvious token cost drivers in MCP deployments. Concrete measurements show that a retry sequence costs 2.8x a clean call and that schema overhead can reach ~10k tokens before any real work begins.
u/LorenzoNardi began logging token counts per MCP call after costs on a document-processing server climbed faster than expected. Examining 400 calls across three tool types, the analysis broke overhead into four categories: tool definition schemas, input payload size, output verbosity, and error-handling retry paths.
The most surprising finding was tool definition overhead. A single well-documented tool with full parameter descriptions consumes ~800–1,200 tokens per call in system context before any user data is sent. With 10 tools loaded, that baseline reaches ~10k tokens before the first real action. Retry sequences were the second major surprise: ~18% of calls involved a retry, and those sequences cost 2.8x a clean first-attempt call on average because the agent pays for the original call, the error message, the reformulated call, and the retry itself.
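The arithmetic behind these figures can be sketched in a few lines. This is a rough model built from the reported numbers, not the author's own method. `CLEAN_CALL_TOKENS` is a hypothetical placeholder, since the summary gives no absolute per-call figure, and the model assumes the 2.8x multiplier applies to the ~18% of calls that involved a retry.

```python
# Rough cost model built from the figures reported above.
# CLEAN_CALL_TOKENS is a hypothetical placeholder, not a measured value.

SCHEMA_TOKENS_LOW = 800      # reported lower bound per tool definition
SCHEMA_TOKENS_HIGH = 1200    # reported upper bound per tool definition
RETRY_RATE = 0.18            # share of calls that involved a retry
RETRY_MULTIPLIER = 2.8       # cost of a retry sequence relative to a clean call
CLEAN_CALL_TOKENS = 1000     # assumption, for illustration only


def schema_baseline(num_tools: int) -> tuple[int, int]:
    """Return the (low, high) token cost of loading num_tools definitions."""
    return num_tools * SCHEMA_TOKENS_LOW, num_tools * SCHEMA_TOKENS_HIGH


def blended_cost_per_call(clean_tokens: float) -> float:
    """Average cost per call once retry sequences are mixed in."""
    clean_share = 1.0 - RETRY_RATE
    return clean_share * clean_tokens + RETRY_RATE * RETRY_MULTIPLIER * clean_tokens


def retry_share_of_spend() -> float:
    """Fraction of total tokens spent on retry sequences."""
    retry_part = RETRY_RATE * RETRY_MULTIPLIER
    return retry_part / ((1.0 - RETRY_RATE) + retry_part)


if __name__ == "__main__":
    low, high = schema_baseline(10)
    print(f"10 tools: {low}-{high} schema tokens")
    print(f"blended cost per call: {blended_cost_per_call(CLEAN_CALL_TOKENS):.0f}")
    print(f"retry share of spend: {retry_share_of_spend():.1%}")
```

Under these assumptions, retries raise the average cost per call by roughly 32% while accounting for about 38% of total tokens. That gap is one way to see why a per-call average can hide retry overhead.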
Output verbosity was identified as the easiest lever to pull — returning only requested fields instead of raw JSON dumps reduced output tokens by ~60% on average with no degradation in agent performance. Recommended mitigations include trimming tool descriptions to the minimum needed for correct tool selection, adding a pre-return validation layer to eliminate malformed outputs (and thus most retries), and grouping read and write tools into separate servers to keep the selection context smaller per task type.
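Two of these mitigations, returning only requested fields and validating before returning, can be sketched together. This is a minimal illustration rather than the server described in the post: the field names, `REQUIRED_FIELDS`, and all three helper functions are hypothetical, and a real MCP server would wire this into its own tool handler.

```python
import json
from typing import Any

# Hypothetical fields every result must carry; not taken from the post.
REQUIRED_FIELDS = {"doc_id", "status"}


def project_fields(record: dict[str, Any], requested: list[str]) -> dict[str, Any]:
    """Keep only the fields the caller asked for, dropping everything else."""
    return {key: record[key] for key in requested if key in record}


def validate_result(result: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the result is well formed."""
    missing = sorted(REQUIRED_FIELDS - set(result))
    problems = [f"missing field: {name}" for name in missing]
    try:
        json.dumps(result)  # the payload must serialize before it is returned
    except (TypeError, ValueError) as exc:
        problems.append(f"not JSON-serializable: {exc}")
    return problems


def build_response(record: dict[str, Any], requested: list[str]) -> str:
    """Project to the requested fields, validate, then serialize compactly."""
    fields = sorted(REQUIRED_FIELDS | set(requested))
    result = project_fields(record, fields)
    problems = validate_result(result)
    if problems:
        # Fail inside the server so the agent never receives a malformed payload.
        raise ValueError("; ".join(problems))
    return json.dumps(result, separators=(",", ":"))


if __name__ == "__main__":
    record = {
        "doc_id": "d-1",
        "status": "ok",
        "title": "Q3 report",
        "body": "long text the agent did not ask for",
        "pages": 12,
    }
    print(build_response(record, ["title"]))
```

Raising inside the server is one possible design choice; the point is that a malformed result is caught before the agent has to pay for an error message and a reformulated call.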
Key facts
- 01 Analyzed 400 MCP tool calls across 3 tool types on a document-processing server
- 02 Tool definition schemas cost ~800–1,200 tokens per call in system context before any data is passed
- 03 10 tools loaded simultaneously can consume ~10k tokens before the first real action
- 04 ~18% of calls were retry sequences, costing 2.8x a clean first-attempt call on average
- 05 Returning only requested fields vs. full JSON dumps cut output tokens by ~60% on average
- 06 Retry overhead is easy to miss when only looking at average cost per call
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.