Proxy layer cuts MCP tool-schema context from 14K to ~3.5K tokens
u/ArtSelect137 built a lightweight proxy that classifies user intent and exposes only the tools for the relevant domain. This cut MCP tool-schema token usage by roughly 75% and improved first-try tool selection accuracy.
Score breakdown
The pattern reduces per-request tool-schema overhead by roughly 75% and narrows the model's tool-selection search space from 35 options to 5–8, addressing two concrete costs — token burn and selection accuracy — that grow with MCP server size.
u/ArtSelect137 described a proxy pattern for large MCP servers, which otherwise send every tool schema to the model on every request. With 35 tools at roughly 400 tokens per schema, that amounts to 14K tokens of tool definitions before any user content is processed. The model must also pick from 35 options rather than a focused subset, which the post notes hurts selection accuracy.
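The budget arithmetic behind these figures can be checked in a few lines. This sketch uses only the post's own numbers: 35 tools, ~400 tokens per schema, and a ~3.5K-token average after filtering. Real schema sizes vary from tool to tool, so the flat 400-token figure should be read as an average, not a constant.

```python
# Rough per-request tool-schema budget, using the figures reported in the post.
TOKENS_PER_SCHEMA = 400  # approximate average per tool schema
ALL_TOOLS = 35


def schema_tokens(n_tools: int, per_schema: int = TOKENS_PER_SCHEMA) -> int:
    """Tokens spent on tool definitions before any user content is processed."""
    return n_tools * per_schema


full = schema_tokens(ALL_TOOLS)       # sent on every request without the proxy
filtered_avg = 3_500                  # reported average after domain filtering
reduction = 1 - filtered_avg / full   # fraction of schema tokens saved

print(full)                                 # 14000
print(f"{reduction:.0%}")                   # 75%
print(schema_tokens(5), schema_tokens(8))   # 2000 3200
```

At the same per-schema average, a 5–8 tool group comes to 2,000–3,200 tokens. The post does not break down how its reported ~3.5K average is composed.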
The solution is a proxy layer that sits in front of the MCP server, inspects the user's first message, and classifies intent into one of five domains: search, document, code, data, and system. Only the tools belonging to the matched domain group are forwarded to the model, typically 5–8 tools instead of 35. A fallback group handles cases where intent is unclear. After a week of testing, average tool-schema payload fell from 14K to ~3.5K tokens, tool selection accuracy improved noticeably, and no functional regressions were reported.
The post notes this approach mirrors what the "Rust toggle+act" pattern does, but implemented at the proxy level so it is compatible with any MCP client without requiring changes to the underlying server.
Key facts
- 01 A 35-tool MCP server was sending ~14K tokens of tool schemas on every request (~400 tokens per schema)
- 02 A proxy layer classifies user intent into five domains: search, document, code, data, and system
- 03 Only the tools in the matched domain group (typically 5–8) are exposed to the model
- 04 A fallback group handles unclear or ambiguous intents
- 05 Average tool-schema payload dropped from 14K to ~3.5K tokens after one week
- 06 Tool selection accuracy improved: the model picks the correct tool on the first try more often
- 07 The proxy works with any MCP client without requiring changes to the underlying server
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.