Apr 21, 2026 · 1 min read · Agentic Coding

Code Mode cuts MCP tool-call token costs by 50–93%

Kuldeep Paul explains how Bifrost's Code Mode — where an agent writes an orchestration script instead of receiving full tool definitions in context — reduces MCP token costs by 50% to 93% depending on the number of connected servers and tools.

Dev.to #llm · Kuldeep Paul

Composite score: 5.5 out of 10

Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%.

Why it matters

Teams running production AI agents with many MCP servers can cut token costs by over 50% — and up to 93% at scale — by switching to Code Mode without sacrificing task accuracy.

Summary: our read of the original

Kuldeep Paul's article identifies two root causes of runaway MCP token costs at production scale. First, context-window bloat: standard MCP clients load every tool definition — descriptions, parameter schemas, and return-type metadata — from every connected server into the prompt on every single request. An agent wired to hundreds of tools burns through tokens before the user's message is even processed. Second, intermediate result shuttling: in multi-hop workflows, data passes through the model's context between each tool call. The canonical example given is a Google Drive-to-Salesforce workflow where a meeting transcript flows through context twice, consuming roughly 50,000 extra tokens on a single run.
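
The two overheads described above reduce to simple multiplication. The sketch below is a back-of-the-envelope model, not the article's own method: the function names, the per-tool token cost, the request count, and the transcript size are all assumed values chosen for illustration.

```python
def tool_definition_overhead(num_tools: int, tokens_per_tool: int, num_requests: int) -> int:
    """Tokens spent re-sending every tool definition on every request."""
    return num_tools * tokens_per_tool * num_requests


def shuttling_overhead(payload_tokens: int, passes_through_context: int) -> int:
    """Extra tokens when an intermediate result re-enters the model's context."""
    return payload_tokens * passes_through_context


# Assumed figures: 500 tools at roughly 150 tokens each, over 10 requests.
definition_tokens = tool_definition_overhead(500, 150, 10)

# Assumed figure: a 25,000-token transcript that passes through context twice.
transcript_tokens = shuttling_overhead(25_000, 2)

print(f"tool definitions: {definition_tokens:,} tokens")
print(f"transcript shuttling: {transcript_tokens:,} tokens")
```

Under these assumptions the definition overhead grows with both the tool count and the request count, while the shuttling overhead depends only on payload size and the number of hops. A 25,000-token transcript is one size that would yield the roughly 50,000 extra tokens cited; the source does not state the transcript's actual length.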

Code Mode resolves both problems by inverting the execution pattern. Instead of receiving a full tool catalog, the model writes a short orchestration script (in Python or a sandboxed Starlark variant) and the gateway executes it, keeping intermediate results inside the sandbox. Bifrost's implementation exposes four meta-tools — `listToolFiles`, `readToolFile`, `getToolDocs`, and `executeToolCode` — so the model can discover and inspect only the interfaces it actually needs. Anthropic and Cloudflare independently validated the same core approach: Cloudflare compressed 2,500+ API endpoints to two tools and around 1,000 tokens of surface area, while Anthropic demonstrated a 98.7% token reduction on the Drive-to-Salesforce scenario, cutting consumption from 150,000 to 2,000 tokens.
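
To make the inverted pattern concrete, here is a minimal sketch of the Drive-to-Salesforce flow. It is not Bifrost's implementation: `gdrive_get_document`, `salesforce_update_record`, and `execute_tool_code` are hypothetical stand-ins, and Python's `exec` with empty builtins is used only to illustrate the idea and is not a real security sandbox.

```python
from typing import Callable, Dict


# Hypothetical tool bindings that a gateway might expose inside its sandbox.
def gdrive_get_document(document_id: str) -> str:
    """Stand-in for a Drive tool: returns a large transcript."""
    return "meeting notes line\n" * 5_000


def salesforce_update_record(record_id: str, notes: str) -> Dict[str, object]:
    """Stand-in for a Salesforce tool: pretends to store the notes."""
    return {"record_id": record_id, "stored_chars": len(notes)}


TOOLS: Dict[str, Callable] = {
    "gdrive_get_document": gdrive_get_document,
    "salesforce_update_record": salesforce_update_record,
}

# The kind of short orchestration script a model might write (assumed example).
MODEL_SCRIPT = """
transcript = gdrive_get_document("doc-123")
update = salesforce_update_record("lead-456", transcript)
result = "stored %d chars on %s" % (update["stored_chars"], update["record_id"])
"""


def execute_tool_code(script: str) -> str:
    """Run the script with only the tool bindings in scope; return its `result`."""
    scope = dict(TOOLS)
    # Illustration only: exec with empty builtins is NOT a safe sandbox.
    exec(script, {"__builtins__": {}}, scope)
    return scope["result"]


summary = execute_tool_code(MODEL_SCRIPT)
print(summary)
```

The transcript is created and consumed inside `execute_tool_code`; only the short `summary` string would return to the model's context. That is the property the summary credits for the savings.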

Bifrost's own benchmarks, run as 64 identical queries with Code Mode on versus off, show the savings compound sharply with scale: a 58% token drop at 96 tools across 6 servers, 84% at 251 tools across 11 servers, and 93% at 508 tools across 16 servers. Critically, all three configurations maintained a 100% task pass rate, meaning no accuracy was sacrificed for the cost reduction.
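
The percentages quoted in this summary follow from before/after arithmetic, sketched below. The helper names are hypothetical, and the 1,000,000-token baseline is an assumed figure for illustration, since the summary reports Bifrost's results only as percentages. The 150,000 and 2,000 token counts are the Anthropic figures quoted above.

```python
def reduction_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens saved going from tokens_before to tokens_after."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100.0 * (tokens_before - tokens_after) / tokens_before


def remaining_tokens(baseline_tokens: int, reduction_percent: int) -> float:
    """Tokens still consumed after applying a given percentage reduction."""
    return baseline_tokens * (100 - reduction_percent) / 100


# Anthropic's Drive-to-Salesforce figures as quoted in the summary.
anthropic_reduction = round(reduction_pct(150_000, 2_000), 1)

# Assumed baseline of 1,000,000 tokens, with the reported Bifrost reductions.
ASSUMED_BASELINE = 1_000_000
remaining_by_tool_count = {
    96: remaining_tokens(ASSUMED_BASELINE, 58),
    251: remaining_tokens(ASSUMED_BASELINE, 84),
    508: remaining_tokens(ASSUMED_BASELINE, 93),
}

print(anthropic_reduction)
print(remaining_by_tool_count)
```

Read this way, a 93% drop leaves 7% of the baseline, one sixth of what a 58% drop leaves (42%).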

Key facts

01 Standard MCP clients load every tool definition from every connected server into the model's context on every request, regardless of whether those tools are used.
02 Intermediate results in multi-hop workflows (e.g., a meeting transcript passed between Google Drive and Salesforce) flow through the model's context multiple times: roughly 50,000 extra tokens per run in the cited example.
03 Anthropic demonstrated a 98.7% token reduction on the Drive-to-Salesforce scenario, cutting consumption from 150,000 to 2,000 tokens.
04 Cloudflare compressed 2,500+ API endpoints down to two tools and around 1,000 tokens of surface area using the same pattern.
05 Bifrost's Code Mode exposes four meta-tools (`listToolFiles`, `readToolFile`, `getToolDocs`, and `executeToolCode`) so the model pulls only the interfaces it needs.
06 Bifrost benchmarks (64 queries each) showed token drops of 58% at 96 tools, 84% at 251 tools, and 93% at 508 tools.
07 All benchmark configurations maintained a 100% task pass rate, meaning no accuracy was traded for cost savings.

Topics

#mcp #code-generation #tool-use #cost-optimization #agent-framework

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 21, 2026 · 18:16 UTC.
