Cloudflare's "Code Mode" replaces JSON tool calling with executable code
Sunil Pai of Cloudflare presents "Code Mode," an approach in which AI agents generate executable JavaScript instead of making JSON-based tool calls. Applied to Cloudflare's API, it cuts the cost of describing that API to a model from roughly 1.2–1.5 million tokens to about 1,000.
Teams building AI agents against large API surfaces can adopt a code-generation interface (for example, two tools, `search` and `execute`) to cut context token usage by orders of magnitude and to unlock native programming constructs, such as loops and parallelization, that JSON tool calling cannot express efficiently.
Sunil Pai, who builds AI agents at Cloudflare and is the creator of PartyKit (an open-source tool for real-time multiplayer apps), presented "Code Mode" at AI Engineer. The core idea is to have AI models generate executable code — usually JavaScript — to interact with systems, rather than relying on the conventional JSON-based tool-calling loop. Pai argues that traditional tool calling degrades badly at scale: once an agent is loaded with hundreds of tools (Google services, Jira, wikis, etc.), context fills up, composition becomes unwieldy, and the back-and-forth with the model slows everything down.
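For contrast with Code Mode, here is a minimal sketch of the conventional JSON tool-calling loop described above. Everything in it is hypothetical (`fakeModel`, the `tools` registry, the zone names): a scripted function stands in for the model so the number of round trips is visible. Because this stand-in model requests only one tool call per turn, checking three zones costs five model turns, and every result is fed back into the transcript.

```typescript
// Minimal sketch of a conventional JSON tool-calling loop.
// Hypothetical throughout: `fakeModel` replaces a real model API, and the
// `tools` registry replaces real tool definitions.

type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; finalAnswer?: string };

// Hypothetical tool registry. In a real agent, every tool's JSON schema
// would also be serialized into the model's context on each request.
const tools: Record<string, (args: Record<string, unknown>) => unknown> = {
  listZones: () => ["zone-a", "zone-b", "zone-c"],
  getZoneStatus: (args) => `${String(args.zone)}: active`,
};

// Scripted stand-in for the model: it asks for ONE tool call per turn, so
// checking three zones needs one turn to list them plus one turn per zone.
function fakeModel(history: unknown[]): ModelTurn {
  const results = history.length;
  if (results === 0) return { toolCall: { name: "listZones", arguments: {} } };
  const zones = history[0] as string[];
  if (results <= zones.length) {
    return { toolCall: { name: "getZoneStatus", arguments: { zone: zones[results - 1] } } };
  }
  return { finalAnswer: (history.slice(1) as string[]).join("; ") };
}

function runJsonToolLoop(): { answer: string; modelTurns: number } {
  const history: unknown[] = [];
  let modelTurns = 0;
  while (true) {
    modelTurns++;
    const turn = fakeModel(history);
    if (turn.finalAnswer !== undefined) return { answer: turn.finalAnswer, modelTurns };
    const call = turn.toolCall!;
    // The host runs the tool and appends the result to the transcript,
    // which is sent back to the model on the next turn.
    history.push(tools[call.name](call.arguments));
  }
}

const { answer, modelTurns } = runJsonToolLoop();
console.log(answer);     // zone-a: active; zone-b: active; zone-c: active
console.log(modelTurns); // 5
```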
The most striking demonstration of Code Mode's efficiency came from Cloudflare's own API surface. With approximately 2,600 API endpoints, exposing one tool per endpoint would consume roughly 1.2–1.5 million tokens of tool definitions on the first call alone, which makes a full MCP server for the entire Cloudflare API effectively impossible. Pai's colleague Matt Carey solved this by exposing just two tools, both of which take code as input: `search`, whose code runs against the full OpenAPI JSON spec, and `execute`, which runs code with callable functions for acting on the results. This collapsed the token requirement to around 1,000 tokens, a reduction of roughly 99.9%. A live demo illustrated the practical payoff: a user can issue a plain-language panic request such as "We are getting DDoS'd, find every offending IP and block them" without navigating the Cloudflare dashboard.
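A minimal sketch of the two-tool shape, assuming a toy spec with hypothetical paths, a hypothetical `request` transport that only records calls, and tool signatures invented for illustration; this is not Carey's implementation. What it illustrates is that the full spec never enters the model's context: only the value returned by the model's search code does. `new Function` is used for brevity and is not a sandbox.

```typescript
// Minimal sketch of a two-tool `search` / `execute` interface.
// Hypothetical throughout: the toy spec, its paths, `request`, and both tool
// signatures are illustrations, not Cloudflare's implementation.
// WARNING: `new Function` is not a sandbox; model-written code must run in
// an isolated environment in any real deployment.

type SpecPaths = Record<string, Record<string, { summary: string }>>;

// Toy stand-in for a full OpenAPI spec (hypothetical, simplified paths).
const spec: { paths: SpecPaths } = {
  paths: {
    "/zones": { get: { summary: "List zones" } },
    "/zones/{zone_id}/blocked_ips": { post: { summary: "Block an IP address" } },
    "/accounts": { get: { summary: "List accounts" } },
  },
};

// Tool 1: run model-written code with the spec in scope. Only the code's
// return value is sent back to the model, never the whole spec.
function search(code: string): unknown {
  return new Function("spec", code)(spec);
}

// Hypothetical transport that records requests instead of sending them.
const sent: string[] = [];
async function request(method: string, path: string, body?: unknown): Promise<{ ok: boolean }> {
  sent.push(`${method.toUpperCase()} ${path} ${JSON.stringify(body ?? {})}`);
  return { ok: true };
}

// Tool 2: run model-written async code with a callable API binding in scope.
function execute(code: string): Promise<unknown> {
  const run = new Function("api", `return (async () => { ${code} })();`);
  return run({ request });
}

// Code a model might pass to `search`: find endpoints whose summary matches.
const found = search(`
  const hits = [];
  for (const [path, ops] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(ops)) {
      if (op.summary.toLowerCase().includes("block")) hits.push(method + " " + path);
    }
  }
  return hits;
`) as string[];

console.log(found); // [ 'post /zones/{zone_id}/blocked_ips' ]

// Code a model might pass to `execute`: block several IPs in one tool call.
execute(`
  const ips = ["203.0.113.7", "198.51.100.23"];
  for (const ip of ips) {
    await api.request("post", "/zones/ZONE_ID/blocked_ips", { ip });
  }
  return ips.length;
`).then((n) => console.log(n, sent.length)); // 2 2
```

In a real deployment the model, not the host, would write both code strings, and the host would run them inside an isolated sandbox.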
Beyond token efficiency, Pai highlights that code generation restores fundamental programming capabilities that JSON tool calling handles awkwardly or not at all: looping, stateful execution, sequencing, and parallelization. The talk also covers several related themes for this paradigm: a new software architecture called the "Harness," observability and security in sandboxed environments, long-running workflows, generative UI, and the resurgence of capability-based security.
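The looping and parallelization point can be sketched as a single program a model might emit in one turn: it fetches data, filters it in code, and issues the follow-up calls concurrently with `Promise.all`. The `mockApi` object, its methods, the traffic numbers, and the 10,000-request threshold are all hypothetical; the sketch only shows how several API calls fit into one model turn.

```typescript
// Minimal sketch: one model-written program that loops, filters, and runs
// calls in parallel, work that would otherwise take one JSON tool call (and
// one model round trip) per step. `Api`, `mockApi`, the traffic numbers, and
// the threshold are all hypothetical.

type Api = {
  requestCountsByIp: () => Promise<Record<string, number>>;
  blockIp: (ip: string) => Promise<void>;
};

const blocked: string[] = [];
let apiCalls = 0;

// Hypothetical in-memory API standing in for real endpoints.
const mockApi: Api = {
  async requestCountsByIp() {
    apiCalls++;
    return { "203.0.113.7": 120000, "198.51.100.23": 95000, "192.0.2.10": 40 };
  },
  async blockIp(ip) {
    apiCalls++;
    blocked.push(ip);
  },
};

// The kind of code a model might emit in a single turn.
async function modelWrittenProgram(api: Api): Promise<string[]> {
  const counts = await api.requestCountsByIp();
  // Looping and filtering happen in code, not in the model's context.
  const offenders = Object.entries(counts)
    .filter(([, n]) => n > 10000)
    .map(([ip]) => ip);
  // Parallelization: issue every block request concurrently.
  await Promise.all(offenders.map((ip) => api.blockIp(ip)));
  return offenders;
}

modelWrittenProgram(mockApi).then((offenders) => {
  console.log(offenders); // [ '203.0.113.7', '198.51.100.23' ]
  console.log(apiCalls);  // 3 (three API calls from one model turn)
});
```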
Key facts
- Sunil Pai builds AI agents at Cloudflare and created PartyKit, the open-source real-time multiplayer tool.
- "Code Mode" has AI models generate executable JavaScript instead of issuing JSON-based tool calls.
- Cloudflare's API surface spans approximately 2,600 endpoints, which would require ~1.2–1.5 million tokens if exposed as individual tools.
- Colleague Matt Carey reduced that token cost to ~1,000 tokens with just two tools, `search` and `execute`, both of which accept code as input: a reduction of roughly 99.9%.
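A quick check of the arithmetic behind these figures, using only the numbers reported above (the per-endpoint cost is derived here, not stated in the talk):

```latex
\begin{aligned}
\frac{1.2\times10^{6}}{2600} &\approx 462, \qquad \frac{1.5\times10^{6}}{2600} \approx 577 \quad \text{tokens per endpoint}\\
1 - \frac{1000}{1.2\times10^{6}} &\approx 99.92\%, \qquad 1 - \frac{1000}{1.5\times10^{6}} \approx 99.93\% \quad \text{reduction}
\end{aligned}
```

Both ends of the reported range round to the stated figure of roughly 99.9%.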