HyperTool doubles agent accuracy by batching tool calls into single code blocks
HyperTool is a unified MCP-style tool interface that lets LLM agents invoke multiple tools inside a single code block rather than one step at a time, more than doubling accuracy on the MCP-Universe benchmark for Qwen3-32B and Qwen3-8B models.
Score breakdown
HyperTool more than doubles multi-step tool-use accuracy on MCP-Universe for both tested models, demonstrating that collapsing deterministic tool subroutines out of the main reasoning trace is a concrete path to stronger agentic performance without changing the underlying tools or their schemas.
- 01HyperTool is a unified executable MCP-style tool interface that replaces step-wise atomic tool calls with a single code-block invocation.
- 02The paper identifies an "execution-granularity mismatch" where deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context.
- 03A HyperTool code block can call existing tools via their original schemas, manipulate returned values, and pass intermediate results locally.
Yaxin Du, Yifan Zhou, and Yujie Ge identify a core inefficiency in how tool-augmented LLM agents currently operate: every tool call, its observation, and any value transfer is surfaced in the model's main reasoning trace. They term this an "execution-granularity mismatch," arguing that locally deterministic tool workflows are unnecessarily unfolded into repeated model-visible decisions that consume context and force the model to handle low-level dataflow explicitly.
To address this, the authors introduce HyperTool, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution.
To address this, the authors introduce HyperTool, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution. Instead of issuing atomic step-wise calls, a model invokes HyperTool with a code block that can call existing tools through their original schemas, manipulate returned values, and pass intermediate results locally — folding what would otherwise be a multi-step subroutine into a single outer call. To train models on this interface, the team synthesizes HyperTool-format trajectories from cross-tool compositional tasks and verifies them in real MCP environments.
Evaluated on MCP-Universe, HyperTool raises average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B. The approach also surpasses GPT-OSS and Kimi-k2.5 on average accuracy, demonstrating that collapsing deterministic tool subroutines into a single interface call substantially improves multi-step tool use.
Key facts
- 01HyperTool is a unified executable MCP-style tool interface that replaces step-wise atomic tool calls with a single code-block invocation.
- 02The paper identifies an "execution-granularity mismatch" where deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context.
- 03A HyperTool code block can call existing tools via their original schemas, manipulate returned values, and pass intermediate results locally.
- 04Training data is synthesized from cross-tool compositional tasks and verified in real MCP environments.
- 05On MCP-Universe, Qwen3-32B accuracy improved from 15.69% to 35.29% with HyperTool.
- 06On MCP-Universe, Qwen3-8B accuracy improved from 9.93% to 33.33% with HyperTool.
- 07HyperTool surpasses GPT-OSS and Kimi-k2.5 on average accuracy on MCP-Universe.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC. How this works →