Shared intelligence layer could cut agent token use by 92%
Artemii Amelin argues that AI agents waste large numbers of tokens re-fetching and re-parsing the same web pages, and proposes purpose-built data agents that serve pre-synthesized intelligence as the fix.
Teams running agents at scale should audit how many tokens are spent on data acquisition versus actual reasoning: per the benchmarks cited here, switching to a pre-synthesized intelligence layer could cut token consumption by about 92% and response latency by about 60%.
Artemii Amelin identifies a systemic inefficiency he calls the "redundant research problem": because LLM-powered agents are stateless by design, every agent session starts from scratch, re-fetching and re-parsing the same web pages that other agents have already processed. A typical market intelligence task requires fetching 3–5 URLs with HTML responses averaging 8,000–15,000 characters each, stripping boilerplate, summarizing, and only then performing the actual reasoning, so thousands of tokens go to data acquisition before the agent produces a single word of useful output.
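The "stripping boilerplate" step can be illustrated with a minimal sketch using Python's standard `html.parser` module. The sample page, the `VisibleText` class, the `extract_text` helper, and the list of skipped tags are all hypothetical choices made for illustration; they are not taken from Amelin's article or from any particular agent framework.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside typical boilerplate elements."""

    # Assumption: these tags are treated as boilerplate for this sketch.
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """<html><head><title>Acme Q3</title><style>body{margin:0}</style>
<script>trackVisitor();</script></head>
<body><nav><a href="/">Home</a><a href="/news">News</a></nav>
<main><p>Acme shipped 12,000 units in Q3.</p></main>
<footer>Copyright Acme</footer></body></html>"""

text = extract_text(SAMPLE_PAGE)
kept_fraction = len(text) / len(SAMPLE_PAGE)
print(text)
print(f"kept {kept_fraction:.0%} of the downloaded characters")
```

Even on this tiny page most of the downloaded characters are markup, scripts, and navigation. Every agent that fetches the page independently repeats the same work.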
Benchmarks from Scriptorium, running on the Pilot Protocol network, quantify the waste: direct web retrieval costs ~2,600 tokens and ~4.5 seconds per call, while a pre-synthesized brief delivers equivalent decision quality in ~210 tokens and ~1.8 seconds, a 92% reduction in token consumption and a 60% drop in latency. At 1,000 calls, the reported cumulative totals are 2.9 million tokens for direct retrieval versus 490,000 tokens for pre-synthesized briefs. Pilot Protocol's own network data, observed across 75,000+ active agents handling 7.1 billion requests since February 2026, shows that an AI agent makes 20–50 times as many requests as a human makes searches, and many of those requests are identical repeated lookups.
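The percentages can be checked directly from the per-call figures. The short sketch below uses only the numbers quoted above; the variable and function names are illustrative. It also shows that the reported 1,000-call totals are larger than the per-call retrieval figures multiplied by 1,000. The summary does not explain that gap. One possible reading, which is an assumption here and not something the source states, is that the totals include a few hundred tokens per call that both paths spend outside retrieval.

```python
# Per-call figures as quoted in the benchmark summary.
DIRECT_TOKENS = 2_600
DIRECT_SECONDS = 4.5
BRIEF_TOKENS = 210
BRIEF_SECONDS = 1.8

# Cumulative totals at 1,000 calls, as quoted in the summary.
CALLS = 1_000
REPORTED_DIRECT_TOTAL = 2_900_000
REPORTED_BRIEF_TOTAL = 490_000


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


token_cut = percent_reduction(DIRECT_TOKENS, BRIEF_TOKENS)
latency_cut = percent_reduction(DIRECT_SECONDS, BRIEF_SECONDS)
cumulative_cut = percent_reduction(REPORTED_DIRECT_TOTAL, REPORTED_BRIEF_TOTAL)

# Retrieval-only totals implied by the per-call figures.
direct_total = DIRECT_TOKENS * CALLS
brief_total = BRIEF_TOKENS * CALLS

# Unexplained per-call remainder in the reported totals.
direct_extra = (REPORTED_DIRECT_TOTAL - direct_total) / CALLS
brief_extra = (REPORTED_BRIEF_TOTAL - brief_total) / CALLS

print(f"per-call token reduction:   {token_cut:.1f}%")       # 91.9%
print(f"per-call latency reduction: {latency_cut:.1f}%")     # 60.0%
print(f"reduction in reported totals: {cumulative_cut:.1f}%")  # 83.1%
print(f"implied totals: {direct_total:,} vs {brief_total:,} tokens")
print(f"unexplained tokens per call: {direct_extra:.0f} vs {brief_extra:.0f}")
```

Measured on the reported cumulative totals, the saving is closer to 83% than 92%, so it matters which of the two figures a cost audit uses.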
The root cause, Amelin argues, is architectural: HTTP was designed in 1991 to serve browser-rendered documents, not structured facts for machine reasoning. Agents discard roughly 90% of what they download just to extract the 10% that matters. The fix he proposes is not a smarter scraper but a fundamentally different model: purpose-built agents that specialize in maintaining and serving pre-digested, structured intelligence to any other agent that needs it, eliminating redundant fetch-parse-compress loops across the ecosystem.
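The shape of that model can be sketched as a small in-memory service that synthesizes a brief once and serves it to every agent that asks until it goes stale. This is an illustrative sketch only: `BriefStore`, `Brief`, `fake_synthesize`, and the one-hour freshness window are hypothetical names and values, not Scriptorium's or Pilot Protocol's actual interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Brief:
    topic: str
    text: str
    created_at: float


class BriefStore:
    """Hypothetical data agent: one fetch-parse-compress pass per topic."""

    def __init__(
        self,
        synthesize: Callable[[str], str],
        max_age_seconds: float = 3600.0,  # assumed freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._synthesize = synthesize
        self._max_age = max_age_seconds
        self._clock = clock
        self._briefs: Dict[str, Brief] = {}
        self.synth_calls = 0  # counts the expensive passes actually run

    def get(self, topic: str) -> Brief:
        key = topic.strip().lower()  # normalise so repeated lookups match
        now = self._clock()
        cached = self._briefs.get(key)
        if cached is not None and now - cached.created_at < self._max_age:
            return cached  # served without re-fetching anything
        self.synth_calls += 1
        brief = Brief(key, self._synthesize(key), now)
        self._briefs[key] = brief
        return brief


def fake_synthesize(topic: str) -> str:
    # Stand-in for the expensive fetch, strip, and summarise pipeline.
    return f"brief for {topic}"


store = BriefStore(fake_synthesize)
for _ in range(50):  # fifty agents asking the same question
    store.get("EV battery market")
print(store.synth_calls)  # 1
```

The design choice that matters is that the expensive pipeline runs per topic rather than per agent session. The freshness window is the main tuning knob, trading staleness against repeated synthesis.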
Key facts
- 01 Direct web retrieval by a typical agent costs ~2,600 tokens and ~4.5 seconds per market intelligence call.
- 02 A pre-synthesized brief from Scriptorium (Pilot Protocol) delivers equivalent decision quality in ~210 tokens and ~1.8 seconds.
- 03 That represents a 92% reduction in token consumption and a 60% drop in latency.
- 04 At 1,000 calls, the reported cumulative totals are 2.9 million tokens (direct) vs. 490,000 tokens (pre-synthesized).
- 05 Pilot Protocol's network data covers 75,000+ active agents and 7.1 billion requests since February 2026.