DSG architecture cuts search costs by over 98% while matching native search accuracy
Researchers Emmanuel Aboah Boateng, Kyle MacDonald, and Amardeep Kumar propose Decoupled Search Grounding (DSG), a vendor-agnostic, MCP-compatible architecture that moves real-time search outside the reasoning model. In production it cuts search cost by over 98%, its warm cache lowers latency by 68%, and on SimpleQA it nearly matches native search accuracy.
DSG demonstrates that externalizing search grounding into a shared, MCP-compatible layer can reduce production search costs by over 98% while preserving accuracy, replacing a fixed, opaque model feature with a tunable, provider-agnostic interface.
Emmanuel Aboah Boateng, Kyle MacDonald, and Amardeep Kumar present Decoupled Search Grounding (DSG), a vendor-agnostic architecture that moves real-time search grounding outside the reasoning model and exposes it as a controllable interface boundary. The core problem they identify is that native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. That bundling makes grounding opaque and hard to tune, and it is prone to a failure mode the authors call Search-Induced Verbosity, which can break strict output contracts. DSG addresses this by routing grounding through an MCP-compatible gateway that exposes provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and both exact and semantic caching as first-class controls.
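To make the gateway controls concrete, here is a minimal Python sketch of three of them: ordered provider fallback, retrieval-depth control, and an exact-match cache. It is an illustration under stated assumptions, not the authors' implementation. The names `GroundingGateway`, `flaky_provider`, and `backup_provider` are hypothetical, the providers are stand-in functions rather than real search APIs, and semantic caching and the MCP transport are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider signature: (query, depth) -> list of text snippets.
SearchFn = Callable[[str, int], List[str]]


@dataclass
class GroundingGateway:
    """Illustrative gateway that sits outside the reasoning model."""
    providers: List[SearchFn]          # ordered: primary first, then fallbacks
    depth: int = 3                     # retrieval-depth control
    cache: Dict[Tuple[str, int], List[str]] = field(default_factory=dict)
    provider_calls: int = 0            # counts attempted provider requests

    def ground(self, query: str) -> List[str]:
        # Exact cache: key on the case- and whitespace-normalized query plus depth.
        key = (" ".join(query.lower().split()), self.depth)
        if key in self.cache:
            return self.cache[key]

        last_error: Optional[Exception] = None
        for provider in self.providers:
            self.provider_calls += 1
            try:
                results = provider(query, self.depth)[: self.depth]
            except Exception as exc:   # configured fallback: try the next provider
                last_error = exc
                continue
            self.cache[key] = results
            return results
        raise RuntimeError("all search providers failed") from last_error


def flaky_provider(query: str, depth: int) -> List[str]:
    """Stand-in primary provider that always fails (assumption for the demo)."""
    raise TimeoutError("primary provider unavailable")


def backup_provider(query: str, depth: int) -> List[str]:
    """Stand-in fallback provider returning placeholder snippets."""
    return [f"[backup] result {i} for {query}" for i in range(1, depth + 1)]
```

With `GroundingGateway([flaky_provider, backup_provider], depth=2)`, the first call to `ground` falls through to the backup provider, and a repeat of the same query (even with different casing or spacing) is served from the cache without another provider call. Keeping these decisions in the gateway rather than in the model is what makes them inspectable and tunable in this sketch.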
The authors evaluate DSG across five frontier models on three benchmarks: SimpleQA, FreshQA, and HotpotQA. Native search leads on recency-sensitive FreshQA, but DSG offers a stronger cost-accuracy tradeoff where control matters. On SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, and its warm-cache layer reaches a 99.4% hit rate with 68% lower latency. In a deployed production setting that handles large-scale agentic workloads for an e-commerce query-understanding task, DSG matches or slightly exceeds native-search accuracy while cutting search cost by over 98%. The paper concludes that real-time grounding is best treated as an optimizable interface boundary rather than a fixed model feature.
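The reported figures can be put in perspective with a little arithmetic. The sketch below uses only numbers quoted above and is not the paper's cost model; the helper names `accuracy_gap_points` and `provider_fraction` are hypothetical. It computes the SimpleQA accuracy gap in percentage points and the share of requests that would still reach a search provider at the reported warm-cache hit rate, assuming every cache miss triggers exactly one provider call.

```python
def accuracy_gap_points(native_pct: float, dsg_pct: float) -> float:
    """Gap between native and DSG accuracy, in percentage points."""
    return round(native_pct - dsg_pct, 1)


def provider_fraction(hit_rate_pct: float) -> float:
    """Fraction of requests that miss the cache and go to a provider.

    Assumes one provider call per cache miss (an illustrative simplification).
    """
    return round((100.0 - hit_rate_pct) / 100.0, 4)


gap = accuracy_gap_points(87.7, 86.1)   # SimpleQA: native vs. DSG
misses = provider_fraction(99.4)        # warm-cache hit rate from the summary
print(f"accuracy gap: {gap} points; requests reaching a provider: {misses:.1%}")
```

Under that assumption, a 1.6-point accuracy gap is traded for a setup in which roughly 6 in 1,000 warm-cache requests need a live search call.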
Key facts
- 01 DSG (Decoupled Search Grounding) moves search grounding outside the reasoning model via an MCP-compatible gateway.
- 02 Native search grounding bundles retrieval policy, provider choice, cost, and latency behind a single model-provider boundary, which the authors call hard to inspect, tune, or reuse.
- 03 The authors identify 'Search-Induced Verbosity' as a failure mode where bundled search breaks strict output contracts.
- 04 On SimpleQA, DSG achieves 86.1% accuracy vs. 87.7% for native search, at 91% lower search cost.
- 05 DSG's warm-cache layer reaches a 99.4% hit rate with 68% lower latency.
- 06 On a production e-commerce query-understanding workload, DSG cuts search cost by over 98% while matching or slightly exceeding native-search accuracy.
- 07 Evaluation spans five frontier models across SimpleQA, FreshQA, and HotpotQA benchmarks.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.