Parallel-Synthesis cuts agent synthesis latency up to 11x by consuming KV caches directly
Researchers introduce Parallel-Synthesis, a plug-and-play framework that lets a synthesizer agent consume KV caches from parallel worker agents directly — skipping text concatenation — and reduces time-to-first-token by 2.5x–11x.
The framework replaces the standard text-concatenation bottleneck in multi-agent synthesis with direct KV cache consumption, cutting time-to-first-token by up to 11x while preserving or improving task accuracy across diverse benchmarks.
Shikun Liu, Mufei Li, and Dongqi Fu identify a structural mismatch in modern LLM-agent systems: while agent workflows increasingly rely on parallel branches to explore subtasks, retrieve evidence, or generate candidate solutions, the final synthesis step still operates through a sequential text interface. The standard approach — concatenating the textual outputs of all branches — discards the parallel structure and incurs redundant prefill computation at the synthesizer.
To address this, the paper introduces Parallel-Synthesis, described as a plug-and-play framework with two core components: a cache mapper that calibrates the independently generated KV caches from each branch, and a fine-tuned synthesizer adapter that enables generation directly from this non-sequential cache interface. The system is trained on data that exposes the synthesizer to parallel cache contexts, teaches aggregation across cached branches, and distills reasoning behavior from standard text-concatenation-based synthesis.
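The idea of generating from a non-sequential cache interface can be pictured with a toy single-head attention step over the concatenated key/value entries of several branches. This is a minimal sketch under stated assumptions, not the paper's implementation: the summary does not say how the cache mapper calibrates caches, so `map_cache` and its `gain` parameter are hypothetical placeholders, and the adapter fine-tuning is not modeled at all.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

def map_cache(cache, gain=1.0):
    """Hypothetical stand-in for a cache mapper: rescales the keys by `gain`.
    The real calibration method is not described in the summary."""
    keys, values = cache
    return [[gain * k for k in key] for key in keys], values

def synthesize_step(query, branch_caches, gains=None):
    """Attend over all branch caches at once, with no text re-encoding."""
    gains = gains or [1.0] * len(branch_caches)
    keys, values = [], []
    for cache, gain in zip(branch_caches, gains):
        k, v = map_cache(cache, gain)
        keys.extend(k)
        values.extend(v)
    return attend(query, keys, values)

# Two made-up branch caches, each a (keys, values) pair of 2-d vectors.
branch_a = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
branch_b = ([[1.0, 1.0]], [[2.0, 2.0]])
out, weights = synthesize_step([1.0, 0.0], [branch_a, branch_b])
print(len(weights), round(sum(weights), 6))
```

The point of the sketch is only that the synthesizer's query attends across every branch's cached positions in one pass; the keys and values are reused as they were produced by the workers rather than recomputed from pasted text.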
Experiments span nine downstream datasets covering math, science QA, code generation, GAIA, and multi-agent database diagnosis. Parallel-Synthesis matches or outperforms text-based synthesis on seven of the nine datasets and remains close on the remaining two. Critically, it reduces time-to-first-token by 2.5x–11x, with the authors concluding that direct cache-based synthesis is a promising interface for more native and efficient synthesis over parallel agent branches.
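A rough way to see where a time-to-first-token saving could come from is to count the tokens the synthesizer has to prefill before it can emit anything. The sketch below is an idealized accounting with made-up token counts; the `Branch` type and both helper functions are illustrative names, it ignores any overhead from the cache mapper, and token counts are not latency, so it does not reproduce the reported 2.5x–11x figures.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    output_tokens: int  # tokens the worker generated (hypothetical values below)

def prefill_text_concat(branches, synth_prompt_tokens):
    """Tokens the synthesizer prefills when branch outputs are pasted into its prompt."""
    return synth_prompt_tokens + sum(b.output_tokens for b in branches)

def prefill_cache_reuse(branches, synth_prompt_tokens):
    """Tokens the synthesizer prefills if branch KV caches are consumed directly.
    Idealized assumption: only the synthesizer's own prompt is encoded."""
    return synth_prompt_tokens

branches = [Branch("math", 800), Branch("retrieval", 1200), Branch("code", 500)]
text_tokens = prefill_text_concat(branches, synth_prompt_tokens=100)
cache_tokens = prefill_cache_reuse(branches, synth_prompt_tokens=100)
print(text_tokens, cache_tokens)
```

Under these assumptions the text-concatenation path re-encodes every branch output, while the cache path encodes only the synthesizer's own prompt; how much of that gap shows up as latency depends on hardware, model size, and the mapper's cost.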
Key facts
- 01 Parallel-Synthesis is a plug-and-play framework enabling a synthesizer to consume KV caches from parallel worker agents directly, bypassing text concatenation.
- 02 The framework has two core components: a cache mapper that calibrates independently generated branch caches, and a fine-tuned synthesizer adapter for generation from a non-sequential cache interface.
- 03 Training data exposes the synthesizer to parallel cache contexts, teaches cross-branch aggregation, and distills reasoning from text-concatenation-based synthesis.
- 04 Evaluated on nine downstream datasets spanning math, science QA, code generation, GAIA, and multi-agent database diagnosis.
- 05 Parallel-Synthesis matches or outperforms text-based synthesis on seven of the nine datasets and remains close on the other two.
- 06 Time-to-first-token is reduced by 2.5x–11x compared to text-based synthesis.
- 07 Authors are Shikun Liu, Mufei Li, and Dongqi Fu; the paper was published on arXiv on 2026-06-12.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 15, 2026 · 11:57 UTC.