Jun 17, 2026·1 min readAgentic Coding

More parallel subagents bloated context, not speed

Adding a 7th subagent pushed orchestrator latency from 22s to 31s because assembling 8 agents' full outputs into a single prompt created a 6,400-token context that took 4+ seconds of CPU time to serialize before any LLM call fired.

Dev.to #mcp·강해수

Read at source

Composite

6.0

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

The post demonstrates that in multi-agent fanout pipelines, context assembly before the LLM call — not the LLM itself — can become the dominant latency and cost driver, and that passing only compact summary structs rather than full subagent outputs resolves both problems simultaneously.

01Adding a 7th subagent pushed total orchestrator latency from 22s to 31s
02Individual subagents finished in 9–12 seconds regardless of how many ran in parallel
03With 8 subagents each returning ~800 tokens, the aggregation context reached ~6,400 tokens (52,480 bytes)

Summary— our read of the original

강해수 describes a production fanout orchestration pattern in an ad-creative analysis SaaS where N subagents run in parallel, each returning a full analysis, and an orchestrator LLM merges the results into a single verdict. The subagents themselves were not the problem — they completed in 9–12 seconds regardless of how many ran concurrently. The bottleneck emerged in the aggregation step: with 8 subagents each returning ~800 tokens, the orchestrator assembled a 6,400-token (52,480-byte) context before it could issue its first LLM call. On Cloudflare Workers, serializing 8 JSON blobs into a single prompt string consumed 4+ seconds of pure CPU time, logged as `context_assembly_backpressure`. Production data collected over 3 weeks showed aggregation's share of total wall-clock time growing from 18% at 2 subagents (14.2s total) to 61% at 8 subagents (31.1s total), with the inflection point at 6+ subagents where aggregation consumed more than half the latency.

Instead, each subagent now writes its full output to R2 on completion, and the orchestrator reads only a three-field summary struct per agent — verdict, confidence, and top_signal.

The fix did not reduce parallelism. Instead, each subagent now writes its full output to R2 on completion, and the orchestrator reads only a three-field summary struct per agent — verdict, confidence, and top_signal. Eight agents still produce eight files, but the aggregation context shrank from ~6,400 tokens to ~1,100, and the monthly cost for that pipeline step dropped from $207 to $38. The post also references an R2 chunking pattern, a D1 counter approach for tracking partial completions without polling, and a KV-based loop guard for failed aggregation retries, with full details on riversealab.com.

Key facts

01Adding a 7th subagent pushed total orchestrator latency from 22s to 31s
02Individual subagents finished in 9–12 seconds regardless of how many ran in parallel
03With 8 subagents each returning ~800 tokens, the aggregation context reached ~6,400 tokens (52,480 bytes)
04Serializing 8 JSON blobs on Cloudflare Workers took 4+ seconds of CPU time before any LLM call fired
05At 8 subagents, aggregation consumed 61% of total wall-clock time (31.1s total)
06Switching to a three-field summary struct per agent dropped aggregation context from ~6,400 to ~1,100 tokens
07Monthly cost for the pipeline step fell from $207 to $38 after the fix

Topics

#multi-agent #orchestration #performance-optimization #agent-framework

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 17, 2026 · 10:39 UTC. How this works →

Jun 17, 2026·1 min readAgentic Coding

More parallel subagents bloated context, not speed

Dev.to #mcp·강해수

Read at source

Composite

6.0

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

01Adding a 7th subagent pushed total orchestrator latency from 22s to 31s
02Individual subagents finished in 9–12 seconds regardless of how many ran in parallel
03With 8 subagents each returning ~800 tokens, the aggregation context reached ~6,400 tokens (52,480 bytes)

Summary— our read of the original

Instead, each subagent now writes its full output to R2 on completion, and the orchestrator reads only a three-field summary struct per agent — verdict, confidence, and top_signal.

Key facts

01Adding a 7th subagent pushed total orchestrator latency from 22s to 31s
02Individual subagents finished in 9–12 seconds regardless of how many ran in parallel
03With 8 subagents each returning ~800 tokens, the aggregation context reached ~6,400 tokens (52,480 bytes)
04Serializing 8 JSON blobs on Cloudflare Workers took 4+ seconds of CPU time before any LLM call fired
05At 8 subagents, aggregation consumed 61% of total wall-clock time (31.1s total)
06Switching to a three-field summary struct per agent dropped aggregation context from ~6,400 to ~1,100 tokens
07Monthly cost for the pipeline step fell from $207 to $38 after the fix

Topics

#multi-agent #orchestration #performance-optimization #agent-framework

Methodology

Score breakdown

Key facts

Topics

More in Agentic Coding.

Score breakdown

Key facts

Topics

More in Agentic Coding.