DeLM framework decentralizes multi-agent LLM coordination via shared context
Researchers Yuzhen Mao and Azalia Mirhoseini propose Decentralized Language Models (DeLM), a multi-agent framework that replaces centralized orchestration with parallel agents sharing a common verified context and task queue, achieving up to 10.5 percentage point gains on SWE-bench Verified while cutting cost per task by roughly 50%.
Score breakdown
DeLM demonstrates that decentralizing multi-agent coordination through a shared verified context can simultaneously improve benchmark performance and cut per-task cost, addressing a structural scalability bottleneck in LLM test-time reasoning.
Yuzhen Mao and Azalia Mirhoseini introduce Decentralized Language Models (DeLM), a multi-agent system (MAS) framework that addresses a core scalability problem in LLM test-time reasoning: as the number of parallel subtasks grows, centralized orchestration — where a single main agent assigns work, collects outputs, and merges results — becomes a communication and integration bottleneck. DeLM replaces this architecture with three components: parallel agents, a shared verified context, and a task queue. Agents asynchronously claim subtasks, read the accumulated progress of other agents from the shared context, perform local reasoning, and write back compact verified updates. This design allows agents to build on one another's verified progress without routing every update through a central controller.
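The claim/read/reason/write loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SharedContext`, `verify`, `solve`, `agent`, and `run` are hypothetical, the "reasoning" step is a stub standing in for an LLM call, and the verification rule (accept any non-empty update) is a placeholder, since the summary does not say how updates are verified.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Append-only log of verified updates, readable by every agent (hypothetical)."""
    updates: list[str] = field(default_factory=list)

    def read(self) -> list[str]:
        # Return a snapshot so agents reason over a stable view.
        return list(self.updates)

    def write(self, update: str) -> None:
        self.updates.append(update)


def verify(update: str) -> bool:
    # Placeholder check (assumption): accept any non-empty update.
    return bool(update.strip())


async def solve(agent_id: int, subtask: str, seen: list[str]) -> str:
    # Stub for local LLM reasoning (assumption); yields control to other agents.
    await asyncio.sleep(0)
    return f"agent {agent_id} solved {subtask} after reading {len(seen)} updates"


async def agent(agent_id: int, queue: asyncio.Queue, context: SharedContext) -> None:
    while True:
        try:
            subtask = queue.get_nowait()  # claim a subtask; no controller assigns it
        except asyncio.QueueEmpty:
            return  # nothing left to claim
        seen = context.read()  # read the accumulated verified progress
        update = await solve(agent_id, subtask, seen)
        if verify(update):
            context.write(update)  # write back a compact verified update
        queue.task_done()


async def run(subtasks: list[str], num_agents: int = 3) -> SharedContext:
    queue: asyncio.Queue = asyncio.Queue()
    for subtask in subtasks:
        queue.put_nowait(subtask)
    context = SharedContext()
    # All agents run concurrently and coordinate only through queue and context.
    await asyncio.gather(*(agent(i, queue, context) for i in range(num_agents)))
    return context
```

Calling `asyncio.run(run(["parse", "patch", "test"], num_agents=2))` returns a context holding one update per subtask. The point of the design is visible in `agent`: no coroutine waits on a main agent to hand out work or merge results.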
Empirically, DeLM demonstrates improvements on two distinct benchmarks. On SWE-bench Verified, it achieves the best performance across Avg.@1, Pass@2, and Pass@4 metrics, with gains of up to 10.5 percentage points over the strongest baseline, while also reducing cost per task by roughly 50%. On LongBench-v2 Multi-Doc QA, DeLM achieves the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points. Code is available on the project website.
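The summary does not say how Pass@2 and Pass@4 are computed. One common convention, assumed here and not confirmed by the source, is the unbiased pass@k estimator used in code-generation evaluation: given n attempts on a task of which c succeed, it is the probability that at least one of k attempts drawn without replacement succeeds. The function name and the sample counts in the usage note are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k).

    n: total attempts on the task, c: attempts that passed, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 4 attempts and 1 success, `pass_at_k(4, 1, 1)` is 0.25 and `pass_at_k(4, 1, 2)` is 0.5. A benchmark score averages this value over all tasks, and a gain reported in percentage points is the difference between two such averages rather than a relative change.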
Key facts
- 01 Yuzhen Mao and Azalia Mirhoseini propose Decentralized Language Models (DeLM), a new multi-agent system (MAS) framework.
- 02 DeLM replaces centralized orchestration with parallel agents, a shared verified context, and a task queue.
- 03 Agents asynchronously claim subtasks, read shared progress, and write back compact verified updates without a central controller.
- 04 On SWE-bench Verified, DeLM achieves the best performance across Avg.@1, Pass@2, and Pass@4 metrics.
- 05 DeLM improves over the strongest SWE-bench Verified baseline by up to 10.5 percentage points.
- 06 DeLM reduces cost per task by roughly 50% compared to the strongest baseline on SWE-bench Verified.
- 07 On LongBench-v2 Multi-Doc QA, DeLM achieves the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points.
Summary and scoring are generated automatically from the original article. Last processed Jun 10, 2026 · 15:34 UTC.