DuMate-DeepResearch sets new benchmarks with multi-agent research framework
DuMate-DeepResearch is a multi-agent Deep Research framework that achieves state-of-the-art scores on two benchmarks by combining graph-based dynamic planning, recursive search agents, and rubric-grounded synthesis.
Score breakdown
Audit every step of a complex AI research pipeline — the explicit traceability and rubric-grounded synthesis in DuMate-DeepResearch offer a concrete blueprint for reducing hallucination and improving accountability in agentic coding and research systems.
- 01DuMate-DeepResearch is a multi-agent Deep Research framework built on the Qianfan Agent Foundry, authored by Lingyong Yan, Can Xu, and Yukun Zhao.
- 02The framework targets four limitations in current DR systems: long-horizon planning, single-agent scheduling bottlenecks, hallucination in long-form synthesis, and limited auditability.
- 03An Agent Core is decoupled from an extensible Tool Ecosystem, making every intermediate decision and tool invocation explicitly traceable.
Lingyong Yan, Can Xu, and Yukun Zhao introduce DuMate-DeepResearch, a multi-agent Deep Research (DR) framework built on the Qianfan Agent Foundry, designed to address four interrelated limitations in current DR systems: long-horizon planning over underspecified scope, the bottleneck of decomposing and scheduling tasks within a single agent, hallucination risk in long-form synthesis, and limited process auditability. The architecture decouples an Agent Core — responsible for task understanding, planning, and scheduling — from an extensible Tool Ecosystem covering retrieval, evidence acquisition, and report rendering, making every intermediate decision and tool invocation explicitly traceable.
First, a graph-based dynamic planning strategy expands the research roadmap coarse-to-fine and continuously revises it through reflection, re-planning, backtracking, and parallel branching.
The framework introduces three key mechanisms. First, a graph-based dynamic planning strategy expands the research roadmap coarse-to-fine and continuously revises it through reflection, re-planning, backtracking, and parallel branching. Second, a recursive two-level execution design delegates each complex search sub-task to an inner Search Agent that runs its own planning loop, isolating noisy retrieval and stabilizing long-horizon execution. Third, a rubric-based test-time optimization mechanism dynamically generates task-specific quality criteria and uses them as live reasoning scaffolds for evidence-grounded synthesis and adaptive stopping.
On two deep research benchmarks, DuMate-DeepResearch achieves state-of-the-art results: a best overall score of 58.03% on DeepResearch Bench and a best overall score of 61.95% on DeepResearch Bench II, where it also ranks first in information recall and analysis.
Key facts
- 01DuMate-DeepResearch is a multi-agent Deep Research framework built on the Qianfan Agent Foundry, authored by Lingyong Yan, Can Xu, and Yukun Zhao.
- 02The framework targets four limitations in current DR systems: long-horizon planning, single-agent scheduling bottlenecks, hallucination in long-form synthesis, and limited auditability.
- 03An Agent Core is decoupled from an extensible Tool Ecosystem, making every intermediate decision and tool invocation explicitly traceable.
- 04A graph-based dynamic planning strategy expands research roadmaps coarse-to-fine with reflection, re-planning, backtracking, and parallel branching.
- 05A recursive two-level execution design delegates complex search sub-tasks to an inner Search Agent with its own planning loop.
- 06A rubric-based test-time optimization mechanism dynamically generates task-specific quality criteria used as live reasoning scaffolds.
- 07Achieves best overall scores of 58.03% on DeepResearch Bench and 61.95% on DeepResearch Bench II, ranking first in information recall and analysis on the latter.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 8, 2026 · 15:36 UTC. How this works →