SkillWeaver routes LLM agents through complex multi-skill tasks via decompose-retrieve-compose
Researchers introduce SkillWeaver, a framework that decomposes complex queries into atomic sub-tasks, retrieves matching skills, and composes executable plans — paired with CompSkillBench, a benchmark of 300 compositional queries over 2,209 real MCP server skills.
Score breakdown
The paper identifies task decomposition, not retrieval, as the binding constraint in multi-skill agent planning. SAD's single-iteration fix raises decomposition accuracy from 51.0% to 67.7% (16.7 percentage points, a 32.7% relative gain), directly improving how reliably agents can assemble executable plans from large real-world skill libraries.
Xueping Gao formalizes a new problem called Compositional Skill Routing: given a complex user query and a large library of reusable tool specifications (skills), an LLM agent must decompose the query into atomic sub-tasks, retrieve the appropriate skill for each, and compose an executable plan. The proposed system, SkillWeaver, combines an LLM task decomposer, a bi-encoder skill retriever with FAISS indexing, and a dependency-aware DAG planner. To support rigorous evaluation, the paper introduces CompSkillBench, a benchmark of 300 compositional queries drawn from 2,209 real MCP server skills spanning 24 functional categories sourced from the public MCP ecosystem.
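The decompose-retrieve-compose flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the four-skill library and its names are hypothetical, the decomposition is hand-written where the paper uses an LLM, a bag-of-words cosine similarity stands in for the bi-encoder and FAISS index, and Python's standard `graphlib.TopologicalSorter` stands in for the dependency-aware DAG planner.

```python
import math
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical toy skill library (name -> description); not from the paper.
SKILLS = {
    "web_search": "search the web for pages matching a query",
    "pdf_extract": "extract text from a pdf document",
    "summarize_text": "summarize long text into a short summary",
    "send_email": "send an email message to a recipient",
}

def embed(text):
    """Bag-of-words vector; a stand-in for a learned bi-encoder embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(sub_task, skills=SKILLS):
    """Return the name of the skill whose description best matches the sub-task."""
    query_vec = embed(sub_task)
    return max(skills, key=lambda name: cosine(query_vec, embed(skills[name])))

def compose(steps):
    """Order steps so every dependency runs first, then attach a skill to each.

    steps: list of (step_id, sub_task, depends_on) tuples.
    Returns a list of (step_id, skill_name) pairs in executable order.
    """
    graph = {step_id: set(deps) for step_id, _, deps in steps}
    order = list(TopologicalSorter(graph).static_order())
    chosen = {step_id: retrieve(task) for step_id, task, _ in steps}
    return [(step_id, chosen[step_id]) for step_id in order]

# Hand-written decomposition standing in for the LLM task decomposer.
steps = [
    ("s1", "search the web for a report", []),
    ("s2", "extract text from the pdf", ["s1"]),
    ("s3", "summarize the text", ["s2"]),
    ("s4", "send the summary by email", ["s3"]),
]
plan = compose(steps)
print(plan)
```

Keeping the three stages behind separate functions mirrors the framing in the summary: if the decomposition step produces sub-tasks at the wrong granularity, no retriever can recover the right skills, which is the bottleneck the paper reports.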
Experiments reveal that task decomposition quality is the primary bottleneck: standard LLM decomposition reaches only 34.2% category recall at the step level. To address this, the paper proposes Iterative Skill-Aware Decomposition (SAD), a retrieval-augmented feedback loop that iteratively aligns decomposition with available skills. In a single iteration, SAD improves decomposition accuracy (DA) from 51.0% to 67.7%, a gain of 16.7 percentage points or 32.7% relative (Wilcoxon p < 10^-6). A DA-conditioned analysis indicates that correct granularity is a prerequisite for effective retrieval, with CatR@1 rising from 34% to 41% when DA equals 1. Beyond accuracy, SkillWeaver reduces context window consumption by over 99%, and transfer experiments show a +35.6% relative DA gain even when target categories are absent from the retrieval pool, indicating generalization to unseen skill domains.
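As a rough illustration of what a retrieval-augmented feedback loop of this kind might look like, the sketch below decomposes once without skill knowledge, retrieves candidate skills for each step, and then asks the decomposer to try again with those candidates as hints. This is a sketch under stated assumptions: the function names, the hint-passing interface, and the toy decomposer and retriever are all hypothetical, and the paper's actual prompts, retrieval depth, and stopping rule are not described in this summary.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces: a decomposer takes the query plus optional skill
# hints; a retriever maps one sub-task to candidate skill names.
Decomposer = Callable[[str, Optional[List[str]]], List[str]]
Retriever = Callable[[str], List[str]]

def skill_aware_decompose(
    query: str,
    decompose: Decomposer,
    retrieve: Retriever,
    iterations: int = 1,
) -> List[str]:
    """Refine a decomposition by feeding retrieved skills back to the decomposer."""
    steps = decompose(query, None)  # initial, skill-agnostic decomposition
    for _ in range(iterations):
        hints: List[str] = []
        for step in steps:
            for skill in retrieve(step):
                if skill not in hints:  # keep hints unique, preserve order
                    hints.append(skill)
        steps = decompose(query, hints)  # re-decompose with skill context
    return steps

# Toy stand-ins so the loop can be run without an LLM or a vector index.
def toy_decompose(query: str, hints: Optional[List[str]]) -> List[str]:
    if hints is None:
        return [query]  # too coarse: the whole query as a single step
    return [f"use {hint}" for hint in hints]  # one step per available skill

def toy_retrieve(step: str) -> List[str]:
    catalog = ["fetch_weather", "send_email"]  # hypothetical skill names
    return [name for name in catalog if name.split("_")[1] in step]

refined = skill_aware_decompose(
    "email me the weather forecast", toy_decompose, toy_retrieve
)
print(refined)
```

With `iterations=1` the loop makes exactly one refinement pass, matching the single-iteration setting the summary reports results for.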
Key facts
- SkillWeaver is a decompose-retrieve-compose framework combining an LLM task decomposer, a bi-encoder skill retriever with FAISS indexing, and a dependency-aware DAG planner.
- CompSkillBench is a new benchmark of 300 compositional queries over 2,209 real MCP server skills spanning 24 functional categories.
- Standard LLM decomposition reaches only 34.2% category recall at the step level, identified as the primary bottleneck.
- Iterative Skill-Aware Decomposition (SAD) improves decomposition accuracy (DA) from 51.0% to 67.7% (+32.7% relative, Wilcoxon p < 10^-6) in a single iteration.
- CatR@1 rises from 34% to 41% when DA equals 1, indicating that correct granularity is a prerequisite for effective retrieval.
- SkillWeaver reduces context window consumption by over 99%.
- Transfer experiments show a +35.6% relative DA gain even when target categories are absent from the retrieval pool.
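The SAD figures mix two kinds of gain, so a short worked calculation may help. The sketch below uses only the two accuracy values reported above; the helper names are illustrative.

```python
def absolute_gain_pp(before: float, after: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after - before

def relative_gain_pct(before: float, after: float) -> float:
    """Improvement expressed as a percentage of the starting value."""
    return (after - before) / before * 100

da_before, da_after = 51.0, 67.7  # decomposition accuracy, in percent

pp = absolute_gain_pp(da_before, da_after)
rel = relative_gain_pct(da_before, da_after)
print(f"absolute: {pp:.1f} pp, relative: {rel:.1f}%")
```

The reported +32.7% therefore matches the relative improvement over the 51.0% baseline, while the absolute improvement is 16.7 percentage points.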
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 17, 2026 · 10:39 UTC.