AgenticQwen trains small tool-use models with dual data flywheels
Researchers introduce AgenticQwen, a family of small language models trained via multi-round reinforcement learning and dual data flywheels to handle industrial-scale multi-step tool use under cost and latency constraints.
Teams building production AI agents on a budget now have a publicly released small-model family and training framework designed to match much larger models on tool-use tasks without their cost and latency overhead.
Yuanjie Lyu, Chengyu Wang, and Haonan Zheng present AgenticQwen, a family of small language models purpose-built for agentic tasks such as multi-step reasoning and tool use in industrial settings. The core motivation is that real-world deployments impose strict cost and latency constraints, making large models impractical and creating demand for capable small agentic models.
The training framework combines two forms of reinforcement learning, reasoning RL and agentic RL, with a dual data flywheel mechanism that automatically generates increasingly challenging training tasks. The reasoning flywheel raises difficulty by learning from model errors, while the agentic flywheel transforms linear workflows into multi-branch behavior trees that better capture the decision complexity of real-world applications. Training runs over multiple rounds on synthetic data supplemented by a limited amount of open-source data.
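The reasoning-flywheel idea can be made concrete with a toy sketch. Everything in it is a hypothetical stand-in rather than the AgenticQwen pipeline: the `Task` type, the integer difficulty scale, the `synthesize_harder` generator, and the skill-threshold policy that replaces a real model. It shows one plausible reading of "raising difficulty by learning from model errors": each round, tasks the model already solves are retired, failures are kept for further training, and each failure seeds a harder variant, so the pool's difficulty climbs over multiple rounds as the simulated model improves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Task:
    """A synthetic training task; `difficulty` is an illustrative integer level."""
    prompt: str
    difficulty: int


def synthesize_harder(failed: Task) -> Task:
    """Hypothetical generator: derive a harder sibling from a task the model got wrong.
    A real pipeline would likely prompt an LLM with the failed trajectory; here we
    just append a constraint and bump the difficulty level."""
    return Task(prompt=f"{failed.prompt} +constraint", difficulty=failed.difficulty + 1)


def simulated_policy(skill: int) -> Callable[[Task], bool]:
    """Toy stand-in for the model: it solves any task at or below its skill level."""
    return lambda task: task.difficulty <= skill


def flywheel_round(pool: List[Task], solves: Callable[[Task], bool]) -> List[Task]:
    """One round: retire solved tasks, keep failures for further RL,
    and seed one harder task from each failure."""
    failures = [t for t in pool if not solves(t)]
    harder = [synthesize_harder(t) for t in failures]
    # dict.fromkeys drops duplicates while preserving order (frozen Task is hashable).
    return list(dict.fromkeys(failures + harder))


def run_flywheel(seed: List[Task], rounds: int, skill: int) -> List[Task]:
    """Multi-round loop: evaluate, rebuild the pool, then 'train' the policy."""
    pool = list(seed)
    for _ in range(rounds):
        pool = flywheel_round(pool, simulated_policy(skill))
        skill += 1  # stand-in for an RL update on this round's pool
    return pool


seed = [Task("q1", 1), Task("q2", 2), Task("q3", 3)]
final_pool = run_flywheel(seed, rounds=2, skill=1)
print([t.difficulty for t in final_pool])  # [3, 3, 4, 4, 5]
```

Every task that survives a round sits just beyond what the policy evaluating it could solve, which is the property an error-driven curriculum is after. A real system would swap the skill counter for an actual RL update and the string suffix for model-driven task synthesis.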
AgenticQwen is validated on multiple public agentic benchmarks and within an industrial agent system, where it closes the performance gap with much larger models on search and data analysis tasks. The authors have released model checkpoints and part of the synthetic data on Hugging Face, with data synthesis and RL training code on GitHub. The data synthesis pipeline is also integrated into EasyDistill on ModelScope.
Key facts
- AgenticQwen is a family of small agentic language models targeting industrial-scale tool use under cost and latency constraints.
- Training combines reasoning RL and agentic RL with dual data flywheels that auto-generate increasingly challenging tasks.
- The reasoning flywheel increases task difficulty by learning from model errors.
- The agentic flywheel expands linear workflows into multi-branch behavior trees to reflect real-world decision complexity.
- Training uses multi-round RL on synthetic data plus a limited amount of open-source data.
- AgenticQwen closes the performance gap with much larger models on search and data analysis tasks in an industrial agent system.
- Model checkpoints and part of the synthetic data are released on Hugging Face, and the data synthesis and RL training code on GitHub; the data synthesis pipeline is also integrated into EasyDistill on ModelScope.
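The behavior-tree expansion in the agentic flywheel can be illustrated with a small sketch. All of it is assumed for illustration: the node types (Sequence, Selector/fallback, Action) follow conventional behavior-tree vocabulary rather than anything the authors specify, and the tool names and fallback map are hypothetical. The sketch turns a linear three-step workflow into a tree in which two steps gain fallback branches, then executes it against a simulated tool failure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Action:
    """Leaf node: a single tool call (tool names here are hypothetical)."""
    name: str


@dataclass
class SequenceNode:
    """Succeeds only if every child succeeds, ticked left to right."""
    children: List[Node]


@dataclass
class SelectorNode:
    """Fallback node: tries children in order, succeeds on the first success."""
    children: List[Node]


Node = Union[Action, SequenceNode, SelectorNode]


def expand_workflow(steps: List[str], fallbacks: Dict[str, List[str]]) -> Node:
    """Turn a linear workflow into a multi-branch behavior tree.
    A step with known alternatives becomes a Selector (primary tool first,
    then each fallback); other steps stay plain Actions."""
    children: List[Node] = []
    for step in steps:
        alternatives = fallbacks.get(step, [])
        if alternatives:
            children.append(SelectorNode([Action(step)] + [Action(a) for a in alternatives]))
        else:
            children.append(Action(step))
    return SequenceNode(children)


def tick(node: Node, succeeds: Callable[[str], bool], trace: List[str]) -> bool:
    """Execute the tree once, recording every attempted tool call in `trace`."""
    if isinstance(node, Action):
        trace.append(node.name)
        return succeeds(node.name)
    if isinstance(node, SequenceNode):
        # all() short-circuits: stop at the first failing child.
        return all(tick(child, succeeds, trace) for child in node.children)
    # any() short-circuits: stop at the first succeeding child.
    return any(tick(child, succeeds, trace) for child in node.children)


def count_routes(node: Node) -> int:
    """Number of distinct routes to successful completion."""
    if isinstance(node, Action):
        return 1
    if isinstance(node, SequenceNode):
        total = 1
        for child in node.children:
            total *= count_routes(child)
        return total
    return sum(count_routes(child) for child in node.children)


steps = ["search_web", "fetch_page", "summarize"]
tree = expand_workflow(steps, {"search_web": ["search_internal_kb"], "fetch_page": ["use_cached_copy"]})

trace: List[str] = []
ok = tick(tree, lambda name: name != "search_web", trace)
print(ok, trace)  # True ['search_web', 'search_internal_kb', 'fetch_page', 'summarize']
print(count_routes(tree), count_routes(expand_workflow(steps, {})))  # 4 1
```

The linear workflow admits exactly one route to completion, while the expanded tree admits four. That is the kind of added decision complexity the agentic flywheel is described as targeting. A synthesis pipeline could then sample different failure patterns over the same tree to produce varied training trajectories.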