Agent-World framework scales real-world environments for general agent training
Researchers introduce Agent-World, a self-evolving training arena that synthesizes thousands of real-world MCP-compatible environments and tasks to train general-purpose LLM agents, with 8B and 14B models outperforming strong proprietary baselines across 23 benchmarks.
Teams building agentic coding assistants and MCP-based tool integrations can draw on Agent-World's environment synthesis and self-evolving training approach to produce more robust agents without manually curating large task datasets.
Guanting Dong, Junting Lu, and Junjie Huang introduce Agent-World, a framework that addresses a key bottleneck in training general-purpose LLM agents: the scarcity of realistic, stateful environments and principled mechanisms for continual learning. The system leverages the Model Context Protocol (MCP) and broader agent skill interfaces to connect agents with scalable real-world services, then builds a self-evolving arena on top of that foundation.
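For readers unfamiliar with MCP, a tool invocation travels as a JSON-RPC 2.0 request that uses the `tools/call` method. The sketch below builds such a request in Python. The tool name `search_flights` and its arguments are hypothetical placeholders and do not come from the paper; the helper `build_tool_call` is likewise an illustrative name, not part of any SDK.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique per in-flight request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = build_tool_call(
    "search_flights", {"origin": "SFO", "destination": "JFK"}
)
print(json.dumps(request, indent=2))
```

Because every environment exposes its tools through this same message shape, an agent trained against one MCP server can, in principle, call tools on another without a custom adapter.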
Agent-World's first component, Agentic Environment-Task Discovery, autonomously explores topic-aligned databases and executable tool ecosystems drawn from thousands of real-world environment themes, and synthesizes verifiable tasks with controllable difficulty. The second component, Continuous Self-Evolving Agent Training, combines multi-environment reinforcement learning with a self-evolving arena that dynamically identifies capability gaps and generates targeted tasks to close them, so that agent policies and the environments themselves co-evolve.
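One round of the gap-driven loop described above can be pictured with a toy sketch. This is not the paper's algorithm: the `Task` fields, the 0.5 success threshold, the difficulty scale, and both function names are assumptions made for illustration, and the reinforcement learning update itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    environment: str  # environment theme the task belongs to
    difficulty: int   # assumed scale: 1 (easy) to 5 (hard)


def find_capability_gaps(results, threshold=0.5):
    """Return environments whose success rate is below `threshold`."""
    totals, wins = {}, {}
    for task, success in results:
        totals[task.environment] = totals.get(task.environment, 0) + 1
        wins[task.environment] = wins.get(task.environment, 0) + int(success)
    return sorted(
        env for env in totals if wins[env] / totals[env] < threshold
    )


def synthesize_targeted_tasks(gaps, per_gap=2, difficulty=2):
    """Create new tasks for weak environments (stand-in for a real synthesizer)."""
    return [Task(env, difficulty) for env in gaps for _ in range(per_gap)]


# Toy evaluation results: (task, whether the agent solved it).
results = [
    (Task("calendar", 1), True),
    (Task("calendar", 2), True),
    (Task("banking", 1), False),
    (Task("banking", 2), False),
    (Task("banking", 1), True),
]

gaps = find_capability_gaps(results)
new_tasks = synthesize_targeted_tasks(gaps)
print(gaps)            # environments the agent is weak in
print(len(new_tasks))  # number of targeted tasks for the next round
```

In a full system the new tasks would be added to the training pool, the policy would be updated with reinforcement learning, and the evaluation would repeat, which is what lets tasks and policy evolve together.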
Evaluated across 23 challenging agent benchmarks, the Agent-World-8B and Agent-World-14B models consistently outperform strong proprietary models and environment-scaling baselines. The paper's analyses further reveal scaling trends with respect to environment diversity and the number of self-evolution rounds, offering concrete insights for researchers building toward general agent intelligence.
Key facts
- 01 Authors Guanting Dong, Junting Lu, and Junjie Huang present Agent-World, a self-evolving training arena for general agent intelligence.
- 02 Agent-World uses the Model Context Protocol (MCP) as a unified interface for connecting agents with real-world tool environments.
- 03 The Agentic Environment-Task Discovery component synthesizes verifiable tasks with controllable difficulty from thousands of real-world environment themes.
- 04 Continuous Self-Evolving Agent Training combines multi-environment reinforcement learning with dynamic task synthesis to close agent capability gaps.
- 05 Agent-World-8B and 14B models outperform strong proprietary models and environment scaling baselines across 23 agent benchmarks.
- 06 Analyses reveal scaling trends tied to environment diversity and the number of self-evolution rounds.