Role-Agent framework uses dual-role LLM co-evolution to boost agent performance
Role-Agent is a new framework by Xucong Wang, Ziyu Ma, and Shidong Yang that lets a single LLM act as both agent and environment, reporting average gains of more than 4% over strong baselines across multiple benchmarks through bootstrapped co-evolution.
Role-Agent demonstrates that a single LLM can bootstrap its own agent training by self-generating both process rewards and targeted practice tasks, achieving consistent gains over strong baselines without requiring separate environment models.
Role-Agent, introduced by Xucong Wang, Ziyu Ma, and Shidong Yang, addresses a persistent challenge in LLM agent development: training is often bottlenecked by inefficient interaction feedback and environments that remain static, limiting the agent's ability to generalize broadly. The proposed solution harnesses a single LLM to serve concurrently as both the agent and the environment, enabling what the authors call a bootstrapped co-evolution.
The framework consists of two complementary components. In World-In-Agent (WIA), the LLM acts as the agent and predicts future environment states following each action; the degree of alignment between those predictions and the actual states is used as a process reward, encouraging the model to develop environment-aware reasoning. In Agent-In-World (AIW), the LLM examines failure modes extracted from failed trajectories and retrieves tasks that share similar failure patterns, dynamically reshaping the training data distribution to target the agent's specific weaknesses. Together, these two mechanisms create a self-improving loop without requiring separate environment models or additional LLMs.
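The two roles can be illustrated with a toy sketch. This is not the authors' implementation: in Role-Agent the LLM itself predicts future states, analyzes failures, and retrieves tasks, and the summary does not say how alignment or failure similarity is measured. The sketch below assumes plain word-set overlap as a stand-in for both, and the names `token_overlap`, `wia_process_rewards`, `Task`, `failure_tags`, and `aiw_retrieve` are hypothetical.

```python
from dataclasses import dataclass


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; an assumed stand-in for state alignment."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def wia_process_rewards(predicted_states, actual_states):
    """WIA-style sketch: one reward per step, higher when the predicted
    next state matches the state the environment actually returned."""
    if len(predicted_states) != len(actual_states):
        raise ValueError("need one prediction per observed state")
    return [token_overlap(p, a) for p, a in zip(predicted_states, actual_states)]


@dataclass(frozen=True)
class Task:
    name: str
    failure_tags: frozenset  # hypothetical labels for known failure patterns


def aiw_retrieve(failure_modes, task_pool, k=2):
    """AIW-style sketch: pick up to k tasks whose failure tags overlap most
    with the failure modes seen in failed trajectories."""
    observed = set(failure_modes)
    scored = [(len(observed & t.failure_tags), t) for t in task_pool]
    scored = [(s, t) for s, t in scored if s > 0]    # drop unrelated tasks
    scored.sort(key=lambda st: st[0], reverse=True)  # stable: ties keep pool order
    return [t for _, t in scored[:k]]


if __name__ == "__main__":
    print(wia_process_rewards(
        ["door is open", "key in inventory"],
        ["door is open", "inventory is empty"],
    ))  # -> [1.0, 0.2]
    pool = [
        Task("a", frozenset({"wrong_tool"})),
        Task("b", frozenset({"loop", "wrong_tool"})),
        Task("c", frozenset({"timeout"})),
    ]
    print([t.name for t in aiw_retrieve(["loop", "wrong_tool"], pool)])  # -> ['b', 'a']
```

In a training loop, the per-step rewards would supplement the task-level outcome signal, and the retrieved tasks would be mixed into the next training batch so that the data distribution shifts toward the agent's current weaknesses.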
Experiments across multiple benchmarks show that Role-Agent consistently improves performance, with an average gain of more than 4% over strong baselines.
Key facts
- 01 Role-Agent is proposed by Xucong Wang, Ziyu Ma, and Shidong Yang (arXiv, 2026-06-09).
- 02 The framework enables a single LLM to function simultaneously as both the agent and the environment.
- 03 World-In-Agent (WIA) uses alignment between predicted and actual future states as a process reward for environment-aware reasoning.
- 04 Agent-In-World (AIW) analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns to reshape training data.
- 05 Role-Agent achieves an average performance gain of more than 4% over strong baselines across multiple benchmarks.
- 06 The framework targets two core limitations: inefficient interaction feedback and static training environments.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 10, 2026 · 15:34 UTC.