Socratic-SWE reaches 50.40% on SWE-bench Verified via self-evolving agent skills
Socratic-SWE is a closed-loop self-evolution framework that distills an agent's historical solving traces into structured skills to generate targeted training tasks, reaching 50.40% on SWE-bench Verified after three iterations.
Socratic-SWE demonstrates that an agent's own solving traces can serve as a scalable, self-improving training substrate — overcoming the limitation of fixed synthetic data pipelines that are blind to the agent's actual weaknesses.
Chuan Xiao, Zhengbo Jiao, and Shaobo Wang present Socratic-SWE, a framework designed to address a core bottleneck in training LLM-based software engineering agents: the scarcity of high-quality SWE tasks. Existing synthetic data approaches generate tasks through fixed mutation or bug-injection pipelines whose distributions are independent of the agent's actual weaknesses and training progress. Socratic-SWE breaks this limitation by treating the agent's own historical solving traces not merely as reward signals but as a rich source of structured knowledge.
The framework distills traces into agent skills — structured summaries of recurring failures and effective repair patterns — which then drive the generation of targeted repair tasks in real code repositories. Candidate tasks pass through execution-based validation and are scored using a solver-gradient alignment reward, ensuring that only verifiable and improvement-relevant tasks are retained. The updated Solver produces new traces after each round, allowing the task curriculum to adapt continuously over successive iterations.
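The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Trace`, `Skill`, `Task`, `distill_skills`, and `evolve`, the retention `threshold`, and the four callables passed in are all hypothetical placeholders for components the summary only names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    task_id: str
    solved: bool
    notes: str  # free-form summary of what the agent tried (hypothetical field)


@dataclass
class Skill:
    kind: str     # "failure" or "repair_pattern" (hypothetical labels)
    summary: str


@dataclass
class Task:
    description: str
    score: float = 0.0


def distill_skills(traces: List[Trace]) -> List[Skill]:
    """Toy stand-in for distillation: failed traces become failure skills,
    solved traces become repair-pattern skills."""
    return [
        Skill("repair_pattern" if t.solved else "failure", t.notes)
        for t in traces
    ]


def evolve(
    traces: List[Trace],
    rounds: int,
    generate: Callable[[Skill], Task],
    validate: Callable[[Task], bool],
    reward: Callable[[Task], float],
    train_and_solve: Callable[[List[Task]], List[Trace]],
    threshold: float = 0.5,  # assumed cutoff; the source gives no value
) -> List[List[Task]]:
    """Run the closed loop and return the tasks retained in each round."""
    history: List[List[Task]] = []
    for _ in range(rounds):
        skills = distill_skills(traces)
        candidates = [generate(skill) for skill in skills]
        kept: List[Task] = []
        for task in candidates:
            if not validate(task):        # execution-based validation
                continue
            task.score = reward(task)     # alignment-style reward
            if task.score >= threshold:
                kept.append(task)
        history.append(kept)
        # The updated solver produces fresh traces for the next round.
        traces = train_and_solve(kept)
    return history
```

Passing the generator, validator, reward, and trainer as callables keeps the sketch independent of any particular model or repository tooling; in a real system each would wrap an LLM call, a test runner, or a training job.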
Evaluated on SWE-bench Verified, SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0, Socratic-SWE consistently outperforms self-evolving baselines under the same compute budget. After three iterations, it reaches 50.40% on SWE-bench Verified. The authors conclude that solving traces can serve as a scalable substrate for self-evolving SWE agents.
Key facts
- Socratic-SWE is a closed-loop self-evolution framework for LLM-driven software engineering agents.
- It distills historical solving traces into structured agent skills capturing recurring failures and effective repair patterns.
- Agent skills guide generation of targeted repair tasks in real repositories.
- Candidate tasks are validated via execution and scored with a solver-gradient alignment reward.
- The task curriculum adapts over successive rounds as the updated Solver produces new traces.
- Socratic-SWE achieves 50.40% on SWE-bench Verified after three iterations.
- It consistently outperforms self-evolving baselines across SWE-bench Verified, SWE-bench Lite, SWE-bench Pro, and Terminal-Bench 2.0 under the same compute budget.
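The summary does not define the solver-gradient alignment reward. One hypothetical reading, sketched here purely for illustration, is a cosine similarity between the gradient a candidate task induces on the Solver and a reference gradient computed on the Solver's known failures. The function name `cosine_alignment` and the use of flattened gradient vectors are assumptions, not details from the source.

```python
import math
from typing import Sequence


def cosine_alignment(
    task_grad: Sequence[float], target_grad: Sequence[float]
) -> float:
    """Cosine similarity between two flattened gradient vectors.

    Assumed interpretation only: values near 1 would mean that training on
    the candidate task moves the solver in the same direction as training
    on its known failures; values near -1 would mean the opposite.
    """
    if len(task_grad) != len(target_grad):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(task_grad, target_grad))
    norm = math.sqrt(sum(a * a for a in task_grad)) * math.sqrt(
        sum(b * b for b in target_grad)
    )
    # A zero gradient carries no directional signal, so score it as 0.
    return 0.0 if norm == 0.0 else dot / norm
```

Under this reading, a task whose gradient is orthogonal or opposed to the reference direction would score at or below zero and could be filtered out before training.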
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.