Open-SWE-Traces dataset ships 207K multilingual agent trajectories
Researchers introduce Open-SWE-Traces, a dataset of 207,489 agentic trajectories across nine programming languages, designed to train open-source LLMs for autonomous software engineering tasks.
Score breakdown
Open-SWE-Traces provides a large-scale, permissively licensed, multilingual trajectory dataset that enables fine-tuning of open-source LLMs for autonomous software engineering. It directly addresses the data scarcity that the paper identifies as the primary bottleneck for such models.
Wasi Uddin Ahmad, Nikolai Ludwig, and Somshubra Majumdar present Open-SWE-Traces, a dataset designed to address what the paper describes as a severe deficit of diverse, large-scale trajectory data for autonomous software engineering. The dataset contains 207,489 agentic trajectories drawn from 20,000 real-world pull requests, covering nine programming languages: Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, and C++. All data is sourced from SWE-rebench-V2 and filtered for permissive licenses (MIT, Apache, and BSD).
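As a rough illustration of the license filtering described above, the following Python sketch keeps only records whose license falls in the MIT, Apache, or BSD families. The record fields (`repo`, `language`, `license`), the SPDX-style identifiers, and the prefix-matching rule are assumptions made for this example; the article does not describe the dataset's actual schema or filtering code. The last line works out the average number of trajectories per pull request implied by the reported totals.

```python
# Assumption: licenses are SPDX-style strings; the article names only the families.
PERMISSIVE_PREFIXES = ("MIT", "APACHE", "BSD")


def is_permissive(license_id: str) -> bool:
    """Return True if the identifier starts with MIT, Apache, or BSD."""
    return license_id.strip().upper().startswith(PERMISSIVE_PREFIXES)


def filter_permissive(records):
    """Keep records whose (hypothetical) 'license' field is permissive."""
    return [r for r in records if is_permissive(r.get("license") or "")]


# Hypothetical records, not taken from the dataset.
records = [
    {"repo": "example/a", "language": "Python", "license": "MIT"},
    {"repo": "example/b", "language": "Go", "license": "GPL-3.0"},
    {"repo": "example/c", "language": "Rust", "license": "Apache-2.0"},
    {"repo": "example/d", "language": "C", "license": "BSD-3-Clause"},
]
kept = filter_permissive(records)

# Average trajectories per pull request, from the totals reported in the article.
trajectories_per_pr = 207_489 / 20_000
```

Here `kept` retains the MIT, Apache-2.0, and BSD-3-Clause records, and `trajectories_per_pr` comes out to roughly 10.4, meaning each pull request yields about ten trajectories on average.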
The dataset employs a dual-mode, hybrid-reasoning synthesis strategy. Minimax-M2.5 is used to generate trajectories that include explicit "thinking" processes, while Qwen3.5-122B supplies high-quality "non-thinking" traces. This combination is intended to support training models that can carry out long-horizon reasoning in both thinking and non-thinking modes.
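One way to picture the dual-mode strategy is as a routing table from trace mode to generator model. The sketch below is illustrative only: the two model names come from the article, but the `TraceRequest` type, the `plan_traces` helper, the mode labels as dictionary keys, and the choice to request both modes for every pull request are assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass

# Model names are from the article; everything else here is hypothetical.
GENERATOR_BY_MODE = {
    "thinking": "Minimax-M2.5",
    "non-thinking": "Qwen3.5-122B",
}


@dataclass(frozen=True)
class TraceRequest:
    """A planned trajectory generation job (hypothetical structure)."""
    pull_request_id: str
    mode: str
    generator: str


def plan_traces(pull_request_ids, modes=("thinking", "non-thinking")):
    """Pair each pull request with each requested mode and its generator."""
    plan = []
    for pr_id in pull_request_ids:
        for mode in modes:
            plan.append(TraceRequest(pr_id, mode, GENERATOR_BY_MODE[mode]))
    return plan


# Hypothetical pull request identifiers.
plan = plan_traces(["pr-1", "pr-2"])
```

Keeping the mode as an explicit field on each request would let a downstream training script sample or balance the two trace types separately, though whether the authors do this is not stated.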
To validate the dataset, the authors fine-tune the Qwen3-30B-A3B series, including the Thinking, Instruct, and Coder variants. The best-performing model achieves resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro. The authors frame Open-SWE-Traces as a resource for distilling human-level software engineering capabilities into efficient, open-source agentic LLMs.
Key facts
- Open-SWE-Traces contains 207,489 agentic trajectories across nine programming languages: Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, and C++.
- Trajectories are sourced from 20,000 real-world pull requests using OpenHands and SWE-agent harnesses.
- Hybrid-reasoning synthesis uses Minimax-M2.5 for 'thinking' traces and Qwen3.5-122B for 'non-thinking' traces.
- All data is filtered for permissive licenses (MIT, Apache, BSD) from SWE-rebench-V2.
- Validation was performed by fine-tuning the Qwen3-30B-A3B series (Thinking, Instruct, and Coder variants).
- Best model achieves 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC.