Jun 14, 2026 · 1 min read · Research Papers

Open-SWE-Traces dataset ships 207K multilingual agent trajectories

Researchers introduce Open-SWE-Traces, a dataset of 207,489 agentic trajectories across nine programming languages, designed to train open-source LLMs for autonomous software engineering tasks.

arXiv · Wasi Uddin Ahmad, Nikolai Ludwig, Somshubra Majumdar

Composite score: 6.4 out of 10

Score weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Open-SWE-Traces provides a large-scale, permissively licensed, multilingual trajectory dataset for fine-tuning open-source LLMs on autonomous software engineering. It directly targets the scarcity of trajectory data, which the paper identifies as the main bottleneck for such models.

Summary: our read of the original

Wasi Uddin Ahmad, Nikolai Ludwig, and Somshubra Majumdar present Open-SWE-Traces, a dataset designed to address what the paper describes as a severe deficit of diverse, large-scale trajectory data for autonomous software engineering. The dataset contains 207,489 agentic trajectories drawn from 20,000 real-world pull requests, covering nine programming languages: Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, and C++. All data is sourced from SWE-rebench-V2 and filtered for permissive licenses (MIT, Apache, and BSD).
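The license filter described above can be pictured with a minimal sketch. The record shape (`PullRequestTask`), the repository names, and the prefix-matching rule are assumptions made for illustration; the summary does not say how the authors implement the filter or which license identifiers they accept.

```python
from dataclasses import dataclass

# Assumption: the three license families named in the summary, matched by
# prefix. The exact identifiers used by the dataset are not given.
PERMISSIVE_PREFIXES = ("mit", "apache", "bsd")


@dataclass(frozen=True)
class PullRequestTask:
    # Hypothetical record shape, for illustration only.
    repo: str
    language: str
    license_id: str


def is_permissive(license_id: str) -> bool:
    """Return True when the license id starts with a permissive family name."""
    return license_id.strip().lower().startswith(PERMISSIVE_PREFIXES)


def filter_permissive(tasks):
    """Keep only tasks whose repository license is in a permissive family."""
    return [t for t in tasks if is_permissive(t.license_id)]


# Made-up example records; none of these come from the dataset.
tasks = [
    PullRequestTask("example/a", "Python", "MIT"),
    PullRequestTask("example/b", "Go", "Apache-2.0"),
    PullRequestTask("example/c", "Rust", "GPL-3.0"),
    PullRequestTask("example/d", "C", "BSD-3-Clause"),
]
kept = filter_permissive(tasks)
print([t.repo for t in kept])

# Average trajectories per pull request, using the two totals in the summary.
avg_per_pr = round(207_489 / 20_000, 2)
print(avg_per_pr)
```

The last two lines only divide the totals reported in the summary: 207,489 trajectories over 20,000 pull requests works out to roughly ten trajectories per pull request on average.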

The dataset employs a dual-mode, hybrid-reasoning synthesis strategy. Minimax-M2.5 is used to generate trajectories that include explicit "thinking" processes, while Qwen3.5-122B supplies high-quality "non-thinking" traces. This combination is intended to support training models that can sustain long-horizon work in both thinking and non-thinking modes.
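One way to picture the dual-mode mix is to tag each trajectory by its generator and keep the reasoning text only for thinking-mode traces. The sketch below is illustrative: the generator names come from the summary, while the `Trajectory` layout, the `thought` and `action` fields, and the helper names are hypothetical and not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field

# Generator names are from the summary; mapping them to a mode tag like this
# is an assumption about how such a mix could be organized.
GENERATOR_MODE = {
    "Minimax-M2.5": "thinking",
    "Qwen3.5-122B": "non-thinking",
}


@dataclass
class Trajectory:
    # Hypothetical record layout: one dict per agent turn.
    task_id: str
    generator: str
    steps: list = field(default_factory=list)


def tag_mode(traj: Trajectory) -> str:
    """Look up the reasoning mode implied by the generating model."""
    return GENERATOR_MODE[traj.generator]


def to_training_turns(traj: Trajectory) -> list:
    """Keep the 'thought' text only for thinking-mode trajectories."""
    keep_thought = tag_mode(traj) == "thinking"
    turns = []
    for step in traj.steps:
        turn = {"action": step["action"]}
        if keep_thought and "thought" in step:
            turn["thought"] = step["thought"]
        turns.append(turn)
    return turns


# Made-up trajectories for illustration.
mix = [
    Trajectory("t1", "Minimax-M2.5",
               [{"thought": "inspect the failing test", "action": "open tests/test_x.py"}]),
    Trajectory("t2", "Qwen3.5-122B",
               [{"action": "run pytest"}]),
]
mode_counts = Counter(tag_mode(t) for t in mix)
print(dict(mode_counts))
print(to_training_turns(mix[0]))
```

Tagging by mode makes it easy to check the balance of a training mix before fine-tuning; the actual ratio of thinking to non-thinking traces in Open-SWE-Traces is not stated in the summary.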

To validate the dataset, the authors fine-tune the Qwen3-30B-A3B series, including the Thinking, Instruct, and Coder variants. The best-performing model achieves resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro. The authors frame Open-SWE-Traces as a resource for distilling human-level software engineering capabilities into efficient, open-source agentic LLMs.
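For readers unfamiliar with the metric, a resolve rate is the share of benchmark tasks an agent's patch fully resolves. The sketch below shows that arithmetic on made-up outcomes; the function name and the sample data are hypothetical and do not reproduce the paper's evaluation, which may also average over several runs.

```python
def resolve_rate(outcomes) -> float:
    """Percentage of tasks marked resolved (True) in a list of booleans."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one task outcome")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


# Made-up outcomes for four tasks: three resolved, one not.
sample = [True, True, False, True]
print(resolve_rate(sample))  # 75.0
```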

Key facts

01 Open-SWE-Traces contains 207,489 agentic trajectories across nine programming languages: Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, and C++.
02 Trajectories are generated from 20,000 real-world pull requests using the OpenHands and SWE-agent harnesses.
03 Hybrid-reasoning synthesis uses Minimax-M2.5 for "thinking" traces and Qwen3.5-122B for "non-thinking" traces.
04 All data comes from SWE-rebench-V2 and is filtered for permissive licenses (MIT, Apache, BSD).
05 Validation was performed by fine-tuning the Qwen3-30B-A3B series (Thinking, Instruct, and Coder variants).
06 The best model achieves 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro.

Topics

#agent-framework #benchmarks #code-generation #open-source #multilingual

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC.
