SWE-Future uses repo forecasts to synthesize future-oriented coding benchmarks
SWE-Future is a forecast-conditioned data synthesis method that predicts future repository work to generate realistic coding-agent benchmarks without replaying historical pull requests.
SWE-Future offers a path to coding-agent benchmarks that are both grounded in real repository evolution and resistant to data contamination from historical pull-request replay.
Qiao Zhao, JianYing Qu, and Jun Zhang introduce SWE-Future, a forecast-conditioned data synthesis method designed to address a core vulnerability in coding-agent benchmarks: many of them replay public GitHub issues and pull requests, which exposes them to overlap with model pretraining, fine-tuning, synthetic-data generation, or benchmark-driven model selection. Fully synthetic tasks avoid this replay problem but risk drifting away from real repository needs. SWE-Future aims to get both properties by forecasting future repository work across three task families (feature implementation/enhancement, bugfix, and refactor) using only evidence available before a fixed snapshot time T₀.
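The pre-T₀ constraint can be pictured as a simple timestamp filter over repository evidence. The sketch below is illustrative only: the `RepoEvent` record, its fields, the `evidence_before` helper, and the sample data are hypothetical names and values, not taken from the paper, whose implementation is not described in this summary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RepoEvent:
    """Hypothetical record for one piece of repository evidence."""
    kind: str             # e.g. "commit", "issue", "pull_request"
    created_at: datetime  # timezone-aware creation time
    text: str


def evidence_before(events, t0):
    """Keep only events created strictly before the snapshot time t0."""
    return [e for e in events if e.created_at < t0]


# Assumed snapshot time, chosen only for illustration.
T0 = datetime(2025, 1, 1, tzinfo=timezone.utc)

events = [
    RepoEvent("issue", datetime(2024, 11, 3, tzinfo=timezone.utc), "Crash on empty config"),
    RepoEvent("commit", datetime(2024, 12, 20, tzinfo=timezone.utc), "Add YAML loader"),
    RepoEvent("pull_request", datetime(2025, 2, 14, tzinfo=timezone.utc), "Fix empty-config crash"),
]

# Only the first two events are visible to a forecaster working at T0.
visible = evidence_before(events, T0)
```

Under this reading, the post-T₀ pull request is hidden from the forecaster and is only available later, for validation.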
The forecasting step is validated retrospectively: forecasts are fixed at T₀, and later pull requests are used solely to measure whether the predicted task families match actual future repository work. Across an 80-repository study, the forecaster achieves 58.1% future-work relevance under the main semantic matching metric. The validated forecast families then serve as conditioning signals to synthesize a 200-task coding-agent dataset spanning 61 repositories, drawn from a task-generation snapshot rather than the later pull requests used for validation. The paper argues that repository-evolution forecasting can guide realistic, future-oriented coding-task synthesis while reducing direct dependence on historical pull-request replay.
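One way to read a relevance score of this kind is as the fraction of fixed forecasts that match at least one later pull request. The sketch below assumes that reading. The paper's actual semantic matching metric is not specified in this summary, so the `matches` function is a crude stand-in (same task family plus keyword overlap), and every name and data value here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    family: str          # "feature", "bugfix", or "refactor"
    keywords: frozenset  # lowercase terms describing the predicted work


@dataclass(frozen=True)
class LaterPR:
    family: str
    title: str


def matches(forecast, pr):
    """Stand-in matcher: same family and at least one shared keyword."""
    words = set(pr.title.lower().split())
    return forecast.family == pr.family and bool(forecast.keywords & words)


def relevance(forecasts, later_prs):
    """Fraction of forecasts matched by at least one post-T0 pull request."""
    if not forecasts:
        return 0.0
    hits = sum(any(matches(f, pr) for pr in later_prs) for f in forecasts)
    return hits / len(forecasts)


# Forecasts are fixed at T0; later PRs are used only for scoring.
forecasts = [
    Forecast("bugfix", frozenset({"config", "crash"})),
    Forecast("feature", frozenset({"yaml"})),
    Forecast("refactor", frozenset({"parser"})),
    Forecast("feature", frozenset({"plugin"})),
]
later_prs = [
    LaterPR("bugfix", "Fix crash on empty config"),
    LaterPR("feature", "Add YAML export"),
]

score = relevance(forecasts, later_prs)  # 2 of the 4 forecasts are matched
```

The key design point is that the later pull requests appear only in the scoring step, never as input to the forecaster or to task generation.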
Key facts
- 01 SWE-Future is a forecast-conditioned data synthesis method for future-oriented coding-agent benchmarks.
- 02 The method forecasts future feature implementation/enhancement, bugfix, and refactor task families using only pre-T₀ repository evidence.
- 03 Forecasting was validated retrospectively across an 80-repository study.
- 04 The forecaster achieved 58.1% future-work relevance under the main semantic matching metric.
- 05 Validated forecast families were used as conditioning signals to synthesize a 200-task coding-agent dataset.
- 06 The dataset spans 61 repositories and avoids replaying the later pull requests used for validation.
- 07 The approach reduces direct dependence on historical pull-request replay, mitigating data-contamination risks.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.