Consequence-aware compute routing cuts cost-weighted loss by up to 33%
Researchers propose allocating AI reasoning compute based on the real-world cost of failure — not just task difficulty — reducing cost-weighted loss by 22–33% on SWE-bench Lite compared to difficulty-aware routing.
The paper demonstrates that difficulty and consequence are approximately orthogonal signals, meaning existing difficulty-based compute routing systematically under-protects high-stakes software engineering tasks — a gap the proposed scheduler directly closes.
Jingbo Wen, Liang He, and Ziqi He identify a fundamental mismatch between how reasoning models allocate test-time compute and how errors actually matter in deployment. Current methods route compute — thinking tokens, model calls, or compute budget — based on predicted task difficulty, implicitly treating every failure as equally costly. The paper illustrates the problem starkly: a typo in a log message and a migration that corrupts a production database both count as one benchmark failure, yet their real-world consequences are fundamentally different.
To address this, the authors propose consequence-aware test-time compute allocation. A lightweight predictor estimates, from issue text alone, how costly a task would be if solved incorrectly. A scheduler then routes higher-consequence tasks to larger compute tiers or higher thinking budgets while staying within the same total budget. The paper's key empirical finding is that consequence and difficulty are approximately orthogonal under various annotations — meaning the two signals carry independent information and difficulty-based routing cannot serve as a proxy for consequence-based routing.
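The routing idea described above can be sketched as a simple greedy scheduler. This is a minimal illustration under stated assumptions, not the authors' implementation: the tier names, the tier costs, the `Task` class, and the `allocate` function are all hypothetical, and the consequence scores stand in for whatever the issue-text predictor would output.

```python
from dataclasses import dataclass

# Hypothetical compute tiers and their costs in arbitrary units (not from the paper).
TIERS = ["small", "medium", "large"]
TIER_COST = {"small": 1, "medium": 3, "large": 8}


@dataclass
class Task:
    task_id: str
    consequence: float  # assumed predictor output: estimated cost of a wrong fix


def allocate(tasks, budget):
    """Give every task the smallest tier, then upgrade tasks in order of
    decreasing consequence, each to the largest tier the budget still allows."""
    base = TIER_COST[TIERS[0]]
    spent = base * len(tasks)
    if spent > budget:
        raise ValueError("budget cannot cover the smallest tier for every task")

    assignment = {t.task_id: TIERS[0] for t in tasks}
    for task in sorted(tasks, key=lambda t: t.consequence, reverse=True):
        # Try the most expensive tier first, then fall back to cheaper ones.
        for tier in reversed(TIERS[1:]):
            extra = TIER_COST[tier] - base
            if spent + extra <= budget:
                assignment[task.task_id] = tier
                spent += extra
                break
    return assignment, spent


# Toy tasks with made-up consequence scores.
tasks = [
    Task("log-typo", 1.0),
    Task("db-migration", 10.0),
    Task("docs-fix", 0.5),
]
assignment, spent = allocate(tasks, budget=12)
print(assignment, spent)
```

With these toy numbers the database migration is sent to the large tier, the log typo to the medium tier, and the docs fix stays on the small tier, using the full budget of 12 units. A difficulty-aware router would run the same loop with a difficulty score as the sort key, which is why the two policies diverge when the signals are close to orthogonal.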
Experiments span 700 software-engineering tasks across SWE-bench Lite (main experiments) and Multi-SWE-bench mini (cross-dataset evaluation). The results show that current thinking models do not scale their compute sufficiently with task consequence. The issue-only predictor never misclassifies a high-consequence task as low-consequence across the 300 SWE-bench tasks evaluated. Under matched compute budgets, the consequence-aware scheduler reduces cost-weighted loss by 22–33% relative to difficulty-aware routing. The priority-aware variant, which routes by per-task cost scaled by a marginal-utility signal, achieves a reduction of more than 30%, and its deployable predictor-driven version retains over 90% of the oracle gain.
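To make the metric concrete, here is a small sketch of how a cost-weighted loss and a relative reduction could be computed. The summary does not give the paper's exact definitions, so the formulas below are assumptions: loss is taken as the summed cost of failed tasks, and the marginal-utility signal is taken as the gain in success probability from extra compute. All function names and numbers are hypothetical toy values, not results from the paper.

```python
def cost_weighted_loss(costs, solved):
    """Assumed definition: sum of the per-task failure cost over unsolved tasks."""
    return sum(cost for cost, ok in zip(costs, solved) if not ok)


def relative_reduction(baseline_loss, candidate_loss):
    """Fraction by which the candidate lowers the baseline loss."""
    return (baseline_loss - candidate_loss) / baseline_loss


def priority(cost, p_success_small, p_success_large):
    """Assumed priority score: failure cost scaled by the marginal utility of
    extra compute, here the gain in success probability from the larger tier."""
    return cost * (p_success_large - p_success_small)


# Toy per-task failure costs (made up for illustration).
costs = [1, 2, 5, 10]

# Hypothetical outcomes: the baseline solves three tasks but misses the costliest one.
baseline_solved = [True, True, True, False]
# The alternative solves only two tasks but protects the costliest one.
candidate_solved = [True, False, False, True]

baseline = cost_weighted_loss(costs, baseline_solved)
candidate = cost_weighted_loss(costs, candidate_solved)
print(baseline, candidate, relative_reduction(baseline, candidate))
```

In this toy case the alternative policy solves fewer tasks yet has the lower cost-weighted loss (7 versus 10), which is the distinction between plain failure counts and consequence-weighted evaluation. The `priority` helper shows why cost alone may not be the best sort key: a costly task that extra compute barely helps can rank below a cheaper task where extra compute makes a large difference.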
Key facts
- 01 Proposed method: consequence-aware test-time compute allocation, routing higher-stakes tasks to larger compute tiers or higher thinking budgets.
- 02 A lightweight predictor estimates task consequence from issue text alone, without needing ground-truth labels at inference time.
- 03 Experiments cover 700 software-engineering tasks across SWE-bench Lite and Multi-SWE-bench mini.
- 04 Consequence and difficulty are found to be approximately orthogonal under various annotations.
- 05 The issue-only predictor never misclassifies a high-consequence task as low-consequence across 300 SWE-bench tasks.
- 06 Consequence-aware scheduling reduces cost-weighted loss by 22–33% relative to difficulty-aware routing under matched compute budgets.
- 07 The priority-aware variant exceeds 30% reduction; its deployable predictor-driven version retains over 90% of the oracle gain.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.