Multi-task LLM cuts bug localization to one token per file
Nikolai Rozanov introduces MLC, a lightweight multi-task LLM for line-level bug localization that matches agentic approaches on Defects4J and PypiBugs benchmarks while reducing inference latency by orders of magnitude, requiring only a single generated token per file.
Score breakdown
MLC demonstrates that line-level bug localization at full-file context can match expensive agentic pipelines in accuracy while cutting inference to a single generated token per file, directly addressing the cost and latency barriers the paper identifies as blocking practical verification of LLM-generated code.
Nikolai Rozanov's paper addresses a growing gap in LLM-powered software development: while code generation has accelerated rapidly, verification and bug localization methods have not kept pace. Existing approaches fall into two camps — expensive agentic pipelines that consume minutes of reasoning time and thousands of tokens per file, and lightweight methods that sacrifice either performance or context size and typically operate only at function-level granularity rather than the more precise line level.
The proposed system, MLC, makes three core contributions: a token alignment algorithm that resolves tokenization mismatches that hampered prior work, a lightweight multi-task LLM architecture with auxiliary decoding heads for efficient line-level bug classification, and an optimized training recipe for multi-line prediction. Together these allow MLC to process an entire file and produce a bug localization result using only a single generated token, reducing inference latency by orders of magnitude compared to agentic baselines.
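The summary does not describe how the token alignment algorithm works. As a rough illustration of the underlying problem only, the sketch below maps tokenizer character spans onto source lines, so that a token straddling a newline is attributed to every line it touches. The function names and the span format are assumptions made for this example and are not taken from the paper.

```python
from bisect import bisect_right


def line_starts(source: str) -> list[int]:
    """Return the character offset at which each line of `source` begins."""
    starts = [0]
    for i, ch in enumerate(source):
        # A newline starts a new line unless it is the very last character.
        if ch == "\n" and i + 1 < len(source):
            starts.append(i + 1)
    return starts


def align_tokens_to_lines(
    source: str, token_spans: list[tuple[int, int]]
) -> list[list[int]]:
    """Map each token's (start, end) character span to 0-based line indices.

    Hypothetical helper: `token_spans` is assumed to hold half-open character
    offsets, as many tokenizers can report. A token that crosses a newline is
    assigned to every line it overlaps.
    """
    starts = line_starts(source)
    aligned = []
    for start, end in token_spans:
        first = bisect_right(starts, start) - 1
        # end is exclusive, so the last covered character is end - 1.
        last = bisect_right(starts, max(start, end - 1)) - 1
        aligned.append(list(range(first, last + 1)))
    return aligned


if __name__ == "__main__":
    src = "a = 1\nb = a +\n    2\n"
    # The middle span covers "+", the newline, and the following indentation.
    print(align_tokens_to_lines(src, [(0, 1), (12, 18), (18, 19)]))
```

Tokens that merge a newline with the next line's indentation are one common source of the mismatch between token positions and line labels; the mapping above only makes that overlap explicit and does not claim to reproduce the paper's resolution of it.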
On the Defects4J and PypiBugs benchmarks, MLC reaches performance comparable to agentic approaches while achieving state-of-the-art results among setups of similar weight on line-level localization with full-file context. The authors also introduce a small out-of-domain Python evaluation dataset to test generalization beyond the training distribution. Code, models, and datasets are planned for open-source release upon paper acceptance.
Key facts
- MLC (multi-task LLM for bug localization) performs line-level bug classification using auxiliary decoding heads.
- Inference requires only a single generated token per file, reducing latency by orders of magnitude vs. agentic approaches.
- Achieves state-of-the-art performance among similar setups on line-level bug localization with full-file context.
- Reaches comparable performance to agentic approaches on Defects4J and PypiBugs benchmarks.
- Includes a token alignment algorithm to address tokenization challenges in prior work.
- An optimized training recipe for multi-line prediction is introduced.
- A small out-of-domain Python evaluation dataset is introduced to test generalization.
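To make the idea of line-level classification more concrete, here is a minimal sketch of one way per-token scores could be turned into per-line bug flags. It assumes a classification head has already produced one logit per token and that each token has been assigned to a single line; the mean pooling, the sigmoid, and the 0.5 threshold are illustrative choices, not details confirmed by the summary.

```python
import math


def line_bug_probabilities(
    token_logits: list[float], token_lines: list[int], num_lines: int
) -> list[float]:
    """Mean-pool per-token logits into one logit per line, then apply a sigmoid.

    Hypothetical helper: pooling by mean is an assumption for illustration.
    """
    sums = [0.0] * num_lines
    counts = [0] * num_lines
    for logit, line in zip(token_logits, token_lines):
        sums[line] += logit
        counts[line] += 1

    probs = []
    for total, count in zip(sums, counts):
        if count == 0:
            # Lines with no tokens (for example blank lines) are never flagged.
            probs.append(0.0)
        else:
            probs.append(1.0 / (1.0 + math.exp(-total / count)))
    return probs


def flag_lines(probs: list[float], threshold: float = 0.5) -> list[int]:
    """Return the 0-based indices of lines whose probability meets the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


if __name__ == "__main__":
    probs = line_bug_probabilities([-2.0, -2.0, 3.0, 1.0], [0, 0, 2, 2], num_lines=3)
    print(flag_lines(probs))
```

Because every line is scored from a single pass over the file, a design along these lines would not need to generate a token per line, which is consistent with the single-token inference cost reported above.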
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 10, 2026 · 15:34 UTC.