Counterexample feedback lifts LLM regex success from 3.2% to 38.1%
A paper by Hongyi Liu, Frederic Sala, and Thomas Reps shows that structured counterexample feedback from a verifier agent dramatically improves LLM performance on regular-expression induction. On the hardest task groups in two regex domains, success rates rose from 3.2% to 38.1% and from 38.9% to 74.1%.
The results show that structured verifier feedback — not just more data — can unlock large performance gains for LLM agents on formal reasoning tasks, pointing toward a concrete path for verifier-guided program synthesis.
This paper by Hongyi Liu, Frederic Sala, and Thomas Reps investigates a fundamental question in LLM agent design: when and how can LLMs genuinely improve from feedback? Rather than studying feedback in the wild — where it is heterogeneous and hard to control — the authors use regular-expression induction as a rigorous symbolic testbed. In this setup, an LLM learner proposes candidate regular expressions derived from positive- and negative-labeled strings, and a verifier teacher responds with counterexamples that precisely characterize the difference between the candidate and the target language. The authors identify several novel counterexample-guided refinement strategies, including regularization and symbolic counterexample clusters, and layer on agentic techniques such as reflection and repair loops.
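The learner/verifier loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under loud assumptions, not the authors' implementation: the LLM learner is replaced by a hypothetical `scripted_learner` that picks the first regex from a fixed pool consistent with all labeled examples, and the verifier (`find_counterexample`, also a hypothetical name) approximates the language difference by brute-force enumeration of short strings rather than an exact symbolic check.

```python
import re
from itertools import product
from typing import Callable, Iterator, List, Optional, Tuple

Example = Tuple[str, bool]  # (string, True if it belongs to the target language)


def strings_up_to(alphabet: str, max_len: int) -> Iterator[str]:
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def find_counterexample(candidate: str, target: str,
                        alphabet: str = "ab", max_len: int = 6) -> Optional[Example]:
    """Verifier (assumption: bounded enumeration, not an exact equivalence check).

    Returns the shortest string on which candidate and target disagree,
    labeled with the target's verdict, or None if none is found.
    """
    cand, tgt = re.compile(candidate), re.compile(target)
    for s in strings_up_to(alphabet, max_len):
        in_target = tgt.fullmatch(s) is not None
        if (cand.fullmatch(s) is not None) != in_target:
            return s, in_target
    return None


def scripted_learner(pool: List[str]) -> Callable[[List[Example]], str]:
    """Stand-in for the LLM learner: first pool entry consistent with all examples."""
    def propose(examples: List[Example]) -> str:
        for rx in pool:
            pat = re.compile(rx)
            if all((pat.fullmatch(s) is not None) == label for s, label in examples):
                return rx
        raise ValueError("no candidate in the pool fits the examples")
    return propose


def induce(learner: Callable[[List[Example]], str], target: str,
           examples: List[Example], max_rounds: int = 10):
    """Counterexample-guided loop: propose, verify, add the counterexample, repeat."""
    examples = list(examples)
    for _ in range(max_rounds):
        candidate = learner(examples)
        cex = find_counterexample(candidate, target)
        if cex is None:
            return candidate, examples  # verifier found no disagreement
        examples.append(cex)
    return None, examples  # round budget exhausted


learner = scripted_learner(["a*", "(ab)*|b", "(ab)*"])
regex, seen = induce(learner, target="(ab)*", examples=[("", True)])
print(regex, seen)
```

In this toy run the verifier returns two counterexamples, `("a", False)` and `("b", False)`, before the learner settles on `(ab)*`. A real system would swap the scripted learner for an LLM call that receives the accumulated examples in its prompt.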
Empirically, the framework delivers large gains on challenging regex-induction benchmarks. On the hardest task groups, success rates improved from 3.2% to 38.1% and from 38.9% to 74.1% across two different regex domains. The framework also reduced the number of labeled examples needed to learn complex target expressions, which are tasks where standard prompting fails entirely. The authors argue that these results demonstrate that LLMs can extract genuine signal from rich structured feedback, beyond simply treating it as additional training data. They frame the work as opening a path toward robust verifier-guided methods for LLM-based program synthesis and formal reasoning.
Key facts
- Authors: Hongyi Liu, Frederic Sala, and Thomas Reps (ArXiv, published 2026-06-09).
- The framework uses regular-expression induction as a controlled testbed for studying LLM feedback.
- An LLM learner proposes candidate regexes; a verifier teacher returns counterexamples showing the gap between candidate and target languages.
- Novel refinement strategies include regularization and symbolic counterexample clusters.
- Agentic strategies such as reflection and repair loops are also explored.
- On the hardest task groups, success improved from 3.2% to 38.1% and from 38.9% to 74.1% on two regex domains.
- The framework reduces the number of labeled examples required compared to standard prompting.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 11, 2026 · 08:34 UTC.