Agentic coding demands specs and verification, not just better prompts
Peter Zatloukal argues that software engineers face a "new bitter lesson": writing code is becoming less valuable than precisely defining desired outcomes and letting AI models implement them. One-shot generation, however, hits structural ceilings that only iterative, spec-driven workflows can overcome.
Developers building agentic coding loops should shift investment from prompt refinement to spec design and verification harnesses. The article argues that this structural change, not better models, is what unlocks reliable autonomous coding at scale.
Peter Zatloukal opens by invoking Rich Sutton's 2019 "Bitter Lesson," which argued that general methods leveraging computation consistently outperform hand-engineered domain knowledge across decades of AI research. Zatloukal proposes a parallel lesson for software engineers: just as ML researchers had to abandon feature engineering, developers must let go of the instinct to hand-craft implementations and instead focus on precisely specifying desired outcomes. The bitterness, he notes, comes from the same source — the skill you spent years developing is the exact skill you need to relinquish.
Zatloukal acknowledges that frontier models can one-shot increasingly complex software, and that this ceiling rises with each model generation. But he argues that the existence of a ceiling is structural, not temporary, and identifies five properties of complex projects that guarantee it: human intent is discovered through iteration rather than transmitted upfront; requirements are inherently underspecified; errors compound non-linearly across sequential steps; context coherence degrades as generation scales; and a single agent performing both generation and evaluation is structurally biased toward confirming its own choices. Better models cannot fix that last problem, he argues, because the bias is architectural, not a matter of intelligence.
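The error-compounding point can be made concrete with a small calculation. The sketch below is an illustration, not a model taken from the article: it assumes each step succeeds independently with the same probability, and that a verify-and-retry workflow has a perfect verifier. The function name `chain_success` and the 99% per-step figure are hypothetical choices.

```python
def chain_success(p_step: float, n_steps: int, attempts: int = 1) -> float:
    """Probability that every step in a sequential chain ends up correct.

    Assumptions (illustrative only): steps fail independently, and when
    attempts > 1 a perfect verifier catches each failure and triggers a retry.
    """
    # Chance that at least one of `attempts` tries at a single step succeeds.
    p_verified = 1.0 - (1.0 - p_step) ** attempts
    # All n_steps must succeed for the whole chain to be correct.
    return p_verified ** n_steps


if __name__ == "__main__":
    for n in (10, 50, 200):
        one_shot = chain_success(0.99, n)
        verified = chain_success(0.99, n, attempts=2)
        print(f"{n:>4} steps: one-shot {one_shot:.3f}, verify-and-retry {verified:.3f}")
```

Under these assumptions, a 99% per-step success rate leaves a 200-step one-shot run correct only roughly 13% of the time, while a single verified retry per step lifts that to roughly 98%. Real steps are neither independent nor perfectly verifiable, so the numbers indicate the shape of the problem, not its size.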
He coins the term "state-of-the-one-shot" to describe the upper bound on project complexity a model can reliably handle in a single pass at any given moment. The practical conclusion is that today's bottleneck is not model capability but how engineers structure their interaction with models. Specs should function as the control plane for autonomous agents, and the right harness should be thin on orchestration but thick on verification and memory. Zatloukal grounds these conclusions in his own experience building a seven-stage handwriting style-transfer pipeline, where spec-driven turns with adversarial review proved more reliable than numerical metrics or human judgment alone.
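One way to picture a harness that is "thin on orchestration but thick on verification and memory" is the minimal sketch below. It is not Zatloukal's pipeline: `Spec`, `Memory`, `verify`, `run_turns`, and `toy_generate` are hypothetical names, and the generator is a stub standing in for a model call. The sketch only shows the shape of the idea: the spec carries executable acceptance checks, the evaluator is separate from the generator, and verifier findings persist across turns.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Spec:
    """Desired outcome stated as named, executable acceptance checks."""
    goal: str
    checks: dict[str, Callable[[str], bool]]


@dataclass
class Memory:
    """Verifier findings carried across turns."""
    notes: list[str] = field(default_factory=list)


def verify(spec: Spec, artifact: str) -> list[str]:
    """Evaluator kept separate from the generator: returns names of failed checks."""
    return [name for name, check in spec.checks.items() if not check(artifact)]


def run_turns(
    spec: Spec,
    generate: Callable[[Spec, Memory], str],
    max_turns: int = 3,
) -> tuple[str, int, Memory]:
    """Thin orchestration: generate, verify, record findings, repeat."""
    memory = Memory()
    for turn in range(1, max_turns + 1):
        artifact = generate(spec, memory)
        failures = verify(spec, artifact)
        if not failures:
            return artifact, turn, memory
        memory.notes.append(f"turn {turn}: failed {', '.join(failures)}")
    raise RuntimeError(f"spec not satisfied after {max_turns} turns: {memory.notes}")


def adds_correctly(source: str) -> bool:
    """Acceptance check: run the candidate and test its behaviour."""
    namespace: dict = {}
    exec(source, namespace)  # toy only; a real harness would sandbox this
    return namespace["add"](2, 3) == 5


def toy_generate(spec: Spec, memory: Memory) -> str:
    """Hypothetical stand-in for a model call; 'fixes' its bug once feedback exists."""
    operator = "+" if memory.notes else "-"
    return f"def add(a, b):\n    return a {operator} b\n"


spec = Spec(goal="add two integers", checks={"adds_correctly": adds_correctly})
artifact, turns, memory = run_turns(spec, toy_generate)
print(turns, memory.notes)
```

The loop itself is only a few lines; most of the design effort sits in the checks attached to the spec and in what gets written to memory. Because `verify` never consults the generator, this toy avoids the self-confirmation bias described above, though a real harness would need far richer checks than a single assertion.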
Key facts
- Peter Zatloukal draws a parallel between Rich Sutton's 2019 "Bitter Lesson" for ML researchers and a new lesson for software engineers.
- The new lesson: precisely defining desired outcomes will be more valuable than writing precise implementations.
- Zatloukal identifies five structural reasons one-shot generation fails on complex projects: undiscovered intent, underspecified requirements, non-linear error compounding, context degradation at scale, and generation-evaluation bias.