Structured contracts, not prose, make LLM architecture reviews actionable
The article argues that LLM-based architecture reviews fail in engineering workflows when they produce fluent prose instead of structured, schema-validated artifacts, and proposes a multi-agent system using PydanticAI and Claude where output contracts — not model reasoning — determine reliability.
The article's central argument is that output contracts, not model fluency, determine whether LLM reviews can participate in engineering workflows such as pull requests, architecture decision records (ADRs), ticketing, and CI gates. That reframes the design challenge from prompt quality to schema enforcement.
The article identifies a core problem with LLM-based architecture reviews: fluent, prose-style feedback is not shaped like an engineering artifact. It cannot be reliably ranked, routed, deduplicated, or converted into tickets or CI gates without a second manual pass. The proposed solution centers on output contracts, which means defining the schema the system must emit and validating every response against it. The article uses PydanticAI as the mechanism for enforcing these contracts. Claude supplies the reasoning, and the schema forces that reasoning into a machine-actionable form: normalized findings, severity, evidence, recommendations, clarifying questions, and explicit uncertainty flags.
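A minimal sketch of such a contract follows. It uses only the Python standard library as a stand-in for a Pydantic model, and the field names (`lens`, `severity`, `evidence`, `needs_human_judgment`, and so on) are assumptions chosen to mirror the list above, not the article's actual schema. What it illustrates is the mechanism: a reply that does not fit the contract is rejected outright instead of being passed downstream as prose.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Hypothetical four-level scale; the article's actual levels are not shown.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Finding:
    lens: str                           # e.g. "security" or "cost"
    title: str
    severity: Severity
    evidence: list[str]                 # quotes or pointers into the design doc
    recommendation: str
    needs_human_judgment: bool = False  # explicit uncertainty flag


@dataclass
class Review:
    findings: list[Finding]
    clarifying_questions: list[str] = field(default_factory=list)


def parse_review(raw: str) -> Review:
    """Validate a model's raw JSON reply against the contract.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation,
    so a malformed reply is rejected instead of flowing downstream.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    raw_findings = data.get("findings", [])
    if not isinstance(raw_findings, list):
        raise ValueError("findings must be a list")
    findings = []
    for i, item in enumerate(raw_findings):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i}: not an object")
        for key in ("lens", "title", "recommendation"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"finding {i}: missing {key}")
        try:
            severity = Severity(item.get("severity"))
        except ValueError as exc:
            raise ValueError(f"finding {i}: unknown severity") from exc
        evidence = item.get("evidence")
        if not isinstance(evidence, list) or not evidence:
            raise ValueError(f"finding {i}: no evidence cited")
        findings.append(Finding(
            lens=item["lens"],
            title=item["title"],
            severity=severity,
            evidence=[str(e) for e in evidence],
            recommendation=item["recommendation"],
            needs_human_judgment=bool(item.get("needs_human_judgment", False)),
        ))
    questions = data.get("clarifying_questions", [])
    if not isinstance(questions, list):
        raise ValueError("clarifying_questions must be a list")
    return Review(findings=findings, clarifying_questions=[str(q) for q in questions])
```

In PydanticAI the same role is played by a Pydantic model declared as the agent's structured output type, which the library validates on each run. The hand-written checks above only make that validation step visible.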
The article proposes a minimal three-role topology: a planner that reads the input and decides which review lenses to apply, a set of specialists each running one lens (security, scalability, operability, cost, data integrity, failure recovery) and emitting findings in a shared schema, and a synthesizer that deduplicates, ranks, resolves contradictions, and produces the final structured review artifact. The article is candid about the tradeoffs: fan-out increases token spend and latency, sequential pipelines can amplify early scoping errors, and adding agents means building a small distributed system. The guidance is to stay with a single structured call unless review output is consistently repetitive, shallow, or internally inconsistent.
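The three roles can be sketched as plain functions. This is a minimal, LLM-free sketch with several stated assumptions. `plan`, `run_specialist`, `synthesize`, and `review` are hypothetical names. The planner uses keyword matching where the article's planner would call a model. Each specialist returns canned findings. What the sketch shows is the data flow: every specialist emits the same record type, so the synthesizer can deduplicate, settle severity disagreements, and rank mechanically.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Hypothetical trigger words per lens; a real planner would ask the model.
LENS_TRIGGERS = {
    "security": ("auth", "token", "public"),
    "scalability": ("traffic", "shard", "queue"),
    "operability": ("deploy", "alert", "on-call"),
    "cost": ("gpu", "egress", "storage"),
    "data integrity": ("write", "replica", "schema"),
    "failure recovery": ("backup", "failover", "retry"),
}


@dataclass(frozen=True)
class Finding:  # the shared schema every specialist must emit
    lens: str
    title: str
    severity: str


# Canned outputs standing in for per-lens model calls (hypothetical).
CANNED = {
    "security": [Finding("security", "No rate limiting on public API", "high")],
    "scalability": [
        Finding("scalability", "Single queue consumer", "medium"),
        Finding("scalability", "no rate limiting on  public API", "medium"),
    ],
    "failure recovery": [Finding("failure recovery", "No restore drill", "critical")],
}


def plan(design_doc: str) -> list[str]:
    """Planner: choose the lenses whose trigger words appear in the doc."""
    text = design_doc.lower()
    return [lens for lens, words in LENS_TRIGGERS.items()
            if any(w in text for w in words)]


def run_specialist(lens: str, design_doc: str) -> list[Finding]:
    """Specialist: apply exactly one lens (stubbed with canned output)."""
    return CANNED.get(lens, [])


def synthesize(findings: list[Finding]) -> list[Finding]:
    """Synthesizer: merge duplicates by normalized title, keep the most
    severe version (a simple contradiction policy), then rank by severity."""
    best: dict[str, Finding] = {}
    for f in findings:
        key = " ".join(f.title.lower().split())
        current = best.get(key)
        if current is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[current.severity]:
            best[key] = f
    return sorted(best.values(), key=lambda f: SEVERITY_RANK[f.severity])


def review(design_doc: str) -> list[Finding]:
    lenses = plan(design_doc)
    # Sequential here; a real system would fan the specialists out concurrently.
    findings = [f for lens in lenses for f in run_specialist(lens, design_doc)]
    return synthesize(findings)
```

For example, `review("Public API with auth tokens; traffic goes through one queue; retry on failure.")` selects the security, scalability, and failure-recovery lenses. It collapses the two rate-limiting findings into the higher-severity security one and returns three ranked findings. Each extra lens in this loop is another model call in a real system, which is where the token and latency costs come from.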
The central thesis is stated plainly: "the schema is the product." Monolithic prompts fail in predictable ways — mixing high-confidence issues with speculative ones, contradicting themselves, and lacking traceable evidence — and those failures make downstream automation unreliable. The article notes that a full runnable example is available in a companion repository, and that the remaining sections cover the shared contracts and an end-to-end walkthrough from design doc to structured report. The source text is truncated before those sections are reached.
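The downstream-automation point can be made concrete with a hypothetical CI step that consumes structured findings. The sketch assumes each finding is a dict with `severity`, `evidence`, and `needs_human_judgment` keys. These are illustrative field names, not a published schema, and `route` and `exit_code` are hypothetical helpers. Speculative findings are flagged explicitly, so the gate can send them to a person instead of blocking a build on a guess or silently dropping them. Free-form prose gives the gate nothing to branch on.

```python
from typing import Any

BLOCKING = {"critical", "high"}  # assumed policy: these severities fail the check


def route(findings: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split findings into CI-blocking, ticket, and human-review buckets."""
    buckets: dict[str, list[dict[str, Any]]] = {"block": [], "ticket": [], "human": []}
    for f in findings:
        if f.get("needs_human_judgment") or not f.get("evidence"):
            buckets["human"].append(f)   # speculative or unsupported: a person decides
        elif f.get("severity") in BLOCKING:
            buckets["block"].append(f)   # evidenced and severe: fail the check
        else:
            buckets["ticket"].append(f)  # evidenced but not urgent: file as work
    return buckets


def exit_code(buckets: dict[str, list[dict[str, Any]]]) -> int:
    """Non-zero only when an evidenced, high-severity finding exists."""
    return 1 if buckets["block"] else 0
```

In a pipeline script, `sys.exit(exit_code(route(findings)))` would fail the job only on evidenced, severe findings. Everything else still becomes tracked work or a review request.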
Key facts
- Most LLM architecture review demos produce prose that is hard to rank, route, deduplicate, or turn into work without a manual pass.
- The article proposes using PydanticAI to enforce a strict output schema, with Claude supplying the reasoning and the contract shaping it into a machine-actionable artifact.
- The proposed minimal topology has three roles: planner, specialists (one per review lens), and synthesizer.
- Review lenses covered include security, scalability, operability, cost, data integrity, and failure recovery.
- Structured output includes normalized findings with severity, evidence, recommendations, clarifying questions, and explicit "needs human judgment" flags.
- A multi-agent design adds real costs: increased token spend, latency, and coordination overhead.
- The article's core thesis is "the schema is the product": system reliability comes from the output contract, not from the model's reasoning alone.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.