Code review consumes 59% of agent tokens, study finds
A Concordia University study of agentic software development found that code review accounts for 59.4% of all token consumption — far outpacing initial code generation at 8.6% — as rising AI token costs become a governance problem for engineering teams.
The research reframes where agent cost optimization efforts should focus — not on code generation, but on the iterative code review loop, where a structural "communication tax" drives the majority of token spend.
A research paper from Concordia University's Data-driven Analysis of Software lab, led by Emad Shihab, mapped token consumption across six development stages — design, coding, code completion, code review, testing, and documentation — using execution traces from 30 tasks run through ChatDev, an open-source multi-agent framework. The headline finding is that code review accounts for an average of 59.4% of all token consumption, while initial code generation accounts for just 8.6%. The researchers attribute this to a structural "communication tax": in conversational multi-agent systems, agents engaged in code review repeatedly pass the full codebase back and forth on every turn. Across all tasks, input tokens made up 53.9% of total consumption versus 24.4% for output tokens. The one outlier is the coding stage itself, which runs output-heavy at 58% output tokens versus 6.9% input — a single instruction can yield hundreds of lines of code — while every other stage is dominated by input tokens.
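The arithmetic behind the "communication tax" can be illustrated with a minimal sketch. It assumes a simplified two-agent review loop in which the reviewer reads the full codebase and the programmer returns a full revised copy on every round. The function name `review_loop_tokens` and all token counts below are hypothetical illustrations, not figures or code from the study or from ChatDev.

```python
def review_loop_tokens(codebase_tokens: int, comment_tokens: int, rounds: int) -> dict:
    """Estimate tokens for a review loop that resends the full codebase each turn.

    Simplified, assumed model (not ChatDev's actual protocol):
      - reviewer turn: reads the codebase, writes review comments
      - programmer turn: reads codebase + comments, writes a full revised codebase
    """
    input_tokens = 0
    output_tokens = 0
    for _ in range(rounds):
        # Reviewer turn: the whole codebase is input, comments are output.
        input_tokens += codebase_tokens
        output_tokens += comment_tokens
        # Programmer turn: codebase plus comments are input, revised codebase is output.
        input_tokens += codebase_tokens + comment_tokens
        output_tokens += codebase_tokens
    return {"input": input_tokens, "output": output_tokens}


# Hypothetical sizes: a 2,000-token codebase, 200-token reviews, 3 review rounds.
review = review_loop_tokens(codebase_tokens=2000, comment_tokens=200, rounds=3)

# Hypothetical one-shot generation step: short instruction in, full codebase out.
generation = {"input": 150, "output": 2000}

total = sum(review.values()) + sum(generation.values())
review_share = sum(review.values()) / total
print(review, f"review share of all tokens: {review_share:.1%}")
```

Under these assumed numbers, the review loop's cost grows linearly with the number of rounds and with the size of the codebase, while the generation step is paid once. That is the shape of the effect the researchers describe, although the real percentages depend on the framework and the task.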
The study's practical implication is that the cost profile of an agentic project depends heavily on its nature: a greenfield coding effort looks very different from a refactoring or review-heavy engagement. The researchers suggest that inserting a human checkpoint before the iterative code review loop begins could prevent significant unnecessary token burn. The authors note important caveats — the study used a single framework (ChatDev) and a single model (GPT-5) across only 30 tasks, and ChatDev is primarily a research tool rather than a production system, so the specific percentages may not map directly to commercial agents.
The broader context, as framed by the article, is that token costs are becoming a governance problem. GitHub recently moved away from flat-rate pricing for its Copilot coding agent to token-based billing, with some subscribers seeing projected costs rise tenfold overnight. Anthropic is also moving toward consumption-based API pricing. Tessl itself reports switching its default eval solver from Claude Sonnet 4.6 to the open-weight GLM 5.1, achieving an 88.5% task agreement rate at an eval cost roughly 28% lower.
Key facts
- 01 Code review accounts for an average of 59.4% of all token consumption in the ChatDev study.
- 02 Initial code generation accounts for just 8.6% of token consumption.
- 03 Input tokens made up 53.9% of total consumption across all tasks, versus 24.4% for output tokens.
- 04 The coding stage is the outlier: 58% output tokens vs. 6.9% input.
- 05 The study, led by Emad Shihab at Concordia University, analyzed 30 tasks run through ChatDev using GPT-5.
- 06 GitHub abandoned flat-rate Copilot agent pricing for token-based billing, with some subscribers seeing projected costs rise tenfold.
- 07 Tessl reports that switching its default eval solver from Claude Sonnet 4.6 to GLM 5.1 cut eval costs by roughly 28%, with an 88.5% task agreement rate.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 11, 2026 · 08:34 UTC.