Cognition backs Devin with a $10M AI productivity guarantee
Cognition Labs introduced the AI Productivity Guarantee, committing to issue credits up to $10M if its Devin agent delivers less engineering value than enterprise customers pay for.
The guarantee replaces accountability based on activity metrics with a financial commitment tied to measured engineering output, and Cognition explicitly calls on other AI vendors to adopt a similar outcome-based standard.
Cognition Labs announced the AI Productivity Guarantee, a program under which the company will issue credits up to $10M to enterprise customers if Devin's engineering output falls short of the value those customers paid for. The announcement, authored by Scott Wu, argues that the AI industry has been measuring the wrong things: activity metrics such as tokens consumed and lines of code generated, rather than the business value actually delivered. Cognition positions the guarantee as a call for the broader industry to adopt outcome-based accountability.
The guarantee is underpinned by an estimator agent that reviews every completed Devin session. The agent assesses two things: whether the session produced useful output, and if so, how long a human engineer would have taken to complete the same work. Productivity is measured in hours rather than lines of code, since a critical bug fix may be two lines but represent hours of investigation. The estimator has access to the user's prompt, any resulting pull request, every action Devin took, and codebase context from DeepWiki. Sessions resulting in unmerged PRs or otherwise classified as unproductive are counted as not useful. The methodology was validated against a dataset of human time estimates collected from users at enterprise customers.
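The usefulness rule can be illustrated with a small sketch. It does not model the estimator agent itself, which is an AI reviewer whose internals the announcement does not describe. Instead it assumes each review has already produced a usefulness judgment and an hours estimate. The `SessionReview` record, its field names, and the treatment of sessions with no PR are hypothetical. Only two things come from the announcement: unmerged PRs count as not useful, and so do sessions otherwise classified as unproductive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionReview:
    """Hypothetical record of one estimator review of a completed Devin session."""
    session_id: str
    pr_merged: Optional[bool]      # None if the session produced no PR (assumption)
    classified_unproductive: bool  # the estimator's own usefulness judgment
    estimated_human_hours: float   # estimated time a human engineer would need


def is_useful(review: SessionReview) -> bool:
    # Sessions resulting in unmerged PRs are counted as not useful.
    if review.pr_merged is False:
        return False
    # Sessions otherwise classified as unproductive are also not useful.
    # Assumption: a session with no PR that is not flagged still counts.
    return not review.classified_unproductive


def credited_hours(reviews: list[SessionReview]) -> float:
    # Only useful sessions contribute their estimated human-equivalent hours.
    return sum(r.estimated_human_hours for r in reviews if is_useful(r))


reviews = [
    SessionReview("a", pr_merged=True, classified_unproductive=False, estimated_human_hours=6.0),
    SessionReview("b", pr_merged=False, classified_unproductive=False, estimated_human_hours=3.0),
    SessionReview("c", pr_merged=None, classified_unproductive=True, estimated_human_hours=1.5),
    SessionReview("d", pr_merged=None, classified_unproductive=False, estimated_human_hours=2.0),
]
print(credited_hours(reviews))  # 8.0: only sessions "a" and "d" count
```

Measuring in hours rather than lines of code means a two-line bug fix, like session "a", can still carry a large estimate.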
At the end of each annual contract, engineering hours are converted to a dollar value using a standard global rate and compared against the customer's actual consumption. If the value falls short, Cognition issues credits up to $10M. The post notes that Devin is model-independent, using whichever model is best suited to each task, and that Cognition's teams embed directly into customer accounts to identify high-value projects, run enablement workshops, and measure outcomes. Cognition states it plans to continue iterating on the estimator and publishing its findings.
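The year-end comparison reduces to simple arithmetic, sketched below. The announcement does not disclose the standard global rate, so the $100/hour figure used here is purely hypothetical. It also does not specify how the credit amount is computed below the cap. This sketch assumes the credit equals the shortfall, capped at the stated $10M. The function name `guarantee_credit` is illustrative only.

```python
CREDIT_CAP_USD = 10_000_000  # maximum credit stated in the announcement


def guarantee_credit(useful_hours: float, hourly_rate_usd: float, consumption_usd: float) -> float:
    """Hypothetical year-end credit: the value shortfall, capped at $10M (assumed formula)."""
    # Convert estimated engineering hours to a dollar value at the standard rate.
    delivered_value = useful_hours * hourly_rate_usd
    # A shortfall exists only when consumption exceeds delivered value.
    shortfall = max(0.0, consumption_usd - delivered_value)
    # Credits are capped at the guarantee maximum.
    return min(shortfall, CREDIT_CAP_USD)


# 2,000 useful hours at a hypothetical $100/hour is $200,000 of value against $500,000 consumed.
print(guarantee_credit(2_000, 100.0, 500_000))  # 300000.0
```

When delivered value meets or exceeds consumption, the function returns `0.0` and no credit is issued.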
Key facts
- 01Cognition Labs introduced the AI Productivity Guarantee for enterprise Devin customers.
- 02If Devin delivers less engineering value than a customer paid for, Cognition issues credits up to $10M.
- 03An estimator agent reviews each completed Devin session to assess usefulness and estimate equivalent human engineering hours.
- 04Productivity is measured in hours of output, not lines of code, because code length does not correspond to effort.
- 05The estimator uses the user's prompt, any resulting PR, every action Devin took, and codebase context from DeepWiki.
- 06Sessions with unmerged PRs or otherwise classified as unproductive are counted as not useful.
- 07Engineering hours are converted to dollar value using a standard global rate and compared against actual customer consumption near the end of each annual contract.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 12, 2026 · 10:05 UTC.