Cost tracking bug inverted model efficiency comparisons
Pavel Gajvoronski discovered a $12 cost-tracking bug in Kepion's dashboard that underreported premium model costs by up to 73%, inverting efficiency rankings and breaking auto-downgrade decisions across 300+ routed models.
Score breakdown
Developers building multi-model routing systems must track input and output token costs separately—a single blended price can silently corrupt cost-efficiency rankings and break auto-scaling decisions, leading to runaway spending and incorrect model selection at scale.
- 01 <item>Cost tracker used a single price for both input and output tokens, ignoring OpenRouter's separate pricing tiers.</item> <item>Claude Opus costs were underreported by 73% ($0.04 reported vs. $0.1395 actual) due to a 5x output-token premium.</item> <item>DeepSeek V3 costs were underreported by only 30%, making expensive models appear artificially cost-efficient.</item> <item>Over one week, the bug caused a $12 cumulative error—25% undercount of actual spending ($47 reported vs. $59 actual).</item> <item>The bug corrupted auto-downgrade decisions, cost anomaly detection, and circuit breaker thresholds, allowing runaway Opus loops to drain up to $37/hour undetected.</item> <item>At scale with 100 concurrent businesses, the error compounds to $6,000/year of invisible cost.</item> <item>The fix splits MODEL_PRICES into separate input and output rates per model in the calculate_cost function.</item>
Pavel Gajvoronski, building Kepion—an AI platform deploying companies via 31 specialized agents—ran a cost benchmark across four model tiers routed through OpenRouter. The results seemed too good: MiniMax M2.7 appeared to outperform Claude Opus at 7% of the cost. Checking the raw numbers revealed a $12 bug silently inverting every score-per-dollar comparison.
The cost tracker calculated total cost by summing input and output tokens, then multiplying by a single price per 1M tokens.
The cost tracker calculated total cost by summing input and output tokens, then multiplying by a single price per 1M tokens. However, OpenRouter charges separately for input and output: Opus costs $5.00/1M input and $25.00/1M output (5x difference), while DeepSeek V3 costs $0.14/1M input and $0.28/1M output (2x difference). For a typical 2,400-token input, 5,100-token output call, the tracker reported $0.04 for Opus when the actual cost was $0.1395—a 73% undercount. DeepSeek was underreported by only 30%. Since the score/$ metric divides quality by cost, underreporting expensive models' costs more than cheap models' costs made expensive models look relatively cheaper. MiniMax appeared 2.9x more efficient than Opus; in reality it was 7.1x more efficient. The ratio was directionally correct but magnified by 2.4x.
This corruption cascaded: auto-downgrade decisions fired less often, keeping agents on expensive models longer than necessary. The cost circuit breaker's per-agent hourly limit ($10) wouldn't trigger until actual spending hit $37. Over one week, the tracker reported $47 in spending but actual spending was $59—a 25% undercount that compounds at scale. With 100 concurrent businesses running 5–10 agent chains daily, the error becomes $120/week or $6,000/year. The fix splits MODEL_PRICES into separate input and output rates per model.
Key facts
- 01 <item>Cost tracker used a single price for both input and output tokens, ignoring OpenRouter's separate pricing tiers.</item> <item>Claude Opus costs were underreported by 73% ($0.04 reported vs. $0.1395 actual) due to a 5x output-token premium.</item> <item>DeepSeek V3 costs were underreported by only 30%, making expensive models appear artificially cost-efficient.</item> <item>Over one week, the bug caused a $12 cumulative error—25% undercount of actual spending ($47 reported vs. $59 actual).</item> <item>The bug corrupted auto-downgrade decisions, cost anomaly detection, and circuit breaker thresholds, allowing runaway Opus loops to drain up to $37/hour undetected.</item> <item>At scale with 100 concurrent businesses, the error compounds to $6,000/year of invisible cost.</item> <item>The fix splits MODEL_PRICES into separate input and output rates per model in the calculate_cost function.</item>