Apr 20, 2026·1 min readApplications & Use Cases

Cost tracking bug inverted model efficiency comparisons

Pavel Gajvoronski discovered a $12 cost-tracking bug in Kepion's dashboard that underreported premium model costs by up to 73%, inverting efficiency rankings and breaking auto-downgrade decisions across 300+ routed models.

Dev.to #ai·Pavel Gajvoronski

Read at source

Composite

5.2

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

Developers building multi-model routing systems must track input and output token costs separately—a single blended price can silently corrupt cost-efficiency rankings and break auto-scaling decisions, leading to runaway spending and incorrect model selection at scale.

01 <item>Cost tracker used a single price for both input and output tokens, ignoring OpenRouter's separate pricing tiers.</item> <item>Claude Opus costs were underreported by 73% ($0.04 reported vs. $0.1395 actual) due to a 5x output-token premium.</item> <item>DeepSeek V3 costs were underreported by only 30%, making expensive models appear artificially cost-efficient.</item> <item>Over one week, the bug caused a $12 cumulative error—25% undercount of actual spending ($47 reported vs. $59 actual).</item> <item>The bug corrupted auto-downgrade decisions, cost anomaly detection, and circuit breaker thresholds, allowing runaway Opus loops to drain up to $37/hour undetected.</item> <item>At scale with 100 concurrent businesses, the error compounds to $6,000/year of invisible cost.</item> <item>The fix splits MODEL_PRICES into separate input and output rates per model in the calculate_cost function.</item>

Summary— our read of the original

Pavel Gajvoronski, building Kepion—an AI platform deploying companies via 31 specialized agents—ran a cost benchmark across four model tiers routed through OpenRouter. The results seemed too good: MiniMax M2.7 appeared to outperform Claude Opus at 7% of the cost. Checking the raw numbers revealed a $12 bug silently inverting every score-per-dollar comparison.

The cost tracker calculated total cost by summing input and output tokens, then multiplying by a single price per 1M tokens.

The cost tracker calculated total cost by summing input and output tokens, then multiplying by a single price per 1M tokens. However, OpenRouter charges separately for input and output: Opus costs $5.00/1M input and $25.00/1M output (5x difference), while DeepSeek V3 costs $0.14/1M input and $0.28/1M output (2x difference). For a typical 2,400-token input, 5,100-token output call, the tracker reported $0.04 for Opus when the actual cost was $0.1395—a 73% undercount. DeepSeek was underreported by only 30%. Since the score/$ metric divides quality by cost, underreporting expensive models' costs more than cheap models' costs made expensive models look relatively cheaper. MiniMax appeared 2.9x more efficient than Opus; in reality it was 7.1x more efficient. The ratio was directionally correct but magnified by 2.4x.

This corruption cascaded: auto-downgrade decisions fired less often, keeping agents on expensive models longer than necessary. The cost circuit breaker's per-agent hourly limit ($10) wouldn't trigger until actual spending hit $37. Over one week, the tracker reported $47 in spending but actual spending was $59—a 25% undercount that compounds at scale. With 100 concurrent businesses running 5–10 agent chains daily, the error becomes $120/week or $6,000/year. The fix splits MODEL_PRICES into separate input and output rates per model.

Key facts

01 <item>Cost tracker used a single price for both input and output tokens, ignoring OpenRouter's separate pricing tiers.</item> <item>Claude Opus costs were underreported by 73% ($0.04 reported vs. $0.1395 actual) due to a 5x output-token premium.</item> <item>DeepSeek V3 costs were underreported by only 30%, making expensive models appear artificially cost-efficient.</item> <item>Over one week, the bug caused a $12 cumulative error—25% undercount of actual spending ($47 reported vs. $59 actual).</item> <item>The bug corrupted auto-downgrade decisions, cost anomaly detection, and circuit breaker thresholds, allowing runaway Opus loops to drain up to $37/hour undetected.</item> <item>At scale with 100 concurrent businesses, the error compounds to $6,000/year of invisible cost.</item> <item>The fix splits MODEL_PRICES into separate input and output rates per model in the calculate_cost function.</item>

Topics

#multi-agent #cost-optimization #agent-framework #model-routing #developer-tools

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 20, 2026 · 11:47 UTC. How this works →

Score breakdown

Key facts

Topics

Score breakdown

Key facts

Topics