DeepSeek V4 captures 17% of token share while Anthropic holds 65% of spend
Vercel's AI Gateway production index for May 2026 shows DeepSeek V4 surging from under 1% to 17% of token volume in a single month, while Anthropic extended its spend dominance to 65% of all gateway spend.
Score breakdown
The data shows that a value-tier model, DeepSeek V4, cleared the production quality bar for the first time at its price point and reshaped token volume distribution in a single month. Frontier spend continued to grow over the same period, which points to a market splitting into distinct cost tiers rather than converging on one.
Vercel's AI Gateway routes tens of trillions of tokens each month between production applications and AI labs, and its June 2026 production index covers May usage data. The headline finding is a sharp bifurcation: the low-cost end of the market exploded in token volume while the frontier end grew faster in dollars. DeepSeek entered May with less than 1% of token share and less than 0.2% of spend. By month-end it held 17% of tokens, good for third place overall and ahead of OpenAI, while its spend share stayed near 1%. Nearly all of that volume came from two models released in May: `deepseek/deepseek-v4-flash` and `deepseek/deepseek-v4-pro`. V4 Flash launched at $0.14 input / $0.28 output per million tokens, which the report describes as roughly 20–50× lower than comparable Anthropic models and 8–12× lower than other value-tier flagships such as Qwen 3.6 Plus and Kimi K2.6. The report notes that price alone does not explain the speed of adoption: teams that ran DeepSeek V4 against their existing evaluations found its output quality sufficient for production, making it the first model at its price point to clear that bar at scale on the gateway.
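To make the quoted V4 Flash pricing concrete, here is a minimal sketch of the per-request arithmetic. The two prices are the figures cited above; the workload (request count and token sizes) is purely hypothetical and chosen only for illustration.

```python
# Prices quoted in the report for deepseek/deepseek-v4-flash (USD per million tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.28


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload (an assumption, not from the report):
# 1,000,000 requests, each with 2,000 input tokens and 500 output tokens.
monthly_cost = 1_000_000 * request_cost(2_000, 500)
print(f"${monthly_cost:,.2f}")
```

Under those assumed numbers the month comes to roughly $420, which helps explain how a provider can carry a large share of tokens while barely registering in spend.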
On the frontier side, Anthropic's token share grew from 26% to 32% and its spend share from 61% to 65%, with the report noting Anthropic captures 70–80% of spend across every high-stakes use case. OpenAI's token share held near 13% while its spend share ticked up from 12% to 13%, indicating customers paid more per OpenAI token in May. The average token became more expensive in May despite DeepSeek pulling the average down, because frontier-model workloads grew faster than non-frontier ones. The AI coding agent use case is cited as the clearest illustration of the low-cost/frontier split.
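One way to read these share figures is to divide a provider's spend share by its token share, which gives its price per token relative to the gateway-wide average (1.0 means average). The sketch below uses only the percentages quoted above; the helper name `relative_price` is illustrative, and OpenAI's token share is approximated as 13% in both months because the summary only says it "held near 13%".

```python
# (token share %, spend share %) as quoted in the summary above.
# OpenAI's token share is given only as "near 13%", so 13 is an approximation.
shares = {
    "anthropic": {"april": (26, 61), "may": (32, 65)},
    "openai": {"april": (13, 12), "may": (13, 13)},
}


def relative_price(token_share: float, spend_share: float) -> float:
    """Spend share over token share; 1.0 equals the gateway-wide average price per token."""
    return spend_share / token_share


for provider, months in shares.items():
    for month, (tokens, spend) in months.items():
        ratio = relative_price(tokens, spend)
        print(f"{provider:10s} {month:6s} {ratio:.2f}x average")
```

The ratio is relative, not absolute: it is measured against an average that itself rose in May. OpenAI's ratio moving from about 0.92 to 1.0 is consistent with the report's reading that customers paid more per OpenAI token.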
The report also highlights growing price sensitivity in routing behavior. Gemini 3.5 Flash launched in May at a higher price than Gemini 3.0 Flash, but by month-end 3.0 Flash still held 90% of the Flash family's tokens while 3.5 Flash held only 7%, a stark contrast to the rapid adoption of Gemini 3.1 Pro earlier in the year. Additional data points: B2B applications cost roughly 60% more per token than B2C in May; just under a quarter of requests end in a tool call, but those requests carry well over half of all tokens; and apps serving 1M+ requests route across 11 or more models in the majority of cases.
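The tool-call figures imply a lower bound on how much heavier those requests are. As a rough sketch, the hypothetical helper below compares average tokens per request inside a group with the average for all other requests, evaluated at the boundary values of exactly a quarter of requests and exactly half of tokens.

```python
def tokens_per_request_ratio(request_share: float, token_share: float) -> float:
    """Average tokens per request in a group, divided by the same average for all other requests."""
    inside = token_share / request_share
    outside = (1 - token_share) / (1 - request_share)
    return inside / outside


# Boundary case: exactly 25% of requests carrying exactly 50% of tokens.
boundary = tokens_per_request_ratio(0.25, 0.50)
print(f"{boundary:.1f}x")
```

The boundary case works out to about 3x. Because the report says fewer than a quarter of requests carry more than half of tokens, the real multiple would be higher than that.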
Key facts
- 01 DeepSeek's token share on AI Gateway jumped from under 1% in April to 17% in May, placing it third, ahead of OpenAI.
- 02 Despite 17% token share, DeepSeek's spend share remained near 1%, reflecting its ultra-low pricing.
- 03 `deepseek/deepseek-v4-flash` launched at $0.14 input / $0.28 output per million tokens, roughly 20–50× cheaper than comparable Anthropic models.
- 04 Anthropic's spend share grew from 61% to 65%, and it captures 70–80% of spend across every high-stakes use case.
- 05 Gemini 3.5 Flash held only 7% of the Flash family's tokens by month-end, while Gemini 3.0 Flash held 90%.
- 06 B2B applications cost roughly 60% more per token than B2C applications in May.
- 07 Apps serving 1M+ requests route across 11 or more models in the majority of cases.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.