DeepSeek V4 launches with frontier-competitive models at a fraction of the cost
DeepSeek released two preview models — DeepSeek-V4-Pro and DeepSeek-V4-Flash — that are priced dramatically below comparable frontier models while remaining competitive on benchmarks.
Score breakdown
Track DeepSeek V4's pricing against incumbent frontier models — at $0.14/M input for Flash and $1.74/M for Pro, it sets a new low-cost reference point that could pressure pricing across the entire API market.
- 01DeepSeek released two preview models: DeepSeek-V4-Pro (1.6T total / 49B active parameters) and DeepSeek-V4-Flash (284B total / 13B active parameters).
- 02Both models are Mixture of Experts with 1 million token context windows, released under the MIT license.
- 03V4-Pro is described as likely the largest open-weights model available, surpassing Kimi K2.6 (1.1T) and GLM-5.1 (754B).
DeepSeek has released the first models in its V4 series — DeepSeek-V4-Pro and DeepSeek-V4-Flash — both Mixture of Experts models supporting 1 million token context windows and released under the MIT license. V4-Pro has 1.6T total parameters with 49B active, making it larger than Kimi K2.6 (1.1T), GLM-5.1 (754B), and more than twice the size of DeepSeek V3.2 (685B). V4-Flash is smaller at 284B total parameters with 13B active. On Hugging Face, Pro weighs in at 865GB and Flash at 160GB.
V4-Flash is priced at $0.14/M input and $0.28/M output tokens — cheaper than GPT-5.4 Nano ($0.20/$1.25) and Gemini 3.1 Flash-Lite ($0.25/$1.50).
The headline story is cost. V4-Flash is priced at $0.14/M input and $0.28/M output tokens — cheaper than GPT-5.4 Nano ($0.20/$1.25) and Gemini 3.1 Flash-Lite ($0.25/$1.50). V4-Pro at $1.74/M input and $3.48/M output undercuts Gemini 3.1 Pro ($2/$12), GPT-5.4 ($2.50/$15), Claude Sonnet 4.6 ($3/$15), and Claude Opus 4.7 ($5/$25). DeepSeek's paper explains this through significant efficiency improvements: in a 1M-token context scenario, V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache size compared to V3.2, while V4-Flash achieves just 10% of the FLOPs and 7% of the KV cache.
On benchmarks, DeepSeek's self-reported results show V4-Pro competitive with frontier models, though the paper notes it "falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months." A more capable variant, DeepSeek-V4-Pro-Max, is reported to outperform GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks through expanded reasoning tokens. Both models are available via OpenRouter, and the article notes that quantized versions from the Unsloth team are anticipated.
Key facts
- 01DeepSeek released two preview models: DeepSeek-V4-Pro (1.6T total / 49B active parameters) and DeepSeek-V4-Flash (284B total / 13B active parameters).
- 02Both models are Mixture of Experts with 1 million token context windows, released under the MIT license.
- 03V4-Pro is described as likely the largest open-weights model available, surpassing Kimi K2.6 (1.1T) and GLM-5.1 (754B).
- 04V4-Flash is priced at $0.14/M input and $0.28/M output tokens — cheaper than GPT-5.4 Nano.
- 05V4-Pro is priced at $1.74/M input and $3.48/M output — the cheapest among the larger frontier models compared.
- 06In a 1M-token context, V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache size of DeepSeek-V3.2.