DeepSeek V4 Pro launches with 1.6T MoE params and 1M token context
DeepSeek V4 Pro launched April 24, 2026, offering a 1.6T parameter MoE model with 1M token context, dual Think/Non-Think modes, and pricing of $1.74/1M input and $3.48/1M output — undercutting GPT-4o and Claude Sonnet 4.6 for agent workloads.
Score breakdown
Track DeepSeek V4 Pro's pricing and dual-mode architecture as a potential cost-reduction lever for input-heavy agentic pipelines that rely on long context, structured output, or multi-step function calling.
- 01DeepSeek V4 Pro launched April 24, 2026, with 1.6T total parameters (MoE) and 49B active parameters.
- 02Context window is 1M tokens (described as verified by the author).
- 03Dual modes: Think mode at 8–15s latency for planning; Non-Think mode at ~2s for faster pipelines.
DeepSeek V4 Pro launched on April 24, 2026, and is described by author 이윤혁 (omqxansi) based on hands-on production agent usage. The model uses a Mixture-of-Experts (MoE) architecture with 1.6T total parameters and 49B active parameters, paired with a verified 1M token context window. It is released under an MIT license and is accessible via NVIDIA's integration API using an OpenAI-compatible client pointed at `https://integrate.api.nvidia.com/v1` with the model identifier `deepseek-ai/deepseek-v4-pro`.
The model offers two operating modes: a Think mode (8–15 second latency) aimed at multi-step planning tasks, and a Non-Think mode (~2 second latency) suited for content pipelines.
The model offers two operating modes: a Think mode (8–15 second latency) aimed at multi-step planning tasks, and a Non-Think mode (~2 second latency) suited for content pipelines. The author notes function calling is more reliable than V3.2, and that long-context tasks — such as processing full conversation logs — are now viable at scale. On pricing, V4 Pro comes in at $1.74/1M input tokens and $3.48/1M output tokens, compared to $2.50/$10.00 for GPT-4o and $3.00/$15.00 for Claude Sonnet 4.6, making it notably cheaper for agent workloads that are input-heavy with structured output requirements.
Key facts
- 01DeepSeek V4 Pro launched April 24, 2026, with 1.6T total parameters (MoE) and 49B active parameters.
- 02Context window is 1M tokens (described as verified by the author).
- 03Dual modes: Think mode at 8–15s latency for planning; Non-Think mode at ~2s for faster pipelines.
- 04Function calling is described as more reliable than V3.2.
- 05Pricing: $1.74/1M input tokens and $3.48/1M output tokens.
- 06Compared models: GPT-4o at $2.50/$10.00 and Claude Sonnet 4.6 at $3.00/$15.00 per 1M tokens.
- 07Released under an MIT license; accessible via NVIDIA's API with an OpenAI-compatible client.