Apr 25, 2026·1 min readNew Models & Releases

DeepSeek V4 Pro launches with 1.6T MoE params and 1M token context

DeepSeek V4 Pro launched April 24, 2026, offering a 1.6T parameter MoE model with 1M token context, dual Think/Non-Think modes, and pricing of $1.74/1M input and $3.48/1M output — undercutting GPT-4o and Claude Sonnet 4.6 for agent workloads.

Dev.to #llm·이윤혁 (omqxansi)

Read at source

Composite

8.0

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

Track DeepSeek V4 Pro's pricing and dual-mode architecture as a potential cost-reduction lever for input-heavy agentic pipelines that rely on long context, structured output, or multi-step function calling.

01DeepSeek V4 Pro launched April 24, 2026, with 1.6T total parameters (MoE) and 49B active parameters.
02Context window is 1M tokens (described as verified by the author).
03Dual modes: Think mode at 8–15s latency for planning; Non-Think mode at ~2s for faster pipelines.

Summary— our read of the original

DeepSeek V4 Pro launched on April 24, 2026, and is described by author 이윤혁 (omqxansi) based on hands-on production agent usage. The model uses a Mixture-of-Experts (MoE) architecture with 1.6T total parameters and 49B active parameters, paired with a verified 1M token context window. It is released under an MIT license and is accessible via NVIDIA's integration API using an OpenAI-compatible client pointed at `https://integrate.api.nvidia.com/v1` with the model identifier `deepseek-ai/deepseek-v4-pro`.

The model offers two operating modes: a Think mode (8–15 second latency) aimed at multi-step planning tasks, and a Non-Think mode (~2 second latency) suited for content pipelines.

The model offers two operating modes: a Think mode (8–15 second latency) aimed at multi-step planning tasks, and a Non-Think mode (~2 second latency) suited for content pipelines. The author notes function calling is more reliable than V3.2, and that long-context tasks — such as processing full conversation logs — are now viable at scale. On pricing, V4 Pro comes in at $1.74/1M input tokens and $3.48/1M output tokens, compared to $2.50/$10.00 for GPT-4o and $3.00/$15.00 for Claude Sonnet 4.6, making it notably cheaper for agent workloads that are input-heavy with structured output requirements.

Key facts

01DeepSeek V4 Pro launched April 24, 2026, with 1.6T total parameters (MoE) and 49B active parameters.
02Context window is 1M tokens (described as verified by the author).
03Dual modes: Think mode at 8–15s latency for planning; Non-Think mode at ~2s for faster pipelines.
04Function calling is described as more reliable than V3.2.
05Pricing: $1.74/1M input tokens and $3.48/1M output tokens.
06Compared models: GPT-4o at $2.50/$10.00 and Claude Sonnet 4.6 at $3.00/$15.00 per 1M tokens.
07Released under an MIT license; accessible via NVIDIA's API with an OpenAI-compatible client.

Topics

#model-release #llm #agent-framework #tool-use #code-generation

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 25, 2026 · 21:38 UTC. How this works →

DeepSeek V4 Pro launches with 1.6T MoE params and 1M token context

Score breakdown

Key facts

Topics

Score breakdown

Key facts

Topics