Apr 22, 2026·1 min readNew Models & Releases

Qwen3.6-Plus brings 1M context and frontier coding performance at a fraction of Claude's cost

Alibaba's Qwen3.6-Plus delivers a 1-million-token context window, always-on chain-of-thought reasoning, and a SWE-bench Verified score of 78.8% — roughly 18× cheaper per token than Claude Opus 4.6, with a free preview on OpenRouter.

Dev.to #llm·Jangwook Kim

Read at source

Composite

7.6

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

Developers building agentic coding tools or RAG pipelines can now evaluate a model competitive with Claude Opus 4.6 on SWE-bench and document parsing benchmarks at roughly 18× lower token cost, with a free preview available immediately on OpenRouter.

01Released April 2, 2026; available via Alibaba Cloud Model Studio and OpenRouter, with a free preview version on OpenRouter.
02SWE-bench Verified score of 78.8%, directly competitive with Claude Opus 4.6; beats it on Terminal-Bench 2.0 (61.6% vs. 59.3%).
03Priced approximately 18× cheaper per token than Claude Opus 4.6.

Summary— our read of the original

Jangwook Kim's article on Dev.to profiles Qwen3.6-Plus, Alibaba's flagship model released April 2, 2026, as a cost-efficient alternative to top-tier closed models for developers building agents, RAG pipelines, and code-generation tools. The model's headline capabilities include a 1-million-token context window (roughly 2,000 pages of text or a large monorepo in a single request), always-on chain-of-thought reasoning baked in permanently — unlike earlier Qwen3 models that toggled between thinking and non-thinking modes — and native function calling. At approximately 18× cheaper per token than Claude Opus 4.6, with a free preview available on OpenRouter, the article frames it as a significant cost recalibration for the frontier tier. Alibaba stress-tested the model against its own production workloads (Qwen App, Wukong enterprise platform, Taobao, and Tmall) before public release.

The architecture departs from standard transformer attention using a hybrid pattern: 10 blocks of (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)).

The architecture departs from standard transformer attention using a hybrid pattern: 10 blocks of (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)). Gated DeltaNet provides linear O(n) attention scaling with 32 value heads and 16 query/key heads, replacing the O(n²) memory cost of standard attention that would make 1M-token context prohibitively expensive. The sparse MoE layer uses 256 experts per layer with 8 routed plus 1 shared expert active per token, keeping active parameter counts small despite a large total parameter footprint. A companion open-weight model, Qwen3.6-35B-A3B, offers self-hosting with 35B total and 3B active parameters, reportedly outperforming Gemma 4-31B.

On benchmarks, Qwen3.6-Plus scores 78.8% on SWE-bench Verified, 61.6% on Terminal-Bench 2.0 (vs. Claude Opus 4.6's 59.3%), 48.2% on MCPMark for tool-calling reliability, and 91.2 on OmniDocBench v1.5 (vs. Claude Opus 4.6's 87.7), making it particularly strong for RAG pipelines ingesting mixed-format documents. The article also highlights a `preserve_thinking` parameter designed for multi-turn agent loops, which retains the full reasoning chain across conversation turns to prevent context degradation over long sessions. Multimodal capabilities include visual-to-code generation from screenshots or wireframes, high-density document parsing, and temporal reasoning across video frames.

Key facts

01Released April 2, 2026; available via Alibaba Cloud Model Studio and OpenRouter, with a free preview version on OpenRouter.
02SWE-bench Verified score of 78.8%, directly competitive with Claude Opus 4.6; beats it on Terminal-Bench 2.0 (61.6% vs. 59.3%).
03Priced approximately 18× cheaper per token than Claude Opus 4.6.
041-million-token context window enabled by Gated DeltaNet linear attention (O(n) scaling vs. standard O(n²)).
05Sparse MoE routing uses 256 experts per layer with 8 routed + 1 shared expert active per token.
06Open-weight companion model Qwen3.6-35B-A3B has 35B total / 3B active parameters and is self-hostable.
07`preserve_thinking` parameter retains full reasoning chain across multi-turn agent loops to prevent context degradation.

Topics

#model-release #agentic-coding #tool-use #rag #code-generation

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 11:07 UTC. How this works →

Apr 22, 2026·1 min readNew Models & Releases

Qwen3.6-Plus brings 1M context and frontier coding performance at a fraction of Claude's cost

Dev.to #llm·Jangwook Kim

Read at source

Composite

7.6

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

01Released April 2, 2026; available via Alibaba Cloud Model Studio and OpenRouter, with a free preview version on OpenRouter.
02SWE-bench Verified score of 78.8%, directly competitive with Claude Opus 4.6; beats it on Terminal-Bench 2.0 (61.6% vs. 59.3%).
03Priced approximately 18× cheaper per token than Claude Opus 4.6.

Summary— our read of the original

The architecture departs from standard transformer attention using a hybrid pattern: 10 blocks of (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)).

Key facts

01Released April 2, 2026; available via Alibaba Cloud Model Studio and OpenRouter, with a free preview version on OpenRouter.
02SWE-bench Verified score of 78.8%, directly competitive with Claude Opus 4.6; beats it on Terminal-Bench 2.0 (61.6% vs. 59.3%).
03Priced approximately 18× cheaper per token than Claude Opus 4.6.
041-million-token context window enabled by Gated DeltaNet linear attention (O(n) scaling vs. standard O(n²)).
05Sparse MoE routing uses 256 experts per layer with 8 routed + 1 shared expert active per token.
06Open-weight companion model Qwen3.6-35B-A3B has 35B total / 3B active parameters and is self-hostable.
07`preserve_thinking` parameter retains full reasoning chain across multi-turn agent loops to prevent context degradation.

Topics

#model-release #agentic-coding #tool-use #rag #code-generation

Methodology

Score breakdown

Key facts

Topics

Score breakdown

Key facts

Topics