Qwen3.6-Plus brings 1M context and frontier coding performance at a fraction of Claude's cost
Alibaba's Qwen3.6-Plus delivers a 1-million-token context window, always-on chain-of-thought reasoning, and a SWE-bench Verified score of 78.8%. It is roughly 18× cheaper per token than Claude Opus 4.6, with a free preview on OpenRouter.
Developers building agentic coding tools or RAG pipelines can now evaluate a model competitive with Claude Opus 4.6 on SWE-bench and document parsing benchmarks at roughly 18× lower token cost, with a free preview available immediately on OpenRouter.
- Released April 2, 2026; available via Alibaba Cloud Model Studio and OpenRouter, with a free preview version on OpenRouter.
- SWE-bench Verified score of 78.8%, directly competitive with Claude Opus 4.6; beats it on Terminal-Bench 2.0 (61.6% vs. 59.3%).
- Priced approximately 18× cheaper per token than Claude Opus 4.6.
Jangwook Kim's article on Dev.to profiles Qwen3.6-Plus, Alibaba's flagship model released April 2, 2026, as a cost-efficient alternative to top-tier closed models for developers building agents, RAG pipelines, and code-generation tools. The model's headline capabilities include a 1-million-token context window (roughly 2,000 pages of text, or a large monorepo, in a single request), always-on chain-of-thought reasoning (earlier Qwen3 models toggled between thinking and non-thinking modes), and native function calling. Because the model is approximately 18× cheaper per token than Claude Opus 4.6 and has a free preview on OpenRouter, the article frames it as a significant cost recalibration for the frontier tier. Alibaba stress-tested the model against its own production workloads (Qwen App, Wukong enterprise platform, Taobao, and Tmall) before public release.
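To make the context-window figure concrete, here is a minimal sketch of a budget check a developer might run before sending a large corpus in one request. The 500-tokens-per-page ratio follows from the article's "1M tokens, roughly 2,000 pages" figure. The 4-characters-per-token rule of thumb, the output reserve, and the function names are assumptions for illustration, not measurements from Qwen's tokenizer.

```python
import math

CONTEXT_LIMIT = 1_000_000                  # tokens, per the article
TOKENS_PER_PAGE = CONTEXT_LIMIT // 2_000   # ratio implied by "about 2,000 pages"
CHARS_PER_TOKEN = 4.0                      # assumed rule of thumb, not a tokenizer result


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(documents, reserve_for_output: int = 32_000,
                    limit: int = CONTEXT_LIMIT):
    """Return (fits, estimated_input_tokens) for a list of document strings.

    reserve_for_output is a hypothetical allowance for the model's reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= limit, total


if __name__ == "__main__":
    corpus = ["x" * 400_000] * 5           # five files of 400k characters each
    ok, total = fits_in_context(corpus)
    print(ok, total, total // TOKENS_PER_PAGE)
```

For a real deployment the estimate should come from the provider's own tokenizer or usage metadata; a character heuristic can be off by a wide margin for code and non-English text.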
The architecture departs from standard transformer attention using a hybrid pattern: 10 blocks of (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)). That works out to 40 layers, 30 of them Gated DeltaNet and 10 Gated Attention. Gated DeltaNet, configured with 32 value heads and 16 query/key heads, is a linear-attention layer whose cost scales as O(n) in sequence length, in contrast to the O(n²) cost of standard attention that would make a 1M-token context prohibitively expensive. The sparse MoE layer uses 256 experts per layer with 8 routed plus 1 shared expert active per token, keeping the active parameter count small despite a large total parameter footprint. A companion open-weight model, Qwen3.6-35B-A3B, offers self-hosting with 35B total and 3B active parameters, reportedly outperforming Gemma 4-31B.
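The layer pattern and sparsity figures above can be checked with a few lines of arithmetic. The sketch below only enumerates the layout described in the article; the layer names are illustrative labels rather than identifiers from any Qwen codebase, and it assumes the 256 experts are the routed pool, with the shared expert counted separately.

```python
from collections import Counter

NUM_BLOCKS = 10          # "10 blocks of ..."
LINEAR_PER_BLOCK = 3     # 3 x (Gated DeltaNet -> MoE)
FULL_PER_BLOCK = 1       # 1 x (Gated Attention -> MoE)
NUM_EXPERTS = 256        # experts per MoE layer (assumed to be the routed pool)
ROUTED_ACTIVE = 8        # routed experts active per token
SHARED_ACTIVE = 1        # shared expert, active for every token


def build_layer_plan():
    """Expand the repeating block pattern into a flat list of layer types."""
    plan = []
    for _ in range(NUM_BLOCKS):
        plan += ["gated_deltanet"] * LINEAR_PER_BLOCK
        plan += ["gated_attention"] * FULL_PER_BLOCK
    return plan


plan = build_layer_plan()
counts = Counter(plan)
routed_fraction = ROUTED_ACTIVE / NUM_EXPERTS   # share of routed experts used per token
companion_active = 3 / 35                       # Qwen3.6-35B-A3B: 3B active of 35B total

if __name__ == "__main__":
    print(len(plan), counts["gated_deltanet"], counts["gated_attention"])  # 40 30 10
    print(f"{routed_fraction:.3%} of routed experts active per token")
    print(f"{companion_active:.1%} of companion-model parameters active")
```

Only one layer in four uses Gated Attention, and each token touches 8 of 256 routed experts (about 3%) plus the shared one, which is how the design keeps per-token compute low at long context lengths.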
On benchmarks, Qwen3.6-Plus scores 78.8% on SWE-bench Verified, 61.6% on Terminal-Bench 2.0 (vs. 59.3% for Claude Opus 4.6), and 48.2% on MCPMark for tool-calling reliability. It also scores 91.2 on OmniDocBench v1.5 (vs. 87.7 for Claude Opus 4.6), which the article presents as a particular strength for RAG pipelines ingesting mixed-format documents. The article also highlights a `preserve_thinking` parameter designed for multi-turn agent loops, which retains the full reasoning chain across conversation turns to prevent context degradation over long sessions. Multimodal capabilities include visual-to-code generation from screenshots or wireframes, high-density document parsing, and temporal reasoning across video frames.
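As a rough illustration of how `preserve_thinking` might be used in an agent loop, the sketch below builds a chat-completions request body without sending it. Several details are assumptions: the model slug is a hypothetical placeholder, the article does not say where the parameter goes in the request (here it is placed at the top level of the JSON body), and the helper names are invented for this example. The endpoint URL is OpenRouter's OpenAI-compatible chat-completions endpoint.

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_SLUG = "qwen/qwen3.6-plus"   # hypothetical slug; check the provider's model list


def build_request(messages, tools=None, preserve_thinking=True):
    """Build a request body for one turn of a multi-turn agent loop.

    Placing preserve_thinking at the top level of the body is an assumption;
    the article names the parameter but not its exact location.
    """
    payload = {
        "model": MODEL_SLUG,
        "messages": list(messages),          # copy so callers keep their own history
        "preserve_thinking": preserve_thinking,
    }
    if tools:
        payload["tools"] = tools             # OpenAI-style function-calling schema
    return payload


def append_turn(history, role, content):
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


if __name__ == "__main__":
    history = append_turn([], "user", "List the failing tests in this repo.")
    body = build_request(history)
    print(json.dumps(body, indent=2))
```

Sending the body would be an HTTP POST to `OPENROUTER_URL` with an `Authorization: Bearer` header carrying an API key; that step is left out so the sketch runs without credentials.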