Qwen3.6-27B dense model beats MoE sibling on SWE-bench
Alibaba's Qwen3.6-27B dense model scores 77.2% on SWE-bench Verified, 3.8 points ahead of the larger Qwen3.6-35B-A3B MoE model's 73.4%, and it also leads on the other coding agent benchmarks cited.
Developers building coding agents should evaluate Qwen3.6-27B as a locally runnable, Apache 2.0-licensed alternative that outperforms its larger MoE sibling on multi-step agentic tasks like codebase navigation and terminal operations.
David's post on Dev.to examines how Alibaba's Qwen3.6-27B, a plain dense model, outperforms its MoE sibling Qwen3.6-35B-A3B across a range of coding and agentic benchmarks. On SWE-bench Verified — which tests a model's ability to understand a real GitHub issue, locate relevant files, write a fix, and pass tests — the 27B dense model scores 77.2% versus the MoE's 73.4%. The post notes this puts Qwen3.6-27B within range of proprietary models, citing Claude Opus 4.5 at 80.9% as a reference point. The dense model also leads on SWE-bench Pro (53.5 vs 49.5), Terminal-Bench 2.0 (59.3 vs 51.5), SkillsBench Avg5 (48.2 vs 28.7), QwenWebBench (1487 vs 1397), and NL2Repo (36.2 vs 29.4).
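The comparison above mixes percentage-style scores with QwenWebBench, which is on a much larger numeric scale, so it can help to look at both absolute and relative gaps. The following is a minimal sketch that uses only the numbers quoted in the paragraph; the dictionary layout and the function name `gaps` are illustrative choices, not part of the original post.

```python
# Scores as quoted in the text: (dense Qwen3.6-27B, MoE Qwen3.6-35B-A3B).
SCORES = {
    "SWE-bench Verified": (77.2, 73.4),
    "SWE-bench Pro": (53.5, 49.5),
    "Terminal-Bench 2.0": (59.3, 51.5),
    "SkillsBench Avg5": (48.2, 28.7),
    "QwenWebBench": (1487.0, 1397.0),
    "NL2Repo": (36.2, 29.4),
}


def gaps(scores):
    """Map each benchmark to (absolute gap, gap as % of the MoE score)."""
    result = {}
    for name, (dense, moe) in scores.items():
        absolute = round(dense - moe, 1)
        relative = round(100 * (dense - moe) / moe, 1)
        result[name] = (absolute, relative)
    return result


if __name__ == "__main__":
    table = gaps(SCORES)
    for name, (absolute, relative) in table.items():
        print(f"{name:20s} +{absolute:>5} ({relative}% over the MoE)")
    # Rank by relative gap so the differently scaled benchmark is comparable.
    widest = max(table, key=lambda name: table[name][1])
    print("Widest relative gap:", widest)
```

Ranking by relative gap keeps SkillsBench Avg5 on top, while the SWE-bench Verified lead is the narrowest of the six in relative terms.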
The post identifies two architectural factors behind the result. First, full parameter utilization: the MoE activates only 3B of its 35B parameters per token, which speeds up inference but limits simultaneous knowledge access on harder reasoning tasks. Second, Qwen3.6-27B uses a Gated DeltaNet + Gated Attention hybrid architecture — alternating linear-gated attention layers with standard gated attention — which processes information in compressed deltas for efficient long-context handling, supporting 262K context natively and extendable to 1M tokens.
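To make "compressed deltas" a little more concrete, here is a toy, pure-Python sketch of one step of a gated delta rule, the kind of recurrence that Gated DeltaNet-style linear attention layers are generally described as using. It is an illustration under stated assumptions, not Qwen's implementation: the function names, the tiny dimensions, and the scalar gates `alpha` and `beta` are all hypothetical, and real layers use learned gates, multiple heads, and chunked parallel kernels.

```python
def zeros(d_v, d_k):
    """Empty d_v x d_k memory matrix."""
    return [[0.0] * d_k for _ in range(d_v)]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (toy sketch).

    S     : d_v x d_k memory matrix carried from token to token
    q, k  : query and key vectors of length d_k (k assumed unit-norm)
    v     : value vector of length d_v
    alpha : decay gate in [0, 1]; smaller values forget old memory faster
    beta  : write strength in [0, 1]; 1 fully replaces what k pointed at
    """
    d_v, d_k = len(v), len(k)
    # What the memory currently returns for this key: S @ k.
    recalled = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S_new = alpha * (S - beta * recalled k^T) + beta * v k^T
    # Only the difference (delta) between v and the recalled value is written.
    S_new = [
        [alpha * (S[i][j] - beta * recalled[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]
    # Read out with the query: S_new @ q.
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out


if __name__ == "__main__":
    key = [1.0, 0.0, 0.0]
    S = zeros(2, 3)
    S, out = gated_delta_step(S, key, key, [2.0, 3.0], alpha=1.0, beta=1.0)
    print(out)  # [2.0, 3.0]: the value is stored under the key
    S, out = gated_delta_step(S, key, key, [5.0, 7.0], alpha=1.0, beta=1.0)
    print(out)  # [5.0, 7.0]: the old value is replaced, not accumulated
    S, out = gated_delta_step(S, key, key, [0.0, 0.0], alpha=0.5, beta=0.0)
    print(out)  # [2.5, 3.5]: no write, the gate just decays the memory
```

The point of the sketch is that the state `S` stays the same size however many tokens have been processed, which is why this family of layers is attractive for long contexts, whereas standard attention keeps a cache that grows with sequence length.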
The tradeoff is compute cost: the dense model activates all 27B parameters per forward pass versus only 3B for the MoE, which makes the MoE faster and lighter on per-token compute and memory bandwidth for simple chat workloads (although all 35B of its weights must still be loaded). However, for complex agentic tasks that require multi-step reasoning through codebases, the dense model's full parameter access appears to provide a meaningful edge. Qwen3.6-27B also includes a built-in vision encoder (image-text-to-text), while the 35B MoE is text-only. The model is available under Apache 2.0 and can be run locally via `ollama run qwen3.6-27b`.
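A rough back-of-envelope calculation shows the size of that tradeoff. The sketch below takes the nominal parameter counts from the model names and relies on two assumptions that are not from the post: the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, and 2 bytes per parameter (16-bit weights). It ignores attention cost, the KV cache, activations, and quantization, so treat the outputs as orders of magnitude only.

```python
# Nominal parameter counts taken from the model names (assumed, not measured).
DENSE = {"total": 27e9, "active": 27e9}   # Qwen3.6-27B
MOE = {"total": 35e9, "active": 3e9}      # Qwen3.6-35B-A3B


def active_fraction(model):
    """Share of the weights used for each token."""
    return model["active"] / model["total"]


def forward_flops_per_token(model):
    """Rule of thumb: about 2 FLOPs per active parameter per token."""
    return 2 * model["active"]


def weight_memory_gb(model, bytes_per_param=2):
    """Memory needed just to hold the weights; every expert must be loaded."""
    return model["total"] * bytes_per_param / 1e9


if __name__ == "__main__":
    ratio = forward_flops_per_token(DENSE) / forward_flops_per_token(MOE)
    print(f"MoE active fraction: {active_fraction(MOE):.1%}")
    print(f"Dense/MoE compute per token: {ratio:.0f}x")
    print(f"Weights at 16-bit: dense {weight_memory_gb(DENSE):.0f} GB, "
          f"MoE {weight_memory_gb(MOE):.0f} GB")
```

Under these assumptions the dense model does roughly nine times the per-token compute of the MoE, while the MoE still needs more memory to hold its weights because all experts have to be available.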
Key facts
- Qwen3.6-27B scores 77.2% on SWE-bench Verified; Qwen3.6-35B-A3B (MoE) scores 73.4%, a 3.8-point gap.
- The dense model leads on every listed benchmark, with the largest gap on SkillsBench Avg5: 48.2 vs 28.7 (+19.5 points).
- Claude Opus 4.5 scores 80.9% on SWE-bench Verified, cited as a proprietary reference point.
- Qwen3.6-35B-A3B activates only 3B of 35B parameters per token; Qwen3.6-27B activates all 27B.
- Qwen3.6-27B uses a Gated DeltaNet + Gated Attention hybrid architecture supporting 262K context natively, extendable to 1M tokens.