Apr 23, 2026 · 1 min read · Applications & Use Cases

Qwen3.6-27B dense model beats MoE sibling on SWE-bench

Alibaba's Qwen3.6-27B dense model scores 77.2% on SWE-bench Verified, 3.8 points ahead of the larger Qwen3.6-35B-A3B MoE model (73.4%), and leads it on every coding-agent benchmark the post lists.

Dev.to #llm · David


Composite score: 5.6 out of 10 (weights: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%)

Why it matters

Developers building coding agents should evaluate Qwen3.6-27B as a locally runnable, Apache 2.0 alternative: it outperforms its larger MoE sibling on multi-step agentic tasks such as codebase navigation and terminal operations.

01 Qwen3.6-27B scores 77.2% on SWE-bench Verified; Qwen3.6-35B-A3B (MoE) scores 73.4%, a 3.8-point gap.
02 The dense model leads on every listed benchmark, with the largest gap on SkillsBench Avg5: 48.2 vs 28.7 (+19.5 points).
03 Claude Opus 4.5 scores 80.9% on SWE-bench Verified, cited as a proprietary reference point.

Summary: our read of the original

David's post on Dev.to examines how Alibaba's Qwen3.6-27B, a plain dense model, outperforms its MoE sibling Qwen3.6-35B-A3B across a range of coding and agentic benchmarks. On SWE-bench Verified — which tests a model's ability to understand a real GitHub issue, locate relevant files, write a fix, and pass tests — the 27B dense model scores 77.2% versus the MoE's 73.4%. The post notes this puts Qwen3.6-27B within range of proprietary models, citing Claude Opus 4.5 at 80.9% as a reference point. The dense model also leads on SWE-bench Pro (53.5 vs 49.5), Terminal-Bench 2.0 (59.3 vs 51.5), SkillsBench Avg5 (48.2 vs 28.7), QwenWebBench (1487 vs 1397), and NL2Repo (36.2 vs 29.4).
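The gaps quoted in this summary can be recomputed from the post's own numbers. A minimal sketch: the scores are copied from the post, while `SCORES` and `gaps` are illustrative names. QwenWebBench is on a different, non-percentage scale (1487 vs 1397), so its absolute gap of 90 is not comparable to the others; the relative gap (as a fraction of the MoE score) is the fairer cross-row comparison.

```python
# Scores as reported in the Dev.to post: (Qwen3.6-27B dense, Qwen3.6-35B-A3B MoE).
SCORES = {
    "SWE-bench Verified": (77.2, 73.4),
    "SWE-bench Pro": (53.5, 49.5),
    "Terminal-Bench 2.0": (59.3, 51.5),
    "SkillsBench Avg5": (48.2, 28.7),
    "QwenWebBench": (1487, 1397),  # different scale from the other rows
    "NL2Repo": (36.2, 29.4),
}


def gaps(scores):
    """Return {benchmark: (absolute_gap, relative_gap)} for dense minus MoE."""
    out = {}
    for name, (dense, moe) in scores.items():
        abs_gap = round(dense - moe, 1)  # rounded to hide float noise
        rel_gap = (dense - moe) / moe    # gap as a fraction of the MoE score
        out[name] = (abs_gap, rel_gap)
    return out


if __name__ == "__main__":
    table = gaps(SCORES)
    # Sort by relative gap, largest first.
    for name, (abs_gap, rel_gap) in sorted(table.items(), key=lambda kv: -kv[1][1]):
        print(f"{name:<20} +{abs_gap:>6}  ({rel_gap:+.1%})")
```

Every gap comes out positive, and by relative gap SkillsBench Avg5 (about +68%) is well ahead of the next row, NL2Repo (about +23%).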


The post identifies two architectural factors behind the result. First, full parameter utilization: the MoE activates only 3B of its 35B parameters per token, which speeds up inference but means each token draws on only a small slice of the model's capacity, a limitation the post argues matters most on harder reasoning tasks. Second, Qwen3.6-27B uses a hybrid architecture that alternates Gated DeltaNet layers (a gated form of linear attention) with standard Gated Attention layers. The post describes the DeltaNet layers as processing information in compressed deltas: instead of attending over every previous token, they keep a fixed-size state that is updated incrementally at each step, which makes long contexts cheaper to handle. The model supports a 262K-token context natively, extendable to 1M tokens.
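The post gives no equations for the Gated DeltaNet layers. A minimal sketch, assuming the gated delta rule as published for Gated DeltaNet, S_t = α_t S_(t-1) + β_t (v_t − α_t S_(t-1) k_t) k_tᵀ, where α_t is a decay gate and β_t a write strength. This is a single head run as a plain recurrence, with no learned projections and no chunked parallel kernel; it makes no claim about Qwen3.6-27B's actual layer sizes or its ratio of DeltaNet to attention layers, and `gated_delta_scan` is a hypothetical name.

```python
import numpy as np


def gated_delta_scan(q, k, v, alpha, beta):
    """Toy single-head gated delta rule (illustrative, not Qwen's kernel).

    q, k: (T, d_k) queries/keys; v: (T, d_v) values;
    alpha: (T,) decay gates in (0, 1]; beta: (T,) write strengths in (0, 1].
    Returns per-token outputs (T, d_v) and the final state (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))  # fixed-size memory: its shape never depends on T
    out = np.empty((T, d_v))
    for t in range(T):
        kt = k[t] / np.linalg.norm(k[t])  # unit-norm key
        pred = alpha[t] * (S @ kt)        # value the decayed memory returns for kt
        # Delta rule: write only the error (v_t - pred) along kt rather than
        # adding v_t outright, so an old value stored under kt is replaced.
        S = alpha[t] * S + beta[t] * np.outer(v[t] - pred, kt)
        out[t] = S @ q[t]                 # read the memory with the query
    return out, S
```

With α = β = 1, writing two values under the same key and then querying that key returns only the second value, whereas plain additive linear attention would return their sum. The state stays d_v × d_k however long the sequence gets, which is the property that keeps these layers cheap on long contexts.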

The tradeoff is compute cost: the dense model activates all 27B parameters on every forward pass, versus only 3B for the MoE, so the MoE is faster and cheaper per token for simple chat workloads (although it still has to hold all 35B parameters in memory). For complex agentic tasks that require multi-step reasoning across a codebase, however, the dense model's full parameter access appears to provide a meaningful edge. Qwen3.6-27B also includes a built-in vision encoder (image-text-to-text), while the 35B MoE is text-only. The model is available under Apache 2.0 and can be run locally via `ollama run qwen3.6-27b`.
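Per-token compute tracks active parameters, while weight memory tracks total parameters. A back-of-the-envelope sketch, assuming the common rule of thumb of about 2 FLOPs per active parameter per token, the nominal parameter counts from the post (vision-encoder weights are not separated out), and weights only (no KV cache or activations). The function names and the bf16 / 4-bit byte sizes are illustrative assumptions, not figures from the post.

```python
# Nominal parameter counts from the post, in billions.
MODELS = {
    "Qwen3.6-27B (dense)": {"active_b": 27, "total_b": 27},
    "Qwen3.6-35B-A3B (MoE)": {"active_b": 3, "total_b": 35},
}


def per_token_gflops(active_b):
    """Approximate forward-pass compute per token: ~2 FLOPs (multiply + add)
    per active parameter. Ignores attention-score cost, which grows with context."""
    return 2.0 * active_b  # billions of params * 2 FLOPs = GFLOPs


def weight_memory_gb(total_b, bytes_per_param):
    """Memory just to hold the weights. Every expert must be resident even
    though only a few run for any given token. Excludes KV cache/activations."""
    return total_b * bytes_per_param  # billions of params * bytes = GB (1e9 bytes)


if __name__ == "__main__":
    for name, m in MODELS.items():
        print(
            f"{name:<24} ~{per_token_gflops(m['active_b']):>3.0f} GFLOPs/token, "
            f"weights ~{weight_memory_gb(m['total_b'], 2):.0f} GB at bf16, "
            f"~{weight_memory_gb(m['total_b'], 0.5):.1f} GB at 4-bit"
        )
```

Under these assumptions the dense model needs about 9 times the per-token compute (54 vs 6 GFLOPs) but less weight memory (about 54 GB vs 70 GB at bf16, or 13.5 GB vs 17.5 GB at 4-bit), which is why the MoE's advantage is speed rather than footprint.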

Key facts

01 Qwen3.6-27B scores 77.2% on SWE-bench Verified; Qwen3.6-35B-A3B (MoE) scores 73.4%, a 3.8-point gap.
02 The dense model leads on every listed benchmark, with the largest gap on SkillsBench Avg5: 48.2 vs 28.7 (+19.5 points).
03 Claude Opus 4.5 scores 80.9% on SWE-bench Verified, cited as a proprietary reference point.
04 Qwen3.6-35B-A3B activates only 3B of 35B parameters per token; Qwen3.6-27B activates all 27B.
05 Qwen3.6-27B uses a Gated DeltaNet + Gated Attention hybrid architecture supporting 262K context natively, extendable to 1M tokens.
06 Qwen3.6-27B includes a built-in vision encoder (image-text-to-text); the 35B MoE is text-only.
07 Qwen3.6-27B is available under Apache 2.0 and can be run locally via `ollama run qwen3.6-27b`.
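The gap between active and total parameters for an MoE comes from sparse expert routing. The post does not describe how Qwen3.6-35B-A3B routes tokens, so this is a generic top-k mixture-of-experts layer: the expert count, `k`, the gating variant (softmax over only the selected experts, one common choice), and the name `moe_layer` are all illustrative assumptions. In a real model the active count also includes non-expert weights such as attention and embeddings, so the 3B / 35B ratio is not simply k divided by the number of experts.

```python
import numpy as np


def moe_layer(x, router_w, experts, k):
    """Toy top-k mixture-of-experts layer (illustrative only).

    x: (T, d) token activations; router_w: (d, E) router weights;
    experts: list of E (d, d) expert matrices; k: experts used per token.
    Returns outputs (T, d) and the chosen expert indices (T, k).
    """
    logits = x @ router_w                    # (T, E) routing scores
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, topk[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                 # softmax over the selected experts only
        for g, e in zip(gates, topk[t]):
            # Only the k selected expert matrices are read for this token.
            out[t] += g * (x[t] @ experts[e])
    return out, topk
```

Changing the weights of an expert that a token was not routed to leaves that token's output untouched, which is exactly why per-token compute scales with active rather than total parameters.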

Topics

#benchmarks #code-generation #model-release #open-source #agent-framework

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 23, 2026 · 11:04 UTC.