Apr 22, 2026 · 1 min read · New Models & Releases

Qwen3.6-27B claims flagship coding performance at 55.6GB

Qwen claims its new 27B dense open-weight model, Qwen3.6-27B, surpasses the much larger Qwen3.5-397B-A17B on all major coding benchmarks while weighing just 55.6GB versus the predecessor's 807GB.

Simon Willison (main)

Composite score: 7.2 out of 10

Score weights: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%

Why it matters

Developers running local LLMs can now access a model that claims flagship-level agentic coding performance in a 16.8GB quantized package, runnable on consumer hardware via `llama.cpp`.


Summary: our read of the original

Qwen has released Qwen3.6-27B, a dense 27-billion-parameter open-weight model that the company claims surpasses its previous open-source flagship, Qwen3.5-397B-A17B (a 397B total / 17B active MoE architecture), across all major coding benchmarks. The size contrast is striking: Qwen3.5-397B-A17B occupies 807GB on Hugging Face, while Qwen3.6-27B is just 55.6GB — making it far more practical to run locally.
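The size contrast can be made concrete with a little arithmetic. The sketch below is only illustrative: it assumes the quoted sizes are decimal gigabytes (10^9 bytes) and that the model has exactly 27 billion parameters, neither of which the source states. The 16.8GB figure is the quantized file size reported elsewhere in this summary.

```python
# Rough size arithmetic from the figures quoted in this summary.
# Assumptions (not from the source): GB means 10**9 bytes, and the
# parameter count is exactly 27e9.
GB = 10**9
PARAMS = 27e9

sizes_gb = {
    "Qwen3.5-397B-A17B": 807.0,
    "Qwen3.6-27B": 55.6,
    "Qwen3.6-27B Q4_K_M": 16.8,
}


def bits_per_param(size_gb: float, params: float = PARAMS) -> float:
    """Average storage cost per parameter, in bits."""
    return size_gb * GB * 8 / params


# How much larger the predecessor is on disk than the new model.
ratio = sizes_gb["Qwen3.5-397B-A17B"] / sizes_gb["Qwen3.6-27B"]

print(f"Predecessor is {ratio:.1f}x larger on disk")
print(f"Full weights: {bits_per_param(55.6):.1f} bits/param")
print(f"Q4_K_M file:  {bits_per_param(16.8):.1f} bits/param")
```

Under these assumptions the predecessor is roughly 14.5 times larger on disk, and the full weights come to about 16.5 bits per parameter versus about 5 for the quantized file.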

The post describes running the model using a 16.8GB quantized version (`unsloth/Qwen3.6-27B-GGUF:Q4_K_M`) with `llama-server`, installed via `brew install llama.cpp`. The full command used includes flags for context length (`-c 65536`), temperature (`--temp 0.6`), top-p (`--top-p 0.95`), top-k (`--top-k 20`), and reasoning mode (`--reasoning on`), based on a recipe from a Hacker News user. On first run, the model was cached to `~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF`.
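To make the flag list concrete, here is a minimal sketch that assembles a `llama-server` invocation from only the flags named above. It is not the full command from the post. The `-hf` flag for fetching the model from Hugging Face is our assumption about how the quantized repo was referenced, and the helper name `build_command` is hypothetical.

```python
import shlex


def build_command(model: str = "unsloth/Qwen3.6-27B-GGUF:Q4_K_M") -> list[str]:
    """Assemble a llama-server argv from the flags named in the summary."""
    return [
        "llama-server",
        "-hf", model,          # assumption: model pulled from Hugging Face
        "-c", "65536",         # context length
        "--temp", "0.6",       # sampling temperature
        "--top-p", "0.95",     # nucleus sampling cutoff
        "--top-k", "20",       # top-k sampling cutoff
        "--reasoning", "on",   # reasoning mode, as quoted in the summary
    ]


command = build_command()
# Print a copy-pasteable shell line without running anything.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` would start the server, assuming `llama-server` is on the `PATH` after `brew install llama.cpp`.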

Two SVG generation tests were run as informal benchmarks. The first — "Generate an SVG of a pelican riding a bicycle" — produced 4,444 tokens in 2 minutes 53 seconds at 25.57 tokens/s, described as an outstanding result for a local model of this size. A second test, generating an SVG of a "NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER," produced 6,575 tokens in 4 minutes 25 seconds at 24.74 tokens/s.
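The reported figures are internally consistent, which a short check shows: dividing the token count by the generation rate should roughly reproduce the elapsed time. This sketch assumes the quoted tokens/s is an average over the whole generation; the function name `elapsed` is ours.

```python
def elapsed(tokens: int, tokens_per_second: float) -> tuple[int, int]:
    """Return (minutes, seconds), truncated to whole seconds."""
    total_seconds = int(tokens / tokens_per_second)
    return divmod(total_seconds, 60)


# (token count, tokens per second) as quoted in the summary.
runs = {
    "pelican on a bicycle": (4444, 25.57),
    "opossum on an e-scooter": (6575, 24.74),
}

for name, (tokens, rate) in runs.items():
    minutes, seconds = elapsed(tokens, rate)
    print(f"{name}: {minutes} min {seconds} s")
```

Both runs land on the quoted durations (2 min 53 s and 4 min 25 s) once fractional seconds are dropped.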

Key facts

01 Qwen3.6-27B is a 27B dense open-weight model that Qwen claims surpasses Qwen3.5-397B-A17B on all major coding benchmarks.
02 Qwen3.5-397B-A17B weighs 807GB on Hugging Face; Qwen3.6-27B weighs just 55.6GB.
03 A quantized version (`unsloth/Qwen3.6-27B-GGUF:Q4_K_M`) runs at 16.8GB locally via `llama-server`.
04 The model was installed using `brew install llama.cpp` and run with a community recipe from Hacker News.
05 A pelican-riding-a-bicycle SVG test generated 4,444 tokens in 2 min 53 s at 25.57 tokens/s.
06 A second SVG test (opossum on an e-scooter) produced 6,575 tokens in 4 min 25 s at 24.74 tokens/s.

Topics

#model-release #code-generation #open-source #benchmarks #local-llms

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 23, 2026 · 11:04 UTC.
