Qwen3.6-27B claims flagship coding performance at 55.6GB
Qwen's new 27B dense open-weight model, Qwen3.6-27B, claims to surpass the much larger Qwen3.5-397B-A17B on all major coding benchmarks while weighing just 55.6GB versus the predecessor's 807GB.
Developers running local LLMs can now try a model that is claimed to deliver flagship-level agentic coding performance in a 16.8GB quantized package, runnable on consumer hardware via `llama.cpp`.
Qwen has released Qwen3.6-27B, a dense 27-billion-parameter open-weight model that the company claims surpasses its previous open-source flagship, Qwen3.5-397B-A17B (a mixture-of-experts architecture with 397B total and 17B active parameters), across all major coding benchmarks. The size contrast is striking: Qwen3.5-397B-A17B occupies 807GB on Hugging Face, while Qwen3.6-27B is just 55.6GB, which makes it far more practical to run locally.
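To put those sizes in perspective, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration rather than anything from the original post: it assumes 1 GB means 10^9 bytes and that the model has exactly 27 billion parameters, and the helper names `size_ratio` and `bits_per_weight` are made up for this example.

```python
# Rough storage arithmetic for the sizes quoted in the article.
# Assumptions (not from the source): 1 GB = 1e9 bytes, exactly 27e9 parameters.

GB = 1e9
PARAMS_27B = 27e9


def size_ratio(larger_gb, smaller_gb):
    """How many times larger one checkpoint is than another."""
    return larger_gb / smaller_gb


def bits_per_weight(size_gb, params=PARAMS_27B):
    """Average number of bits stored per parameter for a given file size."""
    return size_gb * GB * 8 / params


if __name__ == "__main__":
    print(f"397B-A17B vs 27B download: {size_ratio(807, 55.6):.1f}x")
    print(f"Full 27B weights: {bits_per_weight(55.6):.1f} bits/param")
    print(f"Q4_K_M quantization: {bits_per_weight(16.8):.1f} bits/param")
```

Under those assumptions the predecessor is roughly 14.5 times larger, the 55.6GB release works out to a little over 16 bits per parameter (what one would expect from 16-bit weights plus some overhead), and the 16.8GB file to about 5 bits per parameter, in the range expected for a 4-bit-class quantization.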
The post describes running the model using a 16.8GB quantized version (`unsloth/Qwen3.6-27B-GGUF:Q4_K_M`) with `llama-server`, installed via `brew install llama.cpp`. The full command used includes flags for context length (`-c 65536`), temperature (`--temp 0.6`), top-p (`--top-p 0.95`), top-k (`--top-k 20`), and reasoning mode (`--reasoning on`), based on a recipe from a Hacker News user. On first run, the model was cached to `~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF`.
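The flags listed above can be assembled into a command line, sketched here in Python. This is a partial reconstruction, not the post's full command: the summary only says which flags the command "includes", so others may be missing. The use of `-hf` to pull the GGUF from Hugging Face is an assumption based on how `llama-server` commonly loads hosted models, and `build_llama_server_command` is a hypothetical helper name.

```python
import shlex


def build_llama_server_command(
    model="unsloth/Qwen3.6-27B-GGUF:Q4_K_M",
    context=65536,
    temp=0.6,
    top_p=0.95,
    top_k=20,
):
    """Return an argv list using only the flags the article mentions."""
    return [
        "llama-server",
        "-hf", model,             # assumption: fetch the GGUF from Hugging Face
        "-c", str(context),       # context length
        "--temp", str(temp),      # sampling temperature
        "--top-p", str(top_p),    # nucleus sampling cutoff
        "--top-k", str(top_k),    # top-k sampling cutoff
        "--reasoning", "on",      # reasoning mode, as quoted in the article
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_llama_server_command()))
```

Printing rather than executing keeps the sketch safe to run; the resulting line can be pasted into a shell, or the list passed to `subprocess.run`, once the flags have been checked against the installed `llama-server --help` output.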
Two SVG generation tests were run as informal benchmarks. The first, "Generate an SVG of a pelican riding a bicycle," produced 4,444 tokens in 2 minutes 53 seconds at 25.57 tokens/s, which the post describes as an outstanding result for a local model of this size. The second, an SVG of a "NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER," produced 6,575 tokens in 4 minutes 25 seconds at 24.74 tokens/s.
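As a sanity check, the reported throughput can be compared with the token counts and elapsed times. The sketch below assumes the elapsed times were truncated to whole seconds, so each run is compatible with a small range of rates; `rate_bounds` and `RUNS` are names chosen for this illustration.

```python
def rate_bounds(tokens, elapsed_s):
    """Range of tokens/s compatible with a time truncated to whole seconds."""
    return tokens / (elapsed_s + 1), tokens / elapsed_s


# (tokens, minutes, seconds, reported tokens/s), as quoted in the article.
RUNS = {
    "pelican": (4444, 2, 53, 25.57),
    "opossum": (6575, 4, 25, 24.74),
}

for name, (tokens, minutes, seconds, reported) in RUNS.items():
    low, high = rate_bounds(tokens, minutes * 60 + seconds)
    consistent = low <= reported <= high
    print(f"{name}: {low:.2f} to {high:.2f} tokens/s, "
          f"reported {reported}, consistent={consistent}")
```

Both reported rates fall inside their computed ranges, so the token counts, durations, and speeds quoted above agree with one another.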
Key facts
- Qwen3.6-27B is a 27B dense open-weight model that Qwen claims surpasses Qwen3.5-397B-A17B on all major coding benchmarks.
- Qwen3.5-397B-A17B weighs 807GB on Hugging Face; Qwen3.6-27B weighs just 55.6GB.
- A quantized version (`unsloth/Qwen3.6-27B-GGUF:Q4_K_M`) runs at 16.8GB locally via `llama-server`.
- The model was installed using `brew install llama.cpp` and run with a community recipe from Hacker News.
- A pelican-riding-a-bicycle SVG test generated 4,444 tokens in 2 min 53 s at 25.57 tokens/s.
- A second SVG test (opossum on an e-scooter) produced 6,575 tokens in 4 min 25 s at 24.74 tokens/s.