Kimi K2.6 refreshes open-model lead, eyes Opus 4.6
Moonshot's Kimi K2.6 — a 1T-parameter MoE with 32B active parameters — claims open-source SOTA across multiple benchmarks and extends the lead established by K2.5 in January, as Chinese open labs continue to close the gap with frontier models.
Developers evaluating open-weight backends for coding agents and long-horizon infra tasks now have a strong new candidate in Kimi K2.6, with broad day-0 ecosystem support and benchmark-leading agentic performance to validate against their own workloads.
- Kimi K2.6 is a 1T-parameter MoE with 32B active parameters, 384 experts (8 routed + 1 shared active per token), MLA attention, 256K context, and INT4 quantization.
- Moonshot claims open-source SOTA on SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), and HLE w/ tools (54.0).
- K2.6 touts long-horizon agentic capabilities: 4,000+ tool calls, 12+ hour continuous runs, and 300 parallel sub-agents via "Claw Groups".
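The headline spec implies a few back-of-envelope numbers that matter when sizing hardware. The sketch below (plain Python, standard library only) computes the share of parameters active per token and the raw weight footprint at INT4 versus BF16. It assumes "1T" and "32B" are exact round figures and that every weight is stored at exactly 4 bits with no quantization scales or higher-precision layers. Real checkpoints generally do not meet those assumptions, so treat the results as rough floors, not download sizes. `active_fraction` and `weight_bytes` are illustrative helpers, not part of any library.

```python
# Back-of-envelope arithmetic for the published K2.6 headline spec.
# Simplifying assumptions (not from Moonshot): "1T" and "32B" are exact,
# every weight is stored at exactly 4 bits, and quantization scales are ignored.
TOTAL_PARAMS = 1_000_000_000_000   # "1T-parameter"
ACTIVE_PARAMS = 32_000_000_000     # "32B active parameters"
BITS_PER_WEIGHT = 4                # INT4


def active_fraction(total: int, active: int) -> float:
    """Share of all parameters used in a single token's forward pass."""
    return active / total


def weight_bytes(params: int, bits: int) -> int:
    """Raw storage for `params` weights at `bits` bits each."""
    return params * bits // 8


print(f"active per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")     # 3.2%
print(f"INT4 weights: {weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT) / 1e9:.0f} GB")  # 500 GB
print(f"BF16 weights: {weight_bytes(TOTAL_PARAMS, 16) / 1e9:.0f} GB")              # 2000 GB
```

Roughly 3.2% of the weights take part in each token's forward pass. Per-token compute therefore scales with the active parameters, but memory still has to hold all of them. INT4 cuts that floor to about a quarter of the BF16 figure.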
Moonshot's Kimi K2.6 is a refresh of the K2.5 model released in January. Latent Space frames it as extending Moonshot's hold on the "leading Chinese open model lab" crown so far in 2026, a period in which DeepSeek has been relatively quiet since v3.2. The model is an open-weight 1T-parameter MoE with 32B active parameters, 384 experts (8 routed + 1 shared per token), MLA attention, 256K context, native multimodality, and INT4 quantization. It launched with day-0 support in vLLM, OpenRouter, Cloudflare Workers AI, Baseten, MLX, Hermes Agent, and OpenCode.

Moonshot claims open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ python (86.7), and Math Vision w/ python (93.2). For frontend design, Moonshot reports a 68.6% win+tie rate against Gemini 3.1 Pro.
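To make "1T total, 32B active" concrete, here is a toy top-k mixture-of-experts layer in plain Python. A router scores all 384 experts for a token, only the 8 highest-scoring routed experts run, and one shared expert runs for every token. This is a generic sketch, not Moonshot's implementation. The hidden size is a toy value, the experts are single random matrices rather than gated MLPs, and softmax over the top-k scores is just one common gating choice. The source does not describe K2.6's actual router, load balancing, or MLA attention block, and none of those are modelled here.

```python
import math
import random

NUM_EXPERTS = 384   # routed experts in the pool (from the spec)
TOP_K = 8           # routed experts selected per token (from the spec)
HIDDEN = 16         # toy hidden size, NOT the real model dimension

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def route(logits: Vector, k: int) -> list[tuple[int, float]]:
    """Keep the k highest-scoring experts and softmax-normalise their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: Vector, router: Matrix, experts: list[Matrix], shared: Matrix):
    """Shared expert always runs; only TOP_K of NUM_EXPERTS routed experts run."""
    chosen = route(matvec(router, x), TOP_K)
    out = matvec(shared, x)
    for idx, gate in chosen:
        y = matvec(experts[idx], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
router = random_matrix(rng, NUM_EXPERTS, HIDDEN)
experts = [random_matrix(rng, HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)]
shared = random_matrix(rng, HIDDEN, HIDDEN)
token = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]

out, chosen = moe_layer(token, router, experts, shared)
print(f"ran {len(chosen)} routed + 1 shared of {NUM_EXPERTS} experts")  # ran 8 routed + 1 shared of 384 experts
```

In this toy, per-token compute touches only the 9 experts that actually run, yet all 385 expert matrices must stay resident in memory. That is the same trade-off a large MoE model makes at full scale.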
The more novel claims center on long-horizon agentic execution: 4,000+ tool calls, 12+ hour continuous runs, and 300 parallel sub-agents, organized through a system called "Claw Groups" that handles multi-agent and human coordination. Claw Groups is a rebranding and scaling-up of the Agent Swarm RL work from the K2.5 era. Community reactions highlighted K2.6 as a viable alternative to Claude and GPT as a backend for coding and infrastructure work. Reported use cases include a 5-day autonomous infra agent run, kernel rewrites, and a Zig inference engine that delivered 20% higher tokens per second (TPS) than LM Studio.
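At the orchestration layer, running hundreds of sub-agents at once is a bounded fan-out problem. The sketch below uses `asyncio` with a semaphore to cap concurrency at 300, the figure Moonshot quotes for Claw Groups, and collects per-task results so that failures can be handed back to a planner. The source does not describe Claw Groups' actual interface. `run_subagent` and `fan_out` are hypothetical names, and the sub-agent is a stub that sleeps and fakes a failure on every seventh task instead of calling a model.

```python
import asyncio
import random

MAX_PARALLEL = 300  # cap taken from the quoted Claw Groups figure; purely illustrative
stats = {"active": 0, "peak": 0}  # how many stub sub-agents are running at once


async def run_subagent(task_id: int) -> dict:
    """Hypothetical stub: a real sub-agent would call the model and its tools."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(random.uniform(0.0, 0.01))  # simulated work
        return {"task": task_id, "ok": task_id % 7 != 0}  # fake failure on every 7th task
    finally:
        stats["active"] -= 1


async def fan_out(task_ids, limit: int = MAX_PARALLEL) -> list:
    """Run every task, but never more than `limit` sub-agents concurrently."""
    sem = asyncio.Semaphore(limit)

    async def guarded(task_id: int) -> dict:
        async with sem:
            return await run_subagent(task_id)

    return await asyncio.gather(*(guarded(t) for t in task_ids))


results = asyncio.run(fan_out(range(1000)))
failed = [r["task"] for r in results if not r["ok"]]
print(f"{len(results)} tasks, {len(failed)} failed, peak concurrency {stats['peak']}")
```

The `failed` list is a simple form of structured failure record that a planner could consume on a retry pass. A production orchestrator would also need timeouts, retries, and checkpointing to survive multi-hour runs.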
Alibaba's Qwen3.6-Max-Preview was also released during the same period as an early preview of its next flagship, with improvements to agentic coding, world knowledge, and instruction following. Qwen3.6 Plus, meanwhile, reached #7 in Code Arena, moving Alibaba to #3 among labs overall.

Separately, Hermes Agent surpassed 100K GitHub stars in under two months and overtook OpenClaw in weekly star growth. The community has documented advanced multi-agent orchestration patterns for it, including stateless ephemeral units (`skip_memory=True`, `skip_context_files=True`), LLM-driven replanning over structured failure metadata, and dynamic context injection via directory-local `AGENTS.md` / `.cursorrules` files.
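The directory-local context pattern can be sketched in a few lines. Starting from the file an agent is about to edit, walk up to the repository root and collect any `AGENTS.md` or `.cursorrules` files along the way. The sketch returns them root-first, so instructions closer to the target come later in the prompt and can refine broader ones. That precedence rule and the helper names `collect_context_files` and `build_context` are assumptions for illustration. The source does not say how Hermes Agent discovers, orders, or merges these files.

```python
import tempfile
from pathlib import Path

CONTEXT_FILENAMES = ("AGENTS.md", ".cursorrules")  # file names mentioned in the article


def collect_context_files(target: Path, root: Path) -> list:
    """Return context files between `root` and `target`'s directory, root first.

    Assumed precedence: files nearer the target come later, so when they are
    concatenated, local instructions can refine repository-wide ones.
    """
    root = root.resolve()
    target = target.resolve()
    directory = target if target.is_dir() else target.parent
    chain = [directory]
    while directory != root and directory.parent != directory:
        directory = directory.parent
        chain.append(directory)
    found = []
    for d in reversed(chain):  # walk from the root back down towards the target
        for name in CONTEXT_FILENAMES:
            candidate = d / name
            if candidate.is_file():
                found.append(candidate)
    return found


def build_context(target: Path, root: Path) -> str:
    """Concatenate the collected files into one block to prepend to a prompt."""
    root = root.resolve()
    sections = [
        f"<!-- {p.relative_to(root).as_posix()} -->\n{p.read_text()}"
        for p in collect_context_files(target, root)
    ]
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "kernels" / "cuda").mkdir(parents=True)
    (root / "AGENTS.md").write_text("Repository-wide conventions.")
    (root / "kernels" / "AGENTS.md").write_text("Benchmark kernels before committing.")
    target = root / "kernels" / "cuda" / "matmul.cu"
    target.write_text("// placeholder kernel\n")
    rel = [p.relative_to(root.resolve()).as_posix() for p in collect_context_files(target, root)]
    context = build_context(target, root)
    print(rel)  # ['AGENTS.md', 'kernels/AGENTS.md']
```

Stopping the walk at the repository root keeps unrelated files in parent directories out of the prompt.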