Ollama v0.21.1 adds Kimi CLI and MLX improvements
Ollama `v0.21.1` introduces Kimi CLI support for long-horizon agentic tasks, alongside multiple MLX runner performance improvements and several bug fixes.
Score breakdown
Developers building long-horizon agentic pipelines can now launch Kimi K2.6's multi-agent system directly from Ollama, while MLX users benefit from faster sampling and tokenization without any configuration changes.
- 01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
- 02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
- 03MLX runner gains logprobs support for compatible models.
Ollama `v0.21.1` introduces Kimi CLI as a launchable tool within Ollama, invoked via `ollama launch kimi --model kimi-k2.6:cloud`. According to the release notes, Kimi CLI paired with Kimi K2.6 is designed to excel at long-horizon agentic execution tasks through a multi-agent system, making it a notable addition for practitioners building or running extended AI agent workflows.
Additionally, GLM4 MoE Lite sees a performance gain through a fused sigmoid router head.
On the performance side, the MLX runner receives several targeted improvements: logprobs support is added for compatible models, sampling is accelerated by fusing top-P and top-K into a single sort pass with repeat penalties now applied directly in the sampler, and prompt tokenization is moved into request handler goroutines for better throughput. Thread safety for array management in MLX is also improved. Additionally, GLM4 MoE Lite sees a performance gain through a fused sigmoid router head.
Two bugs are resolved in this release: the macOS app's model picker no longer shows a stale model after switching chats, and structured outputs for Gemma 4 are fixed when `think=false`.
Key facts
- 01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
- 02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
- 03MLX runner gains logprobs support for compatible models.
- 04MLX sampling is faster via fused top-P and top-K in a single sort pass, with repeat penalties applied in the sampler.
- 05MLX prompt tokenization is moved into request handler goroutines; array management thread safety is improved.
- 06GLM4 MoE Lite gets a performance improvement via a fused sigmoid router head.
- 07Bug fixes cover a stale model picker in the macOS app and structured outputs for Gemma 4 when `think=false`.