Apr 22, 2026·1 min readAgent Frameworks & Tools

Ollama v0.21.1 adds Kimi CLI and MLX improvements

Ollama `v0.21.1` introduces Kimi CLI support for long-horizon agentic tasks, alongside multiple MLX runner performance improvements and several bug fixes.

GitHub: ollama/ollama·github-actions[bot]

Read at source

Composite

4.8

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

Developers building long-horizon agentic pipelines can now launch Kimi K2.6's multi-agent system directly from Ollama, while MLX users benefit from faster sampling and tokenization without any configuration changes.

01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
03MLX runner gains logprobs support for compatible models.

Summary— our read of the original

Ollama `v0.21.1` introduces Kimi CLI as a launchable tool within Ollama, invoked via `ollama launch kimi --model kimi-k2.6:cloud`. According to the release notes, Kimi CLI paired with Kimi K2.6 is designed to excel at long-horizon agentic execution tasks through a multi-agent system, making it a notable addition for practitioners building or running extended AI agent workflows.

Additionally, GLM4 MoE Lite sees a performance gain through a fused sigmoid router head.

On the performance side, the MLX runner receives several targeted improvements: logprobs support is added for compatible models, sampling is accelerated by fusing top-P and top-K into a single sort pass with repeat penalties now applied directly in the sampler, and prompt tokenization is moved into request handler goroutines for better throughput. Thread safety for array management in MLX is also improved. Additionally, GLM4 MoE Lite sees a performance gain through a fused sigmoid router head.

Two bugs are resolved in this release: the macOS app's model picker no longer shows a stale model after switching chats, and structured outputs for Gemma 4 are fixed when `think=false`.

Key facts

01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
03MLX runner gains logprobs support for compatible models.
04MLX sampling is faster via fused top-P and top-K in a single sort pass, with repeat penalties applied in the sampler.
05MLX prompt tokenization is moved into request handler goroutines; array management thread safety is improved.
06GLM4 MoE Lite gets a performance improvement via a fused sigmoid router head.
07Bug fixes cover a stale model picker in the macOS app and structured outputs for Gemma 4 when `think=false`.

Topics

#agent-framework #open-source #model-serving #performance-optimization #multi-agent

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 19:13 UTC. How this works →

Apr 22, 2026·1 min readAgent Frameworks & Tools

Ollama v0.21.1 adds Kimi CLI and MLX improvements

Ollama `v0.21.1` introduces Kimi CLI support for long-horizon agentic tasks, alongside multiple MLX runner performance improvements and several bug fixes.

GitHub: ollama/ollama·github-actions[bot]

Read at source

Composite

4.8

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
03MLX runner gains logprobs support for compatible models.

Summary— our read of the original

Additionally, GLM4 MoE Lite sees a performance gain through a fused sigmoid router head.

Two bugs are resolved in this release: the macOS app's model picker no longer shows a stale model after switching chats, and structured outputs for Gemma 4 are fixed when `think=false`.

Key facts

01Ollama `v0.21.1` adds Kimi CLI, launchable via `ollama launch kimi --model kimi-k2.6:cloud`.
02Kimi CLI with Kimi K2.6 targets long-horizon agentic execution tasks through a multi-agent system.
03MLX runner gains logprobs support for compatible models.
04MLX sampling is faster via fused top-P and top-K in a single sort pass, with repeat penalties applied in the sampler.
05MLX prompt tokenization is moved into request handler goroutines; array management thread safety is improved.
06GLM4 MoE Lite gets a performance improvement via a fused sigmoid router head.
07Bug fixes cover a stale model picker in the macOS app and structured outputs for Gemma 4 when `think=false`.

Topics

#agent-framework #open-source #model-serving #performance-optimization #multi-agent

Methodology

Score breakdown

Key facts

Topics

Score breakdown

Key facts

Topics