★ Rank 09 today·NEW·Jun 17, 2026·1 min readNew Models & Releases

Lemonade v10.8 adds auto memory management and MCP gateway for local models

Lemonade v10.8 ships dynamic VRAM management, a provider-agnostic cloud offload backend, expanded LMX-Omni image generation controls, and an MCP gateway that exposes local models as callable tools.

r/LocalLLaMA·u/jfowers_amd

Read at source

Composite · rank 09

6.9

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

The MCP gateway turns a local Lemonade server into a set of callable tools for any MCP-aware host, removing the need to route those requests to a cloud API.

01v10.8 was a 20-contributor release shipped in 7 days
02Dynamic VRAM management auto-unloads idle models and downsizes KV-cache to reclaim GPU memory on the fly
03Model pinning prevents chosen models from being evicted from GPU memory

Summary— our read of the original

Lemonade v10.8 lands a set of memory and context management improvements designed to reduce manual tuning. Dynamic VRAM management now auto-unloads idle models and downsizes their KV-cache to reclaim GPU memory on the fly, while a model-pinning feature ensures frequently used models are never evicted. Automatic context sizing removes the need to hand-tune context length by deriving it from available memory and the model's architecture.

LMX-Omni gains expanded image generation controls (size, steps, etc.) and the ability to pull and share custom omni models from Hugging Face.

A new provider-agnostic cloud offload backend allows chat completions to be served from any OpenAI-compatible provider — Fireworks, OpenRouter, Together, or OpenAI — right alongside local models, with switching available from the CLI or UI. The post describes the design philosophy as "local-first, with cloud as an option, not a default," with a stated goal of eventually enabling applications to route between client and cloud based on their own routing policies. LMX-Omni gains expanded image generation controls (size, steps, etc.) and the ability to pull and share custom omni models from Hugging Face.

The MCP gateway (`POST /mcp`) exposes five tools — model listing, chat, audio transcription, image generation, and multimodal omni — allowing any MCP-aware host to call local Lemonade models as tools instead of reaching for a cloud API. Platform expansion in this release covers NVIDIA GB10 (Blackwell) arm64 CUDA, TheRock ROCm on Windows for Radeon RX GPUs, ROCm for Radeon 840M/860M iGPUs, `whisper.cpp` moved to ROCm on Windows and Linux, a dedicated Debian 13 build, and a CDNA datacenter GPU detection fix.

Key facts

01v10.8 was a 20-contributor release shipped in 7 days
02Dynamic VRAM management auto-unloads idle models and downsizes KV-cache to reclaim GPU memory on the fly
03Model pinning prevents chosen models from being evicted from GPU memory
04Automatic context sizing derives context length from available memory and model architecture, removing manual tuning
05A provider-agnostic cloud offload backend supports Fireworks, OpenRouter, Together, and OpenAI alongside local models
06An MCP gateway (POST /mcp) exposes five tools: model listing, chat, audio transcription, image generation, and multimodal omni
07Platform support added for NVIDIA GB10 (Blackwell) arm64 CUDA, TheRock ROCm on Windows, and ROCm for Radeon 840M/860M iGPUs

Topics

#local-llms #mcp #tool-use #open-source #model-release

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC. How this works →

★ Rank 09 today·NEW·Jun 17, 2026·1 min readNew Models & Releases

Lemonade v10.8 adds auto memory management and MCP gateway for local models

Lemonade v10.8 ships dynamic VRAM management, a provider-agnostic cloud offload backend, expanded LMX-Omni image generation controls, and an MCP gateway that exposes local models as callable tools.

r/LocalLLaMA·u/jfowers_amd

Read at source

Composite · rank 09

6.9

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

The MCP gateway turns a local Lemonade server into a set of callable tools for any MCP-aware host, removing the need to route those requests to a cloud API.

01v10.8 was a 20-contributor release shipped in 7 days
02Dynamic VRAM management auto-unloads idle models and downsizes KV-cache to reclaim GPU memory on the fly
03Model pinning prevents chosen models from being evicted from GPU memory

Summary— our read of the original

LMX-Omni gains expanded image generation controls (size, steps, etc.) and the ability to pull and share custom omni models from Hugging Face.

Key facts

01v10.8 was a 20-contributor release shipped in 7 days
02Dynamic VRAM management auto-unloads idle models and downsizes KV-cache to reclaim GPU memory on the fly
03Model pinning prevents chosen models from being evicted from GPU memory
04Automatic context sizing derives context length from available memory and model architecture, removing manual tuning
05A provider-agnostic cloud offload backend supports Fireworks, OpenRouter, Together, and OpenAI alongside local models
06An MCP gateway (POST /mcp) exposes five tools: model listing, chat, audio transcription, image generation, and multimodal omni
07Platform support added for NVIDIA GB10 (Blackwell) arm64 CUDA, TheRock ROCm on Windows, and ROCm for Radeon 840M/860M iGPUs

Topics

#local-llms #mcp #tool-use #open-source #model-release

Methodology

Score breakdown

Key facts

Topics

More in New Models & Releases.

Score breakdown

Key facts

Topics

More in New Models & Releases.