Rayline routes Claude Code subagents to cheaper and on-device models
Rayline is a Claude Code-compatible LLM gateway that intercepts Claude Code's internal model routing and directs subagent calls to different models, including cloud-hosted open models and on-device models, rather than running everything through the same expensive model.
Teams running Claude Code at scale may be able to cut session costs substantially (the builders report 60–90% in their private beta) by routing low-complexity subagent calls away from frontier models, without changing their existing Claude Code workflow.
Rayline is a Claude Code-compatible LLM gateway built to address the cost structure of agentic coding sessions, which the builders observed contain many subagent calls with widely varying capability requirements. Rather than running all calls through the same model, Rayline intercepts Claude Code's internal routing and lets users configure which model handles which type of subagent call — the main agent can run on Opus, while narrower delegated tasks route to cloud-hosted open models or on-device models.
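As a rough illustration of per-call-type routing, the sketch below maps a call's role to a model name. It is not Rayline's configuration format, which the source does not describe. The `ROUTES` table, the `Call` type, the task labels, and every model identifier are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical routing table: none of these keys or model names come from Rayline.
ROUTES = {
    "main": "opus",                      # main agent stays on the frontier model
    "repo_search": "open-model-cloud",   # placeholder for a cloud-hosted open model
    "summarize_context": "local-model",  # placeholder for an on-device model
}
DEFAULT_MODEL = "opus"  # assumption: unknown task labels fall back to the main model


@dataclass(frozen=True)
class Call:
    is_subagent: bool
    task: str  # hypothetical label for the delegated task


def pick_model(call: Call) -> str:
    """Choose a model by table lookup only, with no LLM in the loop."""
    if not call.is_subagent:
        return ROUTES["main"]
    return ROUTES.get(call.task, DEFAULT_MODEL)
```

Because the choice is a plain lookup, the main agent spends no tokens on deciding where a delegated call should go.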
The team explicitly chose a gateway architecture over tool-based routing, which they found inefficient because it requires the main agent to spend tokens reasoning about and invoking routing tools. Rayline instead supports deterministic user-configured routing and an optional ML model for routing decisions. The builders note that subagent delegations are a natural cache-friendly routing boundary, since switching models at that point avoids busting the cached input context. In their private beta, they report cost reductions of 60–90%, attributing this to the observation that open models offer better capability-per-dollar than Sonnet or Haiku for many delegated tasks such as repo searches, error inspection, context summarization, and CI polling.
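The cache argument can be made concrete with a toy accounting model. The sketch assumes, as a simplification, that a prompt cache is kept per model and per conversation, and that a subagent starts with its own short context. The model names and token counts are made up for illustration and are not measurements from Rayline.

```python
def uncached_tokens(calls):
    """Count input tokens that miss the cache.

    `calls` is a list of (model, conversation_id, prompt_tokens) tuples, where
    prompt_tokens is the full input length of that call. Toy assumption: each
    (model, conversation) pair caches the longest prompt it has seen, and a
    later call for the same pair pays only for tokens beyond that prefix.
    """
    cached = {}  # (model, conversation_id) -> tokens already cached
    misses = 0
    for model, conv, prompt_tokens in calls:
        key = (model, conv)
        hit = min(cached.get(key, 0), prompt_tokens)
        misses += prompt_tokens - hit
        cached[key] = max(cached.get(key, 0), prompt_tokens)
    return misses


# Switching models in the middle of the main conversation: the second model
# has never seen the 50k-token context, so it pays for all of it.
mid_session_switch = [
    ("frontier", "main", 50_000),
    ("cheap", "main", 51_000),
    ("frontier", "main", 52_000),
]

# Switching at a subagent delegation: the subagent runs a fresh, short
# conversation, and the main conversation stays on one model's cache.
subagent_switch = [
    ("frontier", "main", 50_000),
    ("cheap", "sub-1", 2_000),
    ("frontier", "main", 52_000),
]

print(uncached_tokens(mid_session_switch))  # 103000
print(uncached_tokens(subagent_switch))     # 54000
```

Under these toy numbers, switching mid-session leaves 103,000 input tokens uncached, against 54,000 when the switch happens at the delegation boundary.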
Key facts
- Rayline is a Claude Code-compatible LLM gateway that intercepts and overrides Claude Code's internal model routing.
- Users can route the main agent to Opus while sending subagent calls to cloud-hosted open models or on-device models.
- Routing is implemented at the gateway level, not as agent-invokable tools, to avoid token overhead from LLM-based routing decisions.
- Users can configure routing deterministically or use Rayline's optional ML model to make routing decisions.
- Subagent delegations are used as the routing boundary because switching models there avoids busting the prompt cache.
- The builders report 60–90% cost savings in their private beta.
- Narrow-scope subagent tasks cited include repo search, context summarization, error inspection, and CI polling.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 09:19 UTC.