Jun 15, 2026 · 1 min read · Applications & Use Cases

Hybrid agent routes planning to frontier models, execution to local GPUs

u/Poha_Best_Breakfast built a three-tier coding agent that uses a frontier model (Codex) only for high-level planning while routing ~85–90% of tokens to local models on a dual RTX 3090 rig.

r/LocalLLaMA · u/Poha_Best_Breakfast

Composite score: 5.9 out of 10

Score weights: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%

Why it matters

The architecture shows a concrete way to cut frontier-model token spend, keeping ~85–90% of tokens local, without sacrificing high-level design quality. It reserves the frontier model exclusively for task decomposition and uses deterministic validation to keep long-running agentic chains on track.

Summary: our read of the original

u/Poha_Best_Breakfast describes a personal coding agent built over several months to make better use of a dual RTX 3090 system. The motivation was that local models like Qwen 3.5/3.6 27B and Gemma 4 31B, while capable, lack the "taste" of frontier models for high-level design decisions — but running everything through a frontier model is expensive. The solution is a three-tier architecture where each tier is fully swappable via a config file: a Planner tier (currently Codex) that decomposes any incoming task into N phases, a Local tier (currently Qwen 3.6 27B) that executes the bulk of work, and an optional Senior tier (currently Kimi K2.6 via `opencode-go`) that steps in when local retries are exhausted.
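The tier layout and the "senior steps in when local retries are exhausted" rule can be sketched roughly as follows. This is a minimal illustration, not the author's code: the post does not describe the config file format, so the `Tier` class, the `CONFIG` dictionary, the model label strings, the `run_task` function, and the retry limit of 3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Tier:
    role: str   # "planner", "local", or "senior"
    model: str  # free-form label; the strings below are placeholders


# Hypothetical stand-in for the project's config file (real format unknown).
CONFIG: Dict[str, Optional[Tier]] = {
    "planner": Tier("planner", "codex"),       # decomposes the job into phases
    "local": Tier("local", "qwen-3.6-27b"),    # executes the bulk of the work
    "senior": Tier("senior", "kimi-k2.6"),     # optional; None disables it
}


def run_task(
    task: str,
    attempt: Callable[[Tier, str], bool],
    config: Dict[str, Optional[Tier]] = CONFIG,
    max_local_retries: int = 3,  # assumed value, not from the post
) -> Optional[str]:
    """Return the role that completed the task, or None if every tier failed."""
    local = config["local"]
    for _ in range(max_local_retries):
        if attempt(local, task):
            return local.role
    # Local retries are exhausted: escalate once to the senior tier, if configured.
    senior = config.get("senior")
    if senior is not None and attempt(senior, task):
        return senior.role
    return None
```

Swapping a tier then amounts to changing one entry in the config, and setting `"senior"` to `None` gives a two-tier setup.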

A coding task, for example, might be decomposed into research, implement, and review phases; each phase can run for multiple epochs, with each epoch generating discrete tasks handled by the local model. By the author's measurement, approximately 85–90% of total tokens and ~95% of output tokens flow through local models. Context isolation between phases is a deliberate design choice to prevent "context rot" and keep the frontier model's context window from overflowing with bash call noise.
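One way to picture the phase and epoch loop with context isolation is the sketch below. It rests on assumptions the post does not confirm: that each phase starts from a fresh context holding only short summaries of earlier phases, and that a one-line summary is what gets handed forward. The names `run_job`, `make_tasks`, and `run_local` are hypothetical.

```python
from typing import Callable, List


def run_job(
    phases: List[str],
    epochs: int,
    make_tasks: Callable[[str, int], List[str]],
    run_local: Callable[[str, List[str]], str],
) -> List[str]:
    """Return one short summary per phase; raw task output never crosses phases."""
    summaries: List[str] = []
    for phase in phases:
        # Fresh context per phase: earlier summaries only, none of the tool noise.
        context: List[str] = list(summaries)
        for epoch in range(epochs):
            for task in make_tasks(phase, epoch):
                # Each discrete task goes to the local model.
                context.append(run_local(task, context))
        # Assumed handoff: a one-line summary instead of the full transcript.
        produced = len(context) - len(summaries)
        summaries.append(f"{phase}: {produced} task outputs")
    return summaries
```

With three phases, two epochs, and two tasks per epoch, the review phase starts with two summary lines in its context rather than the eight raw outputs that came before it.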

The other major differentiator is a deterministic state machine for validation: a task is only considered complete when a real check actually passes — a command exits 0 or a required file exists — rather than trusting the model's self-reported progress. This allows multi-hour agentic chains to run without drifting. Additional built-in features include a repomapper that represents the repository as a graph and aggressive context curation to avoid flooding local models with irrelevant files. The project is self-described as WIP, with a messy installation process, no GUI, and a requirement to author a `job.md` file to define work.
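The validation rule (done only when a command exits 0 or a required file exists) might look something like this. It is a sketch under assumptions: the `Check` class, the `next_state` function, and the state names `done`, `retry`, and `escalate` are hypothetical, and the post does not say how the real state machine is structured.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Check:
    command: Optional[List[str]] = None  # passes when the process exits 0
    required_file: Optional[str] = None  # passes when the path exists

    def __post_init__(self) -> None:
        if self.command is None and self.required_file is None:
            raise ValueError("a check needs a command or a required file")

    def passes(self) -> bool:
        if self.command is not None:
            result = subprocess.run(self.command, capture_output=True)
            if result.returncode != 0:
                return False
        if self.required_file is not None:
            if not Path(self.required_file).exists():
                return False
        return True


def next_state(check: Check, model_says_done: bool, retries_left: int) -> str:
    """Pick the next state from the real check; the model's claim is ignored."""
    del model_says_done  # deliberately unused: self-reported progress is not trusted
    if check.passes():
        return "done"
    return "retry" if retries_left > 0 else "escalate"
```

The point of the design is visible in `next_state`: the model saying it has finished never moves a task to `done`, only a passing check does.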

Key facts

01 Three-tier architecture: Codex (planner), Qwen 3.6 27B (local executor), Kimi K2.6 via `opencode-go` (optional senior fallback)
02 All three tiers are swappable via a config file
03 ~85–90% of total tokens and ~95% of output tokens flow through local models, by the author's measurement
04 Deterministic validation: a task is only marked done when a command exits 0 or an expected file exists; the state machine re-runs the checks itself
05 Context isolation between phases prevents context rot and keeps frontier model costs down
06 Built-in repomapper represents the repo as a graph; context is aggressively curated for local models
07 Still WIP: installation is rough, no UI beyond a shell command with a simple TUI, and requires a hand-authored `job.md`
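To make the repo-as-graph idea concrete, here is one possible reading of it. The post does not say what the repomapper's nodes and edges are, so this sketch assumes nodes are Python modules and edges are imports, and it curates context by keeping only modules within a fixed number of hops. The functions `build_import_graph` and `related_modules` are hypothetical.

```python
import ast
from collections import deque
from typing import Dict, List, Set


def build_import_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module name to the in-repo modules it imports."""
    graph: Dict[str, Set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges that point at other modules inside the repo.
            graph[name].update(t for t in targets if t in sources and t != name)
    return graph


def related_modules(
    graph: Dict[str, Set[str]], start: str, max_hops: int = 1
) -> List[str]:
    """Breadth-first walk: modules reachable from start within max_hops imports."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return sorted(seen)
```

Given a task that touches one module, `related_modules` returns a short list to load into the local model's context, leaving unrelated files out.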

Topics

#agent-framework #local-llm #hybrid-inference #coding-agent #open-source

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 15, 2026 · 11:57 UTC.
