Hybrid agent routes planning to frontier models, execution to local GPUs
u/Poha_Best_Breakfast built a three-tier coding agent that uses a frontier model (Codex) only for high-level planning while routing ~85–90% of tokens to local models on a dual RTX 3090 rig.
Score breakdown
The architecture is a concrete approach to sharply cutting frontier-model token spend. It keeps roughly 85–90% of tokens local while still relying on a frontier model for high-level design, by reserving that model for task decomposition and by using deterministic validation to keep long-running agentic chains on track.
u/Poha_Best_Breakfast describes a personal coding agent built over several months to make better use of a dual RTX 3090 system. The motivation: local models such as Qwen 3.5/3.6 27B and Gemma 4 31B are capable but lack the "taste" of frontier models for high-level design decisions, while running everything through a frontier model is expensive. The solution is a three-tier architecture in which each tier is swappable via a config file: a Planner tier (currently Codex) that decomposes each incoming task into N phases, a Local tier (currently Qwen 3.6 27B) that executes the bulk of the work, and an optional Senior tier (currently Kimi K2.6 via `opencode-go`) that steps in when local retries are exhausted.
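The tier routing described above can be sketched as a small escalation loop. This is a minimal sketch under stated assumptions, not the author's code: `TierConfig`, its field names, the model identifier strings, `max_local_retries`, and the single senior attempt are all hypothetical, since the post does not describe the config format or the retry policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical tier configuration; the real project's config format and
# field names are not described in the post.
@dataclass
class TierConfig:
    planner: str = "codex"              # used by the planning step (not shown here)
    local: str = "qwen3.6-27b"          # hypothetical identifier for the local executor
    senior: Optional[str] = "kimi-k2.6" # optional fallback; None disables escalation
    max_local_retries: int = 3          # assumed retry budget for the local tier


# A "model" is any callable that attempts a task and reports whether the
# deterministic checks passed: (model_name, task) -> passed?
Attempt = Callable[[str, str], bool]


def run_task(task: str, cfg: TierConfig, attempt: Attempt) -> str:
    """Try the local tier up to max_local_retries times, then escalate once
    to the senior tier if one is configured. Returns the name of the tier
    that succeeded, or raises if every tier failed."""
    for _ in range(cfg.max_local_retries):
        if attempt(cfg.local, task):
            return "local"
    # Local retries exhausted: hand the task to the optional senior tier.
    if cfg.senior is not None and attempt(cfg.senior, task):
        return "senior"
    raise RuntimeError(f"task failed on all tiers: {task!r}")
```

Because each tier is just a string in a config object, swapping one model for another is a configuration change rather than a code change, which mirrors the swappability the author describes.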
A coding task, for example, might be decomposed into research, implement, and review phases; each phase can run for multiple epochs, with each epoch generating discrete tasks handled by the local model. By the author's measurement, approximately 85–90% of total tokens and ~95% of output tokens flow through local models. Context isolation between phases is a deliberate design choice to prevent "context rot" and keep the frontier model's context window from overflowing with bash call noise.
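The phase/epoch/task loop and the context isolation described above can be sketched as follows. This is a minimal, hypothetical illustration: `Phase`, `Context`, `run_job`, the `plan_tasks` and `execute` hooks, and the choice to carry only one-line summaries from one phase to the next are assumptions, not details from the post.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Phase:
    name: str
    max_epochs: int = 3  # assumed cap on epochs per phase


@dataclass
class Context:
    """Per-phase message history; a fresh one is created for every phase."""
    messages: List[str] = field(default_factory=list)


# Hypothetical hooks: plan_tasks(phase, ctx, epoch) returns that epoch's task
# list (an empty list means the phase is finished); execute(task, ctx) runs
# one task on the local model and returns a short result string.
PlanFn = Callable[[Phase, Context, int], List[str]]
ExecFn = Callable[[str, Context], str]


def run_job(phases: List[Phase], plan_tasks: PlanFn, execute: ExecFn) -> List[str]:
    """Run phases in order; each phase runs up to max_epochs epochs, and each
    epoch's tasks go to the local executor. Returns one summary per phase."""
    summaries: List[str] = []
    for phase in phases:
        # Context isolation: each phase starts from a fresh context seeded only
        # with short summaries of earlier phases, not their raw tool output.
        ctx = Context(messages=list(summaries))
        done = 0
        for epoch in range(phase.max_epochs):
            tasks = plan_tasks(phase, ctx, epoch)
            if not tasks:  # nothing left to do in this phase
                break
            for task in tasks:
                ctx.messages.append(execute(task, ctx))
                done += 1
        summaries.append(f"{phase.name}: {done} tasks")
    return summaries
```

The design point is that bash output and intermediate chatter accumulate only inside a phase's own `Context`, so later phases (and the planner) see a bounded amount of history.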
The other major differentiator is a deterministic state machine for validation. A task counts as complete only when a real check passes (a command exits 0 or a required file exists), not when the model reports that it is done. According to the author, this lets multi-hour agentic chains run without drifting.
Other built-in features include a repomapper that represents the repository as a graph and aggressive context curation that keeps irrelevant files out of the local models' context. The author describes the project as WIP: installation is messy, there is no GUI, and work is defined by hand-authoring a `job.md` file.
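The deterministic validation rule lends itself to a small state machine. The sketch below assumes a hypothetical `Check` schema (a command that must exit 0 and/or a path that must exist) and invented state names; the post does not describe the actual states or check format. The property it illustrates is that the model's claim of completion only triggers verification, and the harness re-runs the checks itself.

```python
import os
import subprocess
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    PENDING = auto()
    VERIFYING = auto()
    RETRY = auto()
    DONE = auto()


@dataclass
class Check:
    """One deterministic check (hypothetical schema): a command that must
    exit 0 and/or a file that must exist."""
    command: Optional[List[str]] = None
    path: Optional[str] = None

    def passes(self) -> bool:
        if self.command is not None:
            result = subprocess.run(self.command, capture_output=True)
            if result.returncode != 0:
                return False
        if self.path is not None and not os.path.exists(self.path):
            return False
        return True


def step(state: State, model_says_done: bool, checks: List[Check]) -> State:
    """Advance one task's state. Self-reported progress is never proof."""
    if state in (State.PENDING, State.RETRY):
        # The model's claim only moves the task into verification.
        return State.VERIFYING if model_says_done else state
    if state is State.VERIFYING:
        # The harness re-runs every check; all must pass to finish.
        return State.DONE if all(c.passes() for c in checks) else State.RETRY
    return state  # DONE is terminal
```

A task that lands in `RETRY` goes back to the executor, which is where the local-retry budget and senior escalation would apply.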
Key facts
- Three-tier architecture: Codex (planner), Qwen 3.6 27B (local executor), Kimi K2.6 via `opencode-go` (optional senior fallback)
- All three tiers are swappable via a config file
- ~85–90% of total tokens and ~95% of output tokens flow through local models, by the author's measurement
- Deterministic validation: a task is only marked done when a command exits 0 or an expected file exists; the state machine re-runs the checks itself rather than trusting the model
- Context isolation between phases prevents context rot and keeps frontier-model costs down
- Built-in repomapper represents the repo as a graph; context is aggressively curated for local models
- Still WIP: installation is rough, there is no UI beyond a shell command with a simple TUI, and each job requires a hand-authored `job.md`
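The repomapper and context curation are only named in the post, not explained. One plausible approach, sketched here purely as an assumption, is to build a module-level import graph and hand the local model only the files within a few hops of the file being edited. `build_import_graph`, `curate`, the flat-directory layout, and the use of Python imports as edges are all hypothetical; the author's repomapper may use a different representation entirely.

```python
import ast
from collections import deque
from pathlib import Path
from typing import Dict, Set


def build_import_graph(root: Path) -> Dict[str, Set[str]]:
    """Map each module in root (flat layout assumed, keyed by file stem) to
    the local modules it imports. Relative imports are ignored for brevity."""
    files = {p.stem: p for p in root.glob("*.py")}
    graph: Dict[str, Set[str]] = {name: set() for name in files}
    for name, path in files.items():
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module.split(".")[0]]
            else:
                continue
            # Keep only edges to modules that live in this repository.
            graph[name].update(t for t in targets if t in files and t != name)
    return graph


def curate(graph: Dict[str, Set[str]], target: str, max_hops: int = 1) -> Set[str]:
    """Return modules within max_hops of target, following edges both ways
    (what target imports and what imports target), via breadth-first search."""
    undirected = {n: set(edges) for n, edges in graph.items()}
    for n, edges in graph.items():
        for m in edges:
            undirected[m].add(n)
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue
        for neighbor in undirected.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen
```

Limiting context to a small neighborhood of the graph is one way to keep a 27B local model from being flooded with irrelevant files; `max_hops` trades recall for context size.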
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 15, 2026 · 11:57 UTC.