LiteCode brings agentic coding to 8k-context local models
razvanneculai built LiteCode, a coding agent that works within 8k-context local LLMs using a three-step Map-Plan-Execute pipeline and a ring-buffer memory system to stay within tight token limits.
Developers running small local models can now use a structured coding agent without needing a large context window, making agentic workflows accessible on consumer hardware.
LiteCode, built by razvanneculai, addresses a gap in the agentic coding space: most coding agents are designed around 200k-context models, but the local models most developers actually run have 8k context windows, barely enough for a single large file. The tool's Map-Plan-Execute pipeline works around this constraint without requiring a larger model. On initialization, the Map step writes plain Markdown context files: one project-level overview, one per folder, and a line-range index for any file exceeding 150 lines. The Plan step then makes a single LLM call that reads the map and produces a dependency-aware task list. During Execute, each LLM call receives only one file at a time; a token counter runs before every call, and if the file is too large, only the relevant line range is loaded.
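The pre-call token check in the Execute step can be illustrated with a minimal sketch. This is not LiteCode's actual code: the names `estimate_tokens`, `LineRange`, and `select_context` are hypothetical, as are the four-characters-per-token heuristic and the reserved budget. A real agent would more likely use the model's own tokenizer.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 8192     # assumed model window, in tokens
RESERVED_TOKENS = 2048   # assumed room for instructions and the reply


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return (len(text) + 3) // 4


@dataclass
class LineRange:
    label: str   # e.g. a function or class name taken from the map index
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def select_context(source: str, index: list[LineRange], target: str,
                   budget: int = CONTEXT_LIMIT - RESERVED_TOKENS) -> str:
    """Return the whole file if it fits the budget, else only the indexed range."""
    if estimate_tokens(source) <= budget:
        return source
    lines = source.splitlines()
    for entry in index:
        if entry.label == target:
            return "\n".join(lines[entry.start - 1:entry.end])
    raise LookupError(f"no indexed range named {target!r}")
```

The sketch does not handle a selected range that is itself larger than the budget; a fuller version would need to split or truncate that range as well.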
The hardest engineering challenge was conversation memory. Compression alone was insufficient for an 8k window, so razvanneculai implemented a ring-buffer eviction system that keeps summaries of only the last two completed actions. That provides enough continuity to avoid repeated work while staying small enough to always fit within the context limit. The tool supports a broad range of backends, including Ollama, LM Studio, Groq, OpenRouter, Gemini, DeepSeek, and any OpenAI-compatible endpoint. Local models run sequentially by default, while cloud providers run in parallel.
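A ring buffer that keeps only the last two summaries is easy to sketch with Python's `collections.deque`, which drops the oldest entry once `maxlen` is reached. The class and method names below are hypothetical and the prompt format is an assumption; the sketch only illustrates the eviction behavior described above.

```python
from collections import deque


class ActionMemory:
    """Keeps summaries of only the most recent completed actions."""

    def __init__(self, capacity: int = 2) -> None:
        # A bounded deque evicts the oldest summary automatically.
        self._summaries: deque[str] = deque(maxlen=capacity)

    def record(self, summary: str) -> None:
        """Store the summary of an action that has just completed."""
        self._summaries.append(summary)

    def as_prompt(self) -> str:
        """Render the retained summaries for inclusion in the next LLM call."""
        if not self._summaries:
            return "No completed actions yet."
        return "\n".join(f"- {s}" for s in self._summaries)
```

Because the buffer never holds more than two short summaries, its token cost is bounded regardless of how long the session runs.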
Key facts
- Targets local LLMs with 8k context windows, contrasting with agents built for 200k-context models.
- Map step generates Markdown context files: one project overview, one per folder, and a line-range index for files over 150 lines.
- Plan step uses a single LLM call to convert a user request into a dependency-aware task list.
- Execute step passes only one file per LLM call, with a token counter that falls back to the relevant line range if the file is too large.
- Conversation memory is handled via a ring-buffer eviction system storing summaries of the last two completed actions.
- Supports Ollama, LM Studio, Groq, OpenRouter, Gemini, DeepSeek, and any OpenAI-compatible endpoint.
- Local models run sequentially; cloud providers run in parallel.
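One way to picture a dependency-aware task list with sequential and parallel execution is the sketch below. It uses the standard library's `graphlib.TopologicalSorter` and a thread pool. The source does not say how LiteCode schedules tasks, so the function name `run_tasks`, the task names in the test data, and the worker count of four are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable


def run_tasks(deps: dict[str, set[str]], run: Callable[[str], str],
              parallel: bool) -> list[str]:
    """Run tasks in dependency order and return their results batch by batch.

    `deps` maps each task to the set of tasks that must finish before it.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    results: list[str] = []
    # One worker gives sequential execution (local model);
    # several workers run independent tasks concurrently (cloud provider).
    with ThreadPoolExecutor(max_workers=4 if parallel else 1) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # tasks with all dependencies done
            results.extend(pool.map(run, ready))
            sorter.done(*ready)
    return results
```

Only tasks in the same ready batch ever run concurrently, so a task never starts before the tasks it depends on have finished, in either mode.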