Apr 21, 2026 · 1 min read · Agentic Coding

LiteCode brings agentic coding to 8k-context local models

razvanneculai built LiteCode, a coding agent that works within 8k-context local LLMs using a three-step Map-Plan-Execute pipeline and a ring-buffer memory system to stay within tight token limits.

Hacker News · razvanneculai

Composite score: 5.8 out of 10

Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%.

Why it matters

Developers running small local models can now use a structured coding agent without needing a large context window, making agentic workflows accessible on consumer hardware.

Summary: our read of the original

LiteCode, built by razvanneculai, addresses a gap in the agentic coding space: most coding agents are designed around 200k-context models, but the local models most developers actually run have 8k context windows, barely enough for a single large file. The tool's Map-Plan-Execute pipeline works around this constraint without requiring a larger model.

On initialization, the Map step writes plain Markdown context files: one project-level overview, one per folder, and a line-range index for any file exceeding 150 lines. The Plan step then makes a single LLM call to read the map and produce a dependency-aware task list. During Execute, each LLM call receives only one file at a time; a token counter checks before every call and falls back to loading just the relevant line range if the file is too large.
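The Execute-step check can be pictured with a minimal Python sketch. This is not LiteCode's actual code: the function names, the four-characters-per-token estimate, and the reserved headroom are all assumptions made for illustration, and a real agent would more likely count tokens with the model's own tokenizer.

```python
from typing import Optional, Tuple

CONTEXT_LIMIT = 8192      # assumed model context size, in tokens
RESERVED_TOKENS = 2048    # assumed headroom for instructions, memory, and the reply


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return (len(text) + 3) // 4


def select_file_context(
    source: str,
    line_range: Optional[Tuple[int, int]] = None,
    budget: int = CONTEXT_LIMIT - RESERVED_TOKENS,
) -> str:
    """Return the whole file if it fits the budget, otherwise only line_range.

    line_range is 1-indexed and inclusive, the way a line-range index
    might record it.
    """
    if estimate_tokens(source) <= budget:
        return source
    if line_range is None:
        raise ValueError("file exceeds the budget and no line range is available")
    start, end = line_range
    excerpt = "\n".join(source.splitlines()[start - 1:end])
    if estimate_tokens(excerpt) > budget:
        raise ValueError("selected line range still exceeds the token budget")
    return excerpt
```

Running the check before every call, rather than once per task, means a file that has grown during earlier edits is caught before it overflows the window.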

The hardest engineering challenge was conversation memory. Compression alone was insufficient for an 8k window, so razvanneculai implemented a ring-buffer eviction system that keeps summaries of the last two completed actions — providing enough continuity to avoid repeated work while remaining cheap enough to always fit within the context limit. The tool supports a broad range of backends including Ollama, LM Studio, Groq, OpenRouter, Gemini, DeepSeek, and any OpenAI-compatible endpoint. Local models run sequentially by default, while cloud providers run in parallel.
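A ring buffer of this kind is straightforward to sketch in Python with a bounded `deque`. The class and method names below are hypothetical, and the summary does not say how LiteCode produces or formats its action summaries; only the capacity of two comes from the description above.

```python
from collections import deque


class ActionMemory:
    """Keeps summaries of only the most recent completed actions."""

    def __init__(self, capacity: int = 2) -> None:
        # A deque with maxlen silently evicts the oldest entry when full.
        self._summaries = deque(maxlen=capacity)

    def record(self, summary: str) -> None:
        """Store the summary of an action that has just completed."""
        self._summaries.append(summary)

    def as_prompt(self) -> str:
        """Render the retained summaries as a short block for the next call."""
        if not self._summaries:
            return "No actions completed yet."
        return "\n".join(f"- {s}" for s in self._summaries)
```

Because the buffer never holds more than two short summaries, its token cost stays bounded regardless of how long the session runs.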

Key facts

01 Targets local LLMs with 8k context windows, contrasting with agents built for 200k-context models.
02 Map step generates Markdown context files: one project overview, one per folder, and a line-range index for files over 150 lines.
03 Plan step uses a single LLM call to convert a user request into a dependency-aware task list.
04 Execute step passes only one file per LLM call, with a token counter that falls back to the relevant line range if the file is too large.
05 Conversation memory is handled via a ring-buffer eviction system storing summaries of the last two completed actions.
06 Supports Ollama, LM Studio, Groq, OpenRouter, Gemini, DeepSeek, and any OpenAI-compatible endpoint.
07 Local models run sequentially; cloud providers run in parallel.
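One way to picture a dependency-aware task list, and how it could feed either sequential or parallel execution, is a topological sort that groups tasks into batches. The sketch below uses Python's standard `graphlib`. The function name and the task names in the usage note are invented for illustration, and the summary does not say how LiteCode actually represents or schedules its plan.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_batches(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches whose members do not depend on each other.

    dependencies maps each task to the set of tasks that must finish first.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Given a hypothetical plan where `edit_api` and `update_docs` both depend on `edit_models`, and `update_tests` depends on `edit_api`, the batches come out as `edit_models`, then `edit_api` with `update_docs`, then `update_tests`. A local backend could walk the batches one task at a time, while a cloud backend could dispatch each batch concurrently.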

Topics

#coding-agent #open-source #local-models #context-management #tool-use

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 19:13 UTC.
