Apr 22, 2026 · 1 min read · Applications & Use Cases

Semantic memory MCP server cures Claude Code's context bloat

Kunal Jaiswal built a 767-line Python MCP memory server that replaces flat `.md` documentation files with a semantic knowledge base, eliminating the context-window drain that was degrading Claude Code's performance across a 30+ service home automation stack.

Dev.to #mcp · Kunal Jaiswal

Composite score: 4.8 out of 10

Weights applied: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Developers managing large, multi-service codebases with Claude Code can adopt this MCP-based semantic memory pattern to cut context-window overhead and stop the model from re-exploring knowledge that is already documented.

Summary: our read of the original

Kunal Jaiswal's home automation stack spans five machines running 30+ services — camera monitors, WhatsApp agents, LLM inference pipelines, job scrapers, and diet trackers. He maintained one `.md` file per service, each averaging 200–400 lines and containing architecture decisions, port numbers, credentials, and bug histories. At five files the system worked well; at thirty, Claude Code began re-exploring already-documented code, recommending ports already in use, and missing cross-service dependencies. The root cause was context-window exhaustion: loading ten documentation files for a cross-system task consumed 3,000–4,000 tokens before Claude wrote a single line of code, and Claude sometimes read five files before finding an answer in the sixth.

Jaiswal argues that standard RAG over documentation files fails because the retrieval unit is the wrong size — a full `.md` file wastes context with irrelevant sections, while chunking it breaks structural relationships between sections. His solution is a 767-line Python server that exposes both an MCP/SSE interface (for Claude Code) and a REST interface (for other agents) on port 8042. The server uses `sentence-transformers` with the `all-MiniLM-L6-v2` model to generate 384-dimensional embeddings, compresses them to 4-bit with TurboQuant for a smaller in-memory footprint, and stores content, tags, category, agent ID, and timestamps in MySQL. Search is cosine similarity over the compressed vectors.
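The compress-then-search idea above can be illustrated with a minimal sketch. It is not TurboQuant and not the author's code: plain uniform scalar quantization stands in for the real compression step, and hand-written 4-dimensional toy vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings. All function names and entry IDs are hypothetical.

```python
import math

def quantize_4bit(vec):
    """Uniform scalar quantization: map each float to an integer code in 0..15."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant vector
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximately reconstruct the floats from their 4-bit codes."""
    return [lo + c * scale for c in codes]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=3):
    """Rank stored entries by cosine similarity to the query vector."""
    scored = [(cosine(query_vec, dequantize(*packed)), entry_id)
              for entry_id, packed in index]
    scored.sort(reverse=True)
    return scored[:top_k]

# Toy 4-dimensional vectors stand in for real 384-dimensional embeddings.
index = [
    ("camera-monitor", quantize_4bit([0.9, 0.1, 0.0, 0.2])),
    ("whatsapp-agent", quantize_4bit([0.1, 0.8, 0.3, 0.0])),
    ("diet-tracker", quantize_4bit([0.0, 0.2, 0.9, 0.1])),
]
results = search([1.0, 0.0, 0.1, 0.2], index, top_k=2)
```

Each stored value shrinks from a 32-bit float to a 4-bit code plus two per-vector floats (`lo` and `scale`). The trade-off is a small reconstruction error in the similarity score.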

The critical behavioral change came from a "Memory-First Rule" added to `CLAUDE.md`. It mandates that Claude always call `memory_search` before reading any file, exploring code, or launching an Explore agent, and treats local `.md` files as a fallback only when the memory server is unreachable. Jaiswal imported his 30 existing documentation files by splitting them on `##` headers and converting each section into a self-contained memory entry: 30 files became 200+ discrete facts, each tagged with its source file and category. A lookup that previously required reading multiple files now completes in under 100ms via a single semantic search, keeping the context window free for actual coding work.
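The import step could look roughly like the sketch below. The original script is not shown in the source, so the function name `split_on_h2`, the entry fields, and the choice to keep text before the first `##` header as a "(preamble)" entry are all assumptions. For brevity the sketch does not special-case `##` lines that appear inside fenced code blocks.

```python
def split_on_h2(markdown_text, source_file, category):
    """Split a markdown document into one entry per `## ` section."""
    entries = []
    title, body = "(preamble)", []  # text before the first `## ` header

    def flush():
        # Emit the section collected so far, skipping empty ones.
        content = "\n".join(body).strip()
        if content:
            entries.append({
                "title": title,
                "content": f"{title}\n{content}",  # keep the header so the entry is self-contained
                "tags": [source_file],
                "category": category,
            })

    for line in markdown_text.splitlines():
        if line.startswith("## "):  # `### ` and deeper headers stay inside their section
            flush()
            title, body = line[3:].strip(), []
        else:
            body.append(line)
    flush()
    return entries

# Hypothetical sample document; the service name and port are made up.
sample = "# Camera monitor\n\n## Ports\nListens on port 9000.\n\n## Known bugs\n### Restart loop\nFixed.\n"
entries = split_on_h2(sample, source_file="camera.md", category="home-automation")
```

Repeating the section title inside `content` is one way to keep each entry readable on its own once it is separated from the rest of its file.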

Key facts

01 The home automation codebase grew to 30+ services across 5 machines, each with its own `.md` documentation file averaging 200–400 lines.
02 Loading 10 documentation files for a cross-system task consumed 3,000–4,000 tokens before Claude wrote any code.
03 The MCP memory server is 767 lines of Python, exposing 6 tools: `memory_search`, `memory_save`, `memory_list`, `memory_update`, `memory_delete`, and `memory_stats`.
04 Embeddings use `sentence-transformers` with the `all-MiniLM-L6-v2` model (384 dimensions), compressed to 4-bit with TurboQuant.
05 Metadata (content, tags, category, agent ID, timestamps) is stored in MySQL; the vector index is held in memory and persisted to disk.
06 An import script split 30 `.md` files on `##` headers, converting them into 200+ self-contained memory entries.
07 A "Memory-First Rule" in `CLAUDE.md` mandates calling `memory_search` before any file read or agent launch, with results returning in under 100ms.
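In the source, the Memory-First Rule is a natural-language instruction in `CLAUDE.md`, not code. The control flow it describes can still be sketched in a few lines. Here `lookup`, `memory_search`, and `read_local_docs` are hypothetical callables supplied by the caller, not the server's actual API.

```python
def lookup(query, memory_search, read_local_docs):
    """Try semantic memory first; read local .md files only if the server is unreachable."""
    try:
        return {"source": "memory", "results": memory_search(query)}
    except (ConnectionError, TimeoutError):
        # Fallback path: the memory server could not be reached.
        return {"source": "local_files", "results": read_local_docs(query)}
```

Note that an empty result from `memory_search` does not trigger the fallback in this sketch. That matches the rule as summarized, which reserves local files for the case where the server is unreachable.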

Topics

#mcp #claude-code #semantic-memory #rag #developer-tools

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 19:13 UTC.
