Apr 20, 2026 · 1 min read · Applications & Use Cases

CyberWriter MD editor showcases Apple's on-device AI stack

Developer uncSoft built CyberWriter, a privacy-first Markdown editor that wires together three of Apple's on-device AI primitives — a ~3B-parameter Foundation Model, a BERT-style text embedder, and on-device speech recognition — to deliver local RAG, inline AI editing, and voice dictation without any cloud calls.

Hacker News · uncSoft

Composite score: 6.4 out of 10

Weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Developers building AI coding or writing tools on macOS can now replicate local RAG, inline AI editing, and voice dictation without API costs or cloud dependencies by combining Apple's Foundation Models framework, `NLContextualEmbedding`, and `SFSpeechRecognizer`, though the Foundation Models piece requires macOS 26 on an Apple Intelligence-capable Mac. CyberWriter suggests this stack is already usable in a real product.

Summary: our read of the original

uncSoft built CyberWriter as both a practical tool and a proof of concept for Apple's on-device AI stack, whose on-device language model opened to third-party developers for the first time in macOS 26. The app layers three Apple primitives. The first is the Foundation Models framework, which exposes a ~3B-parameter LLM with streaming, structured output, and tool use, and needs no API key or cloud connection. The second is `NLContextualEmbedding` from the Natural Language framework (available since macOS 14 / iOS 17), a 512-dim BERT-style embedder comparable to the embeddings OpenAI and Cohere sell as paid services. The third is `SFSpeechRecognizer`/`SpeechAnalyzer` for live on-device dictation, which the post reports as solidly accurate on Apple Silicon.

The vault chat feature indexes a Markdown folder using `NLContextualEmbedding` (roughly 50 seconds per 1,000 chunks on an M1) and stores the vectors in a local `.vault.embeddings.json` file. Semantic search surfaces conceptually related notes even when they share no keywords, and AI queries retrieve the top 5 chunks as context in a plain RAG pipeline. An AI Workspace panel (`Command+Shift+A`) and inline quick actions (`Command+J`) draw on the same context layer (the current document selection, attached files, and vault chunks) and send it to whichever provider is active: Apple Intelligence, Claude, OpenAI, Ollama, or LM Studio.
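The retrieval step in that pipeline is plain nearest-neighbor search over the stored vectors. The Python sketch below only illustrates the logic and is not CyberWriter's code: it assumes a hypothetical JSON schema for `.vault.embeddings.json` (a list of objects with `path`, `text`, and `vector` keys), since the post does not document the real format, and it uses tiny toy vectors in place of 512-dim `NLContextualEmbedding` output. Chunks are ranked by cosine similarity and the top 5 are kept.

```python
import json
import math
import tempfile
from pathlib import Path

TOP_K = 5  # the post says AI queries pull the top 5 chunks into context


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def save_index(path: Path, chunks: list[dict]) -> None:
    """Persist chunks as local JSON (hypothetical schema: path, text, vector)."""
    path.write_text(json.dumps(chunks), encoding="utf-8")


def load_index(path: Path) -> list[dict]:
    """Read the chunk list back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def top_k(query_vec: list[float], chunks: list[dict], k: int = TOP_K) -> list[dict]:
    """Score every stored chunk against the query and return the k most similar."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 3-dim vectors stand in for real 512-dim embeddings.
    chunks = [
        {"path": "notes/a.md", "text": "Running models locally", "vector": [1.0, 0.0, 0.0]},
        {"path": "notes/b.md", "text": "Grocery list", "vector": [0.0, 1.0, 0.0]},
        {"path": "notes/c.md", "text": "Local-first privacy notes", "vector": [0.7, 0.7, 0.0]},
        {"path": "notes/d.md", "text": "Travel plans", "vector": [0.0, 0.0, 1.0]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        index_file = Path(tmp) / ".vault.embeddings.json"
        save_index(index_file, chunks)
        hits = top_k([0.9, 0.1, 0.0], load_index(index_file), k=2)
        print([h["path"] for h in hits])  # ['notes/a.md', 'notes/c.md']
```

A linear scan like this scores every vector on each query, which is typically fine for a personal vault of a few thousand chunks; a much larger index would usually call for an approximate nearest-neighbor structure instead.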

The post is candid about limitations: the 512-dim embeddings are described as "solid mid-tier," 256-token chunks can split an argument mid-paragraph, the Foundation Models context window tops out at around 6K characters (of which 3K is budgeted for vault context), and the app currently supports English only. uncSoft calls the developer experience for these APIs "genuinely good," says they were surprised the embedding feature worked out of the box, and closes by inviting others to share what they are building on the same stack.
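The chunk size and context budget above can be made concrete with a small sketch. It is written in Python for illustration and is not CyberWriter's code: whitespace-separated words stand in for model tokens (a real tokenizer counts differently), and the greedy packing order and the `[truncated]` marker text are assumptions. Only the 256-token and 3K-character figures come from the post.

```python
CHUNK_TOKENS = 256                   # chunk size reported in the post
VAULT_BUDGET_CHARS = 3_000           # vault-context budget reported in the post
TRUNCATION_MARKER = "\n[truncated]"  # hypothetical marker text


def chunk_fixed(text: str, size: int = CHUNK_TOKENS) -> list[str]:
    """Split text into fixed windows of `size` whitespace-separated words.

    Words approximate tokens here. Because windows ignore paragraph
    boundaries, one argument can land in two different chunks.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def pack_context(chunks: list[str], budget: int = VAULT_BUDGET_CHARS) -> str:
    """Concatenate ranked chunks until the character budget is used up.

    The chunk that would overflow is cut short and ends with the truncation
    marker; any later chunks are dropped. The result never exceeds `budget`.
    """
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        sep = "\n\n" if parts else ""
        remaining = budget - used - len(sep)
        if len(chunk) <= remaining:
            parts.append(sep + chunk)
            used += len(sep) + len(chunk)
            continue
        room = remaining - len(TRUNCATION_MARKER)
        if room > 0:
            parts.append(sep + chunk[:room] + TRUNCATION_MARKER)
        break
    return "".join(parts)


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(600))
    print([len(piece.split()) for piece in chunk_fixed(doc)])  # [256, 256, 88]
    ctx = pack_context(["p" * 2000, "q" * 2000, "z" * 500])
    print(len(ctx), ctx.endswith("[truncated]"))  # 3000 True
```

Fixed windows are simple and predictable, but they ignore paragraph structure, which is exactly how an argument ends up split across two chunks.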

Key facts

01 CyberWriter is a Markdown editor built by uncSoft on three Apple on-device AI APIs: Foundation Models, `NLContextualEmbedding`, and `SFSpeechRecognizer`/`SpeechAnalyzer`.
02 Foundation Models (macOS 26) exposes a ~3B-parameter LLM with streaming, structured output, and tool use, with no API key, no cloud call, and no per-token cost.
03 `NLContextualEmbedding` is a 512-dim BERT-style embedder available since macOS 14 / iOS 17, used for local semantic vault search.
04 Vault indexing takes roughly 50 seconds per 1,000 chunks on an M1; vectors are stored in a local `.vault.embeddings.json` file and never sent anywhere.
05 The Foundation Models context window caps at around 6K characters; vault context is budgeted to 3K characters, with truncation markers.
06 Claude and OpenAI (cloud APIs) and Ollama and LM Studio (local model servers) are supported alongside Apple Intelligence, with an explicit toggle and an inline warning before any data leaves the device.
07 Current limitations include English-only support, 256-token chunks that can split arguments mid-paragraph, and 512-dim embeddings described as "solid mid-tier" compared with GPT-4-class embedders.
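Fact 06 describes a gate in front of external providers. The Python sketch below shows one way such a gate could sit between the shared context layer and the active provider. Every name in it (`Provider`, `ContextBundle`, `route_request`, `ExternalProviderBlocked`) is hypothetical, and the prompt layout is an assumption rather than CyberWriter's actual format.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provider:
    name: str
    on_device: bool  # True only when inference happens on this Mac with no network hop


# Hypothetical registry. Ollama and LM Studio usually run on the same Mac, but as
# separate servers they can also be remote, so this sketch gates them like cloud APIs.
PROVIDERS = {
    "apple": Provider("Apple Intelligence", on_device=True),
    "claude": Provider("Claude", on_device=False),
    "openai": Provider("OpenAI", on_device=False),
    "ollama": Provider("Ollama", on_device=False),
    "lmstudio": Provider("LM Studio", on_device=False),
}


@dataclass
class ContextBundle:
    """Shared context layer: document selection, attached files, vault chunks."""
    selection: str = ""
    attachments: list[str] = field(default_factory=list)
    vault_chunks: list[str] = field(default_factory=list)


class ExternalProviderBlocked(Exception):
    """Raised when an off-device provider is chosen but the toggle is off."""


def build_prompt(bundle: ContextBundle, question: str) -> str:
    """Flatten the context bundle and the user's question into one prompt string."""
    sections = []
    if bundle.selection:
        sections.append("Selection:\n" + bundle.selection)
    sections += [f"Attachment:\n{a}" for a in bundle.attachments]
    sections += [f"Vault excerpt:\n{c}" for c in bundle.vault_chunks]
    sections.append("Question:\n" + question)
    return "\n\n".join(sections)


def route_request(
    provider_key: str,
    bundle: ContextBundle,
    question: str,
    allow_external: bool,
    warn: Callable[[str], None],
) -> tuple[str, str]:
    """Return (provider name, prompt); refuse or warn before data leaves the device."""
    provider = PROVIDERS[provider_key]
    if not provider.on_device:
        if not allow_external:
            raise ExternalProviderBlocked(
                f"{provider.name} is disabled until external providers are turned on"
            )
        warn(
            f"Sending the selection, {len(bundle.attachments)} attachment(s), and "
            f"{len(bundle.vault_chunks)} vault chunk(s) to {provider.name}."
        )
    return provider.name, build_prompt(bundle, question)


if __name__ == "__main__":
    bundle = ContextBundle(selection="Draft intro", vault_chunks=["note one", "note two"])
    print(route_request("apple", bundle, "Tighten this?", allow_external=False, warn=print)[0])
    print(route_request("claude", bundle, "Tighten this?", allow_external=True, warn=print)[0])
```

Keeping the check in a single function means every entry point (Workspace panel, quick actions, vault chat) passes through the same gate, which is one reasonable way to make sure the warning cannot be skipped.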

Topics

#on-device-ai #rag #embeddings #tool-use #privacy

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 20, 2026 · 13:29 UTC.