Pre-flight prompt inspection tool catches token costs before deployment
Ferhat Atagün introduces `context-lens`, a pre-flight tool that shows exact token counts, context-window position, prompt-caching boundaries, and per-call cost before a prompt is ever sent to the Claude API.
Measure token counts, window utilization, and per-call cost before committing to a prompt design — not after seeing the bill — by running a pre-flight check with `context-lens`.
Ferhat Atagün's post diagnoses a recurring failure mode for teams building on Claude: token costs, context-window saturation, and prompt-caching inefficiencies are all discoverable before deployment, yet most developers only encounter them in production. The root cause, he argues, is that the standard tools (chat playgrounds, IDE plugins, the official SDK, and even his own previously shipped `claudoscope`, `agent-replay`, `prompt-lab`, and `tool-lab`) are all retrospective: they analyze what just happened, not what is about to happen.
`context-lens` is positioned as the pre-flight counterpart. It surfaces three things derivable from the prompt text alone: (1) an exact token count via Anthropic's `/v1/messages/count_tokens` endpoint, which accepts the same `system`, `messages`, and `tools` shape as a real API call but returns only `input_tokens` at negligible cost; (2) context-window position, expressed as `(input + max_output) / 200_000` against Sonnet 4.5's 200K-token window, since exceeding it causes silent content dropping rather than an error; and (3) per-call cost, computed as input tokens × `$3/M` plus output tokens × `$15/M` for Sonnet, scalable by daily traffic volume. A fourth dimension — where to place `cache_control` boundaries — is not auto-derived, but `context-lens` visualizes the boundaries the developer has already set for manual review.
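The first two checks can be sketched in a few lines of Python. This is an illustrative sketch, not the `context-lens` implementation: the helper names are hypothetical, the model alias `claude-sonnet-4-5` and the `anthropic-version` header value are assumptions to verify against Anthropic's documentation, and the request is built with the standard library only so that nothing is sent until `count_input_tokens` is called with a real API key.

```python
import json
import urllib.request

COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"
CONTEXT_WINDOW = 200_000  # Sonnet 4.5 window size cited in the post


def build_count_request(api_key, system, messages,
                        model="claude-sonnet-4-5", tools=None):
    """Build (but do not send) a count_tokens request.

    The body mirrors the shape of a real messages call: system, messages,
    and optional tools. The model alias is an assumption.
    """
    body = {"model": model, "system": system, "messages": messages}
    if tools:
        body["tools"] = tools
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",  # assumed API version header
            "content-type": "application/json",
        },
        method="POST",
    )


def count_input_tokens(request):
    """Send the request and return the exact input_tokens value."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["input_tokens"]


def window_utilization(input_tokens, max_output_tokens, window=CONTEXT_WINDOW):
    """Fraction of the context window used: (input + max_output) / window."""
    return (input_tokens + max_output_tokens) / window
```

With the post's Version A figures, `window_utilization(3847, 800)` gives about 0.023, so roughly 2.3% of the 200K window is consumed before the model produces any output.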
The post's central worked example compares two versions of the same agent system prompt for a code review task. Version A, which uses markdown headings, an embedded JSON schema, and a long taxonomy, measured 3,847 input tokens. Version B, a single paragraph with the schema implied by one example, measured 612 tokens, a 6.3× difference. On five real traffic samples, both versions caught the same critical bugs, produced valid JSON, and stayed under 800 output tokens. Per-call cost works out to roughly $0.0235 for Version A versus $0.0138 for Version B (figures consistent with budgeting 800 output tokens per call), a difference of about $0.0097 per call, or about $97/day and roughly $3,000/month at 10,000 daily calls. Atagün notes that the shorter prompt was written for readability, not cost; the savings were invisible until `context-lens` surfaced the token counts. The tool provides two modes: a live heuristic (~3.7 chars/token for English text) that updates on every keystroke without an API call, and an on-demand exact count via the `count_tokens` endpoint.
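The arithmetic behind these figures, together with the character-based heuristic, can be reproduced with a short sketch. The function names are hypothetical, the 800-token output budget is an assumption taken from the "under 800 output tokens" observation, and the 30-day month is a simplification; none of this is taken from the `context-lens` source.

```python
INPUT_USD_PER_M = 3.0    # Sonnet input price used in the post
OUTPUT_USD_PER_M = 15.0  # Sonnet output price used in the post
CHARS_PER_TOKEN = 3.7    # live heuristic for English text


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from character count; no API call needed."""
    return round(len(text) / chars_per_token)


def call_cost(input_tokens, output_tokens):
    """Per-call cost in USD: input and output tokens at their per-million rates."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def daily_saving(tokens_a, tokens_b, output_tokens, calls_per_day):
    """USD saved per day by sending prompt B instead of prompt A."""
    per_call = call_cost(tokens_a, output_tokens) - call_cost(tokens_b, output_tokens)
    return per_call * calls_per_day


if __name__ == "__main__":
    saving = daily_saving(3847, 612, output_tokens=800, calls_per_day=10_000)
    print(f"Version A: ${call_cost(3847, 800):.4f} per call")
    print(f"Version B: ${call_cost(612, 800):.4f} per call")
    print(f"Saving: ${saving:.2f}/day, ${saving * 30:.0f} per 30-day month")
```

Because both versions share the same output budget, the saving depends only on the difference of 3,235 input tokens per call. The heuristic is useful for live feedback while typing, but the exact count from the endpoint is the number to use in a cost estimate.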
Key facts
- Ferhat Atagün introduces `context-lens`, a pre-flight prompt inspection tool for Claude developers.
- Anthropic's `/v1/messages/count_tokens` endpoint returns exact `input_tokens` using the same tokenization as a real API call, with no model invocation or output cost.
- Sonnet 4.5 has a 200K-token context window; exceeding it causes silent content dropping, not an error.
- Input pricing used in the post's cost formula is $3/M tokens for Sonnet; output pricing is $15/M tokens.
- Two functionally equivalent system prompts measured 3,847 tokens (Version A) vs. 612 tokens (Version B), a 6.3× difference.
- At 10,000 calls/day, the difference between the two prompts equates to roughly $97/day or $3,000/month.
- `context-lens` offers a live ~3.7 chars/token heuristic (no API key needed) and an on-demand exact count via the `count_tokens` endpoint.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 8, 2026 · 15:36 UTC.