Opus 4.8 removes `budget_tokens` and adds fast-throughput mode
Anthropic shipped Claude Opus 4.8 on May 28, 2026, dropping `budget_tokens` (now returns a 400) and adding a `speed: "fast"` mode, mid-session system messages, a lower prompt-cache floor, and documented refusal `stop_details`.
Score breakdown
The removal of `budget_tokens` is a hard breaking change that requires code updates before migrating from Opus 4.7 to 4.8, while the new `speed: "fast"` mode and mid-session system messages extend what agents can do within a single session.
- 01Anthropic released Claude Opus 4.8 on May 28, 2026, 41 days after Opus 4.7.
- 02Breaking change: `budget_tokens` is removed — passing it now returns a 400 error.
- 03New `speed: "fast"` mode offers up to 2.5× throughput at $10/$50 per million tokens (doubled cost); it is a gated research preview requiring console enrollment.
Anthropic shipped Claude Opus 4.8 on May 28, 2026, just 41 days after Opus 4.7 — a compressed release cycle the post attributes partly to competitive pressure from OpenAI Codex and Google Gemini Flash. The single breaking change is the removal of `budget_tokens`: any request that passes it now returns a 400 error. The replacement pattern is `thinking={"type": "adaptive"}` with effort controlled via a separate `output_config={"effort": "..."}` parameter accepting values `"low"`, `"high"` (default), and `"max"`. Beyond that, the post describes the migration as a simple model-ID swap — all other Opus 4.7 request structure carries over unchanged.
Mid-session `role: "system"` messages can now be inserted directly into the messages array after any user turn with no beta header required, and earlier turns remain cached so only the injected delta incurs cost.
Four additions round out the release. The `speed: "fast"` mode delivers up to 2.5× throughput but doubles per-token cost to $10 input / $50 output per million tokens; it is a gated research preview requiring prior enrollment in the Anthropic console, and unenrolled requests return an error. Mid-session `role: "system"` messages can now be inserted directly into the messages array after any user turn with no beta header required, and earlier turns remain cached so only the injected delta incurs cost. The prompt-cache minimum drops from ~2,000 tokens to 1,024 tokens with no code changes needed. Refusal `stop_details` were previously present but undocumented; they are now publicly documented and categorize refusal type, enabling branching on `response.stop_details.type` instead of parsing free-text content.
The post also flags several platform and access constraints. The Claude API, Amazon Bedrock, and Vertex AI each provide a 1 million token context window, while Microsoft Foundry caps at 200k tokens. Maximum synchronous output is 128k tokens, raised to 300k tokens per call via the Message Batches API with the beta header `output-300k-2026-03-24`. The model's knowledge cutoff is January 2026, and the post specifies `anthropic>=0.51` as the minimum SDK version. Fast mode and Claude Code access both require a paid Anthropic subscription (Pro, Max, Teams, or Enterprise) — the free tier does not include either.
Key facts
- 01Anthropic released Claude Opus 4.8 on May 28, 2026, 41 days after Opus 4.7.
- 02Breaking change: `budget_tokens` is removed — passing it now returns a 400 error.
- 03New `speed: "fast"` mode offers up to 2.5× throughput at $10/$50 per million tokens (doubled cost); it is a gated research preview requiring console enrollment.
- 04Mid-session `role: "system"` messages are now supported in the messages array with no beta header required.
- 05Prompt-cache minimum drops from ~2,000 tokens to 1,024 tokens.
- 06Refusal `stop_details` are now publicly documented, enabling branching by refusal category.
- 07Standard pricing ($5/$25 per million tokens), context window (1M tokens on Claude API, Bedrock, Vertex), and all other 4.7 request structure remain unchanged; minimum SDK is `anthropic>=0.51`.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC. How this works →