Claude Fable 5 edges Opus 4.8 on quality but costs 69% more per task
A head-to-head evaluation of Claude Fable 5 (the public Mythos-class model) against Opus 4.8 across ~1,000 agent scenarios finds Fable 5 scores marginally higher but delivers far worse value at $1.25 vs. $0.74 per task.
Score breakdown
The evaluation shows that Fable 5's marginal quality lead over Opus 4.8 comes at a per-task cost about 69% higher ($1.25 vs. $0.74), making Opus 4.8 the higher-value choice for production agent fleets even though Fable 5 represents a new capability class.
On June 9, 2026, Anthropic publicly released Claude Fable 5, the first public version of its Mythos-class model tier — a capability class the company had previously withheld due to safety concerns, particularly around discovering and exploiting software vulnerabilities. Fable 5 and the restricted Mythos 5 share the same underlying model; the difference is that Fable 5 ships with safety classifiers, while Mythos 5, available only to approved partners, runs without them. Before launch, community speculation framed Mythos as a transformative leap: a model capable of restructuring large codebases in one pass, spotting security flaws experienced engineers miss, working unsupervised for hours, and acting as a true collaborator rather than a steerable assistant. Anthropic's CPO Mike Krieger called it "the most capable class of systems we've built," and through Project Glasswing, roughly 50 early partners reported finding more than 10,000 high or critical severity vulnerabilities, with the program since expanding past 150 organizations.
To test whether Fable 5 lives up to that billing, Nicolas Fortuin and Baptiste Fernandez at Tessl evaluated both models on 917 shared scenarios drawn from the public `task-evals-for-skills` dataset, scoring each on instruction-following (weighted 4) and task-completion (weighted 3), both with and without a relevant skill in context. Fable 5 leads on overall score by 0.9 points (92.9 vs. 92.0), with instruction-following at 89.3 vs. 88.0 and task-completion at 97.8 vs. 97.4. The two models tie on 61% of tasks at a two-point threshold, with Fable 5 winning 24% and Opus 4.8 winning 16%. Skill lift is nearly identical: +17.2 for Fable 5 vs. +17.5 for Opus 4.8. Where the models diverge sharply is cost. Fable 5 is priced at $10/$50 per MTok (input/output) versus Opus 4.8's $5/$25. It uses fewer output tokens per task (9,025 vs. 10,687), but not enough to offset the doubled price: it averages $1.25 per task against Opus 4.8's $0.74, which works out to 74 points per dollar versus 125. The article's conclusion, as of mid-2026, is that Opus 4.8 remains the better value for most agent fleets, and that the distance between the Mythos hype and the measured Fable 5 reality is the central finding.
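The overall scores quoted above are consistent with a plain weighted mean of the two sub-scores using the stated 4:3 weights. The sketch below checks that arithmetic; treating the overall score as a weighted mean is an assumption (the summary gives the weights but not the exact formula), and the names `SCORES`, `WEIGHTS`, and `weighted_overall` are illustrative only.

```python
# Sub-scores as reported in the summary above.
SCORES = {
    "Fable 5": {"instruction_following": 89.3, "task_completion": 97.8},
    "Opus 4.8": {"instruction_following": 88.0, "task_completion": 97.4},
}

# Weights as stated in the text. Combining them as a weighted mean
# is an assumption, not a formula confirmed by the source.
WEIGHTS = {"instruction_following": 4, "task_completion": 3}


def weighted_overall(sub_scores, weights=WEIGHTS):
    """Return the weighted mean of the sub-scores."""
    total_weight = sum(weights.values())
    return sum(sub_scores[name] * w for name, w in weights.items()) / total_weight


# Round to one decimal place, matching the precision of the reported figures.
overall = {model: round(weighted_overall(s), 1) for model, s in SCORES.items()}
gap = round(overall["Fable 5"] - overall["Opus 4.8"], 1)

for model, score in overall.items():
    print(f"{model}: {score}")
print(f"gap: {gap}")
```

Under this assumption the computed values reproduce the reported 92.9, 92.0, and the 0.9-point gap, which suggests the published overall score is a 4:3 weighted mean or something close to it.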
Key facts
- 01 Claude Fable 5 launched publicly on June 9, 2026 as the first public Mythos-class model from Anthropic.
- 02 Fable 5 and Mythos 5 share the same underlying model; Fable 5 adds safety classifiers while Mythos 5 (restricted to approved partners) runs without them.
- 03 Across 917 shared scenarios, Fable 5 scored 92.9 overall vs. Opus 4.8's 92.0, a 0.9-point gap.
- 04 The two models tied on 61% of tasks; Fable 5 won 24% and Opus 4.8 won 16% at a two-point threshold.
- 05 Fable 5 averages $1.25 per task vs. $0.74 for Opus 4.8, yielding 74 points per dollar vs. 125.
- 06 Fable 5 is priced at $10/$50 per MTok (input/output); Opus 4.8 is $5/$25 per MTok.
- 07 Through Project Glasswing, roughly 50 early Mythos Preview partners reported finding more than 10,000 high or critical severity vulnerabilities; the program has since expanded past 150 organizations.
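The value figures in the list above can be re-derived from the rounded numbers in this summary. The sketch below does so; the helper names are illustrative, and the output-token counts (9,025 and 10,687) come from the article body, not the list. Because the inputs are already rounded, the result for Opus 4.8 lands near 124 points per dollar, not exactly the reported 125; the small difference presumably comes from unrounded per-task costs in the original evaluation, which is an assumption.

```python
# Rounded figures as reported in this summary.
COST_PER_TASK = {"Fable 5": 1.25, "Opus 4.8": 0.74}      # USD per task
OVERALL_SCORE = {"Fable 5": 92.9, "Opus 4.8": 92.0}      # points
OUTPUT_PRICE = {"Fable 5": 50.0, "Opus 4.8": 25.0}       # USD per MTok (output)
OUTPUT_TOKENS = {"Fable 5": 9_025, "Opus 4.8": 10_687}   # avg output tokens per task


def cost_premium_pct(cost, baseline):
    """Percentage by which `cost` exceeds `baseline`."""
    return (cost / baseline - 1) * 100


def points_per_dollar(score, cost):
    """Overall score divided by average per-task cost."""
    return score / cost


def output_cost(tokens, price_per_mtok):
    """Cost in USD of `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


premium = cost_premium_pct(COST_PER_TASK["Fable 5"], COST_PER_TASK["Opus 4.8"])
value = {m: points_per_dollar(OVERALL_SCORE[m], COST_PER_TASK[m]) for m in COST_PER_TASK}
out_cost = {m: output_cost(OUTPUT_TOKENS[m], OUTPUT_PRICE[m]) for m in OUTPUT_TOKENS}

print(f"Fable 5 per-task cost premium: {premium:.0f}%")
for model in COST_PER_TASK:
    print(f"{model}: {value[model]:.1f} points per dollar, "
          f"${out_cost[model]:.2f} of per-task cost from output tokens")
```

The output-token share shows why using about 16% fewer output tokens does not close the gap: at double the list price, Fable 5's output tokens alone still cost more per task than Opus 4.8's.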
Summary and scoring are generated automatically from the original article. Last processed Jun 13, 2026 · 08:58 UTC.