Anthropic's Claude Fable 5 scores 91/100 on senior engineer benchmark
@danshipper reports that Anthropic released Claude Fable 5 (internally called Mythos), which scored 91/100 on Every's Senior Engineer benchmark — far ahead of Opus 4.8 at 63 and GPT-5.5 at 62 — but is very slow, token-hungry, and about twice as expensive as Opus.
Score breakdown
The post's results place Claude Fable 5 well above both Opus 4.8 and GPT-5.5 on Every's Senior Engineer benchmark. Its token consumption and cost, however, mark it as a specialized tool for heavy, long-horizon coding workloads rather than a general-purpose upgrade.
@danshipper shared a vibe-check of Claude Fable 5 — described as Anthropic's internal "Mythos" model released to the public — after roughly a week of internal testing at Every. On Every's Senior Engineer benchmark, Fable 5 scored 91/100, a level the post characterizes as human senior-engineer quality. The previous high score on that benchmark was Opus 4.8 at 63, with GPT-5.5 at 62. The post highlights the model's ability to complete large, autonomous coding tasks in a single pass: it reportedly cleared entire production bug backlogs, built a playable 3D environment, and produced a 2-minute animated film — all one-shot. It also demonstrated strong contextual reasoning, generating a crisp report from customer feedback surveys and website data that identified a key problem and a concrete, testable solution.
The post is candid about the model's limitations. Fable 5 routinely uses 500k to 1M tokens per task, is described as very slow, and costs approximately twice as much as Opus. The post frames it as best suited for power users already experienced with orchestrating multiple agents, and explicitly notes that knowledge workers or "vibe coders" with more basic setups are unlikely to notice a significant difference and that it "probably isn't the right model" for them. The post uses the analogy of a "warp drive for coding" — capable of crossing the galaxy in hours, but not the right tool for getting around town. Even the most advanced internal testers, the post notes, felt they were "only scratching the surface" of the model's capabilities.
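To make the cost profile concrete, the figures above can be turned into a rough per-task estimate. The sketch below is illustrative only: the post gives no actual prices, so `OPUS_PRICE_PER_MTOK` is a hypothetical placeholder, and the code assumes that "twice as expensive" applies to a single blended per-token price. Only the 500k to 1M token range and the 2x multiplier come from the post.

```python
# Rough per-task cost estimate for the token range quoted in the post.
# ASSUMPTION: the price below is a made-up placeholder, not a real Anthropic price.
OPUS_PRICE_PER_MTOK = 10.0    # hypothetical USD per million tokens (blended)
FABLE_PRICE_MULTIPLIER = 2.0  # "about twice as expensive as Opus" (from the post)


def task_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of a task that consumes `tokens` tokens."""
    return tokens / 1_000_000 * price_per_mtok


def fable_cost_range(
    opus_price_per_mtok: float,
    low_tokens: int = 500_000,     # lower bound quoted in the post
    high_tokens: int = 1_000_000,  # upper bound quoted in the post
) -> tuple[float, float]:
    """Return (low, high) estimated cost per Fable 5 task.

    ASSUMPTION: the 2x multiplier is applied to the per-token price.
    """
    fable_price = opus_price_per_mtok * FABLE_PRICE_MULTIPLIER
    return task_cost(low_tokens, fable_price), task_cost(high_tokens, fable_price)


if __name__ == "__main__":
    low, high = fable_cost_range(OPUS_PRICE_PER_MTOK)
    print(f"Estimated cost per task: ${low:.2f} to ${high:.2f}")
```

Swapping in a real price for the placeholder gives a quick sense of whether a long-horizon task justifies the spend. If the 2x figure in the post refers to total cost per task rather than price per token, the multiplier would need to be applied differently.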
Key facts
- Claude Fable 5 is described as Anthropic's internal 'Mythos' model made safe for public release.
- It scored 91/100 on Every's Senior Engineer benchmark, which the post characterizes as human senior-engineer level.
- The previous benchmark high score was Opus 4.8 at 63; GPT-5.5 scored 62 on the same benchmark.
- The model completed large one-shot tasks including clearing production bug backlogs, building a playable 3D environment, and producing a 2-minute animated film.
- It routinely uses 500k to 1M tokens per task and is described as very slow.
- Fable 5 costs approximately twice as much as Opus.
- The post says it is best suited for power users orchestrating multiple agents, not for everyday knowledge work or basic setups.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 11, 2026 · 08:34 UTC.