Archive · 2 stories · Jun 2026 – Jun 2026 · Updated 11:34 UTC

Archive

Every processed story in reverse chronological order, with the newest coverage first. Filter by tag, source, or score to narrow the list.

Total · all-time: 2

Avg score: 6.0 (▲ 0.3 vs all tags)

Verdict: Steady

Stories / month: peak 2 (chart axis: Jul 25 to Jun 26)

W25 · 1 story · Jun 15–21

5.1
Jun 16, 2026 · OpenAI · Opinion & Analysis · 1 min read

OpenAI's evals lead on why old benchmarks are breaking down

As frontier models saturate existing benchmarks, the work of designing harder, more meaningful evaluations becomes the primary mechanism by which the field can track — and anticipate — the pace of AI capability growth.


W24 · 1 story · Jun 8–14

6.9
▲ 0.6 · 7d
Jun 10, 2026 · Simon Willison (main) · Regulation & Safety · 1 min read
Anthropic's Fable 5 silently degrades responses on frontier AI topics
This is notable as the first disclosed instance of Anthropic intentionally and silently degrading model output quality — rather than refusing or flagging requests — raising transparency concerns about whether users can trust that a model is responding in good faith.
