Archive · 1 story · Jun 2026 – Jun 2026 · Updated 19:10 UTC

Archive

Every processed story in reverse chronological order, newest first. Filter by tag, source, or score to narrow the list.

W25 · 1 story · Jun 15–21

7.0
Jun 16, 2026 · u/Saraozte01 · Research Papers · 1 min read
HalBench v2.3 tests 29 OSS models on sycophancy resistance
HalBench v2.3 shows that sycophancy resistance is largely decoupled from model size and architecture, with a ~27B model outperforming models up to 402B and several closed frontier models on false-premise pushback.