Every processed story in chronological order, with the newest coverage first. Filter by tag, source, or score to drill in.
GLM-5.2's combination of a 1 million token context window, expected MIT-licensed open weights, and ~$8/month pricing places a near-frontier coding model within reach of developers who cannot afford or prefer not to use Claude or Codex pricing tiers.
The safeguard architecture means Fable 5's cybersecurity performance is effectively equivalent to Opus 4.8 rather than the full Mythos 5 model, making the practical capability gap between the general-release and partner-only versions larger than benchmark numbers alone suggest.
A leaked, unverified model called Oceanus V1-P outscored all other models tested — including Opus 4.8 and GPT-5.5 — by a wide margin on a diverse set of practical coding and reasoning tasks, though its true origin and stability remain unknown.
FrontierCode represents a stricter standard for evaluating AI coding agents by requiring production-quality, review-ready code rather than just functional correctness — and the low scores even from leading models show the benchmark is far from saturated.
The release transforms Hermes from a primarily terminal-driven tool into a multi-surface platform with a native GUI and remote agent control, removing the barrier that previously required users to read config files and terminal logs to operate it.