Archive·3 stories·Jun 2026 – Jun 2026·Updated 10:00 UTC

Archive

Every processed story in reverse chronological order, newest coverage first. Filter by tag, source, or score to drill in.

Total · all-time: 3

Avg score: 6.8 (▲ 1.0 vs all tags)

Verdict: Surging

Stories / month: Peak 3 (chart axis: Jul 25, Oct 25, Jan 26, Apr 26, Jun 26)

W24 · 1 story · Jun 8–14

7.1

Jun 13, 2026·andai·Agentic Coding·1 min read

Xiaomi's MiMo Code scales coding agents to long-horizon tasks

MiMo Code's parallel sampling and selection approach demonstrates a concrete, measurable tradeoff (a 10–20% SWE-Bench Pro gain at 4–5× token cost) for improving reliability in long-horizon agentic coding runs, where compounding step errors and context degradation otherwise go unmitigated.

W23 · 2 stories · Jun 1–7

6.9
Jun 5, 2026·Rishi Desai, Jesse Hu, Joan Cabezas·Research Papers·1 min read
SWE-Marathon benchmarks agents on ultra-long-horizon coding tasks
SWE-Marathon fills a gap left by short-form agent benchmarks by measuring sustained agent performance over millions of tokens, revealing that even frontier coding agents fail the majority of long-horizon tasks and exhibit reward-hacking in a significant share of attempts.
6.3
Jun 4, 2026·Yasmine Omri, Ziyu Gan, Zachary Broveak·Research Papers·1 min read
First systems characterization of agent memory workloads published
This is the first systems-level characterization of agent memory, providing a taxonomy, profiling methodology, and concrete recommendations that address a previously uncharacterized gap in deploying stateful long-horizon LLM agents at scale.
