Agentic Universe

A calmer way to keep up with the agentic stack. Every story links back to its source.

Archive·258 stories·Jun 2026 – Jun 2026·Updated 01:07 UTC

Archive

Every processed story in chronological order, with the newest coverage first. Filter by tag, source, or score to drill in.

Active filter: category:Research Papers

Coding benchmarks are misaligned with agentic software engineering

Because any single harness component can move benchmark scores by margins comparable to those between adjacent model generations, end-to-end scores can misattribute performance gains and mislead practitioners trying to improve agentic systems.

Read at source ↗

7.3
Jun 16, 2026·Xueping Gao·Research Papers·1 min read

SkillWeaver routes LLM agents through complex multi-skill tasks via decompose-retrieve-compose

The paper identifies task decomposition — not retrieval — as the binding constraint in multi-skill agent planning, and SAD's single-iteration fix raises decomposition accuracy by over 32 percentage points, directly improving how reliably agents can assemble executable plans from large real-world skill libraries.

Read at source ↗

7.1
Jun 16, 2026·Dipayan Banik, Kowshik Chowdhury, Shazibul Islam Shamim·Research Papers·1 min read

80% of AI-agent test patches lack meaningful verification logic

The finding that 80.2% of agent-authored test patches lack meaningful assertions means that quality gates relying on test-file presence give a false signal of verification coverage in AI-generated code.

Read at source ↗

6.6
Jun 16, 2026·Jian Yang, Shawn Guo, Wei Zhang·Research Papers·1 min read

LoopCoder-v2 finds two loops is the sweet spot for parallel loop Transformers

The paper establishes that parallel loop Transformer (PLT) performance saturates at exactly two loops and provides a gain–cost diagnostic framework explaining why, giving practitioners a principled basis for loop-count selection rather than relying on monotonic scaling assumptions.

Read at source ↗

5.8
Jun 16, 2026·𝕏 @AnthropicAI·Research Papers

Claude Code success rates stay within 7 points across occupations

The finding that non-software occupations achieve success rates within 7 percentage points of software engineering on Claude Code's strictest metric suggests the tool's effectiveness is not limited to developers.

Read at source ↗

6.2
Jun 16, 2026·Rean Clive Fernandes, Lukas Fehring, Theresa Eimer·Research Papers·1 min read

Automated prompt optimization lifts LLM game agents from 0% to 72.5% on PutNext

The framework demonstrates that automated prompt optimization alone — without any fine-tuning — can turn a completely failing LLM agent (0% on PutNext) into one that succeeds nearly three-quarters of the time, showing prompt engineering can be systematically automated rather than done by hand.

Read at source ↗

6.4
Jun 16, 2026·Ander Alvarez, Santhiya Rajan, Samuel Mugel·Research Papers·1 min read

ProvenanceGuard catches cross-source attribution errors in MCP agents

The paper demonstrates that source attribution is an independent axis of factuality verification — meaning standard source-blind metrics can pass answers that contain incorrect attributions, a gap ProvenanceGuard is designed to close in MCP-based agents.

Read at source ↗

5.4
Jun 15, 2026·Reef Menaged, Gili Lior, Shauli Ravfogel·Research Papers·1 min read

LLM agents struggle to infer hidden world models via interaction

The benchmark exposes concrete, measurable gaps in LLM agents' ability to infer hidden world models through interaction, providing a rigorous testbed with classical algorithm baselines that quantifies how far current agents fall short of robust interactive discovery.

Read at source ↗

W24 · 1 story · Jun 8–14

6.2
Jun 14, 2026·Vincent Schmalbach·Research Papers·1 min read
Delegation contracts improve AI agent reviewability, not correctness
The study establishes that explicit delegation contracts improve the reviewability of AI coding agent work — not its correctness — reframing the contract as a mechanism for human oversight rather than a driver of agent task performance.
Read at source ↗

Page 4 of 26·Showing 31–40 of 258
