Jun 4, 2026 · 1 min read · Research Papers

ALMANAC dataset benchmarks LLMs on human collaborative mental models

Researchers introduce ALMANAC, a dataset of 2,987 action-level mental model annotations built from the Map Task, designed to evaluate how well LLMs can simulate human collaborative behavior and infer underlying mental models.

arXiv · Jiaju Chen, Yuxuan Lu, Jiayi Su

Composite score: 6.4 out of 10

Score weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

ALMANAC provides the first dataset with action-level mental model annotations grounded in authentic human collaboration, offering a concrete benchmark for evaluating whether LLM agents can simulate the reasoning alignment that effective human collaboration requires.

Summary: our read of the original

Jiaju Chen, Yuxuan Lu, and Jiayi Su argue that while LLM agents have grown capable of multi-step reasoning, planning, and tool use, they remain primarily optimized for task completion rather than genuine collaboration. Effective human collaboration requires continuously maintaining and aligning mental models — tracking one's own reasoning, a partner's intentions, and shared goals throughout an interaction — and the research community has lacked authentic human data annotated at this level of granularity.

To address this, the authors introduce ALMANAC (Action-Level Mental model ANnotations for Agent Collaboration), constructed from the Map Task, a well-established dyadic routing task from social science. The dataset contains 2,987 collaboration actions, each paired with theory-informed annotations recording three dimensions: self-reasoning, perceived partner intent, and perceived team goal. Six LLMs are benchmarked on two tasks — predicting humans' next-turn behavior and inferring their mental models — with results framing ALMANAC as an evaluation resource for measuring process-level collaborative competence in agents.
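As a rough illustration of the structure described above, the following Python sketch models one annotated action and builds items for the two benchmark tasks. It is a sketch under stated assumptions, not the dataset's actual schema or the authors' evaluation code: the class name, field names, role labels, and both helper functions are hypothetical and chosen only to mirror the three annotation dimensions named in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedAction:
    """One collaboration action with its three mental model annotations.

    All field names are hypothetical; the released schema may differ.
    """
    speaker: str                   # assumed role label, e.g. "giver" or "follower"
    utterance: str                 # the action as dialogue text (assumed representation)
    self_reasoning: str            # the actor's reasoning behind this action
    perceived_partner_intent: str  # what the actor believes the partner intends
    perceived_team_goal: str       # what the actor believes the shared goal is


def next_turn_example(dialogue: list[AnnotatedAction], t: int) -> tuple[list[str], str]:
    """Task 1 item: the history before turn t as input, turn t's utterance as target."""
    if not 0 < t < len(dialogue):
        raise ValueError("t must index a turn with at least one preceding turn")
    history = [f"{a.speaker}: {a.utterance}" for a in dialogue[:t]]
    return history, dialogue[t].utterance


def mental_model_example(dialogue: list[AnnotatedAction], t: int) -> tuple[list[str], dict[str, str]]:
    """Task 2 item: the history through turn t as input, turn t's annotations as target."""
    if not 0 <= t < len(dialogue):
        raise ValueError("t must index an existing turn")
    history = [f"{a.speaker}: {a.utterance}" for a in dialogue[: t + 1]]
    target = {
        "self_reasoning": dialogue[t].self_reasoning,
        "perceived_partner_intent": dialogue[t].perceived_partner_intent,
        "perceived_team_goal": dialogue[t].perceived_team_goal,
    }
    return history, target
```

Keeping the annotations on the action record itself means both tasks can be derived from the same dialogue list. How the paper actually formats model inputs and scores outputs is not stated in this summary.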

Key facts

01 ALMANAC stands for Action-Level Mental model ANnotations for Agent Collaboration.
02 The dataset contains 2,987 collaboration actions drawn from the Map Task, a classic dyadic routing task from social science.
03 Each action is paired with theory-informed mental model annotations covering self-reasoning, perceived partner intent, and perceived team goal.
04 The authors argue today's LLM agents are primarily optimized for task completion and lack process-level collaborative competence.
05 Six LLMs are benchmarked on two tasks: predicting humans' next-turn behavior and inferring their mental models.
06 The paper is authored by Jiaju Chen, Yuxuan Lu, and Jiayi Su.

Topics

#multi-agent #benchmarks #agent-framework #reasoning #open-source

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.
