Jun 15, 2026·1 min readResearch Papers

StateGen cuts tool-call hallucinations to 9.66/10 in synthetic training data

StateGen is a synthetic data generation platform that produces scored, reasoning-trace-rich multi-turn training conversations for tool-augmented LLMs, achieving a tool-call hallucination score of 9.66/10 across 64,698 evaluated conversations.

ArXiv·Rahul Khedar, Eshita, Sneha Teja Sree Reddy Thondapu

Read at source

Composite

6.3

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

StateGen's backend-is-truth invariant eliminates tool-call hallucinations by construction — a problem the paper identifies as the dominant failure class in tool-augmented LLM training data — while combining capabilities (multi-turn generation, state-grounded tool simulation, hierarchical multi-agent support, and built-in judge scoring) that no single publicly available platform currently offers together.

01StateGen is a synthetic data generation platform for producing multi-turn, tool-grounded training conversations for tool-augmented LLM agents.
02It orchestrates a four-role LLM loop: a persona-conditioned user simulator, an agent under test, a state-grounded tool simulator, and a multi-axis LLM judge.
03A central state manager enforces a backend-is-truth invariant, eliminating the dominant class of tool-call hallucinations by construction.

Summary— our read of the original

Rahul Khedar, Eshita, and Sneha Teja Sree Reddy Thondapu introduce StateGen, a synthetic data generation platform targeting a well-documented gap in LLM training resources: large-scale, multi-turn, tool-grounded conversational datasets are expensive to annotate, privacy-constrained in production environments, and largely absent from public collections. StateGen addresses this by orchestrating a four-role LLM loop consisting of a persona-conditioned user simulator, an agent under test, a state-grounded tool simulator, and a multi-axis LLM judge. The central architectural contribution is an authoritative state manager that maintains a structured world-state object across conversation turns, enforcing what the paper calls a backend-is-truth invariant — a design choice the authors argue eliminates the dominant class of tool-call hallucinations by construction rather than by post-hoc filtering.

StateGen also extends to hierarchical multi-agent settings by declaring sub-agents as tools, all sharing a single state object, enabling complex agent topologies without breaking state consistency.

StateGen also extends to hierarchical multi-agent settings by declaring sub-agents as tools, all sharing a single state object, enabling complex agent topologies without breaking state consistency. Persona-driven variation is supported through a 23-dimensional trait vector. Evaluated across 64,698 conversations drawn from three production corpora, the system achieves a tool-call hallucination score of 9.66/10. The paper includes a cleanly separated train and golden evaluation set split, with per-criterion gap analysis used to confirm the generated data does not constitute memorization bait. A comparison with eight external systems found that no single publicly available platform combines multi-turn generation, state-grounded tool simulation, hierarchical multi-agent support, and built-in judge scoring simultaneously.

Key facts

01StateGen is a synthetic data generation platform for producing multi-turn, tool-grounded training conversations for tool-augmented LLM agents.
02It orchestrates a four-role LLM loop: a persona-conditioned user simulator, an agent under test, a state-grounded tool simulator, and a multi-axis LLM judge.
03A central state manager enforces a backend-is-truth invariant, eliminating the dominant class of tool-call hallucinations by construction.
04Tool-call hallucination scores reach 9.66/10 across 64,698 evaluated conversations spanning three production corpora.
05Persona-driven variation is supported via a 23-dimensional trait vector.
06StateGen extends to hierarchical multi-agent settings by declaring sub-agents as tools, all sharing a single state object.
07Comparison with eight external systems found no single publicly available platform combines all four of StateGen's core capabilities.

Topics

#synthetic-data #tool-use #multi-agent #agent-framework #training-data

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC. How this works →

StateGen cuts tool-call hallucinations to 9.66/10 in synthetic training data

Score breakdown

Key facts

Topics

More in Research Papers.

Score breakdown

Key facts

Topics

More in Research Papers.