DiffMAS framework teaches agents to communicate via latent representations
Researchers Ye Yu, Heming Liu, and Haibo Jin propose DiffMAS, a training framework that makes inter-agent communication a learnable, jointly optimized component of multi-agent LLM systems using latent representations instead of fixed text-based protocols.
Benchmark results on AIME24 and GPQA-Diamond suggest that jointly training communication alongside reasoning — rather than relying on fixed text protocols — is a concrete path to stronger multi-agent LLM performance on hard reasoning tasks.
The paper by Ye Yu, Heming Liu, and Haibo Jin identifies a gap in multi-agent LLM research: while prior work has invested heavily in agent roles and orchestration, inter-agent communication is typically treated as a static, text-based interface rather than something that can be trained end-to-end. The authors argue that latent communication — passing information through internal representations such as key-value caches — is a promising alternative, but existing approaches fail to jointly optimize this communication channel alongside multi-agent reasoning.
To address this, the paper introduces DiffMAS, a training framework that makes latent communication a learnable component of multi-agent systems. DiffMAS uses parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to co-adapt how they encode and interpret information during interactions. Evaluated across mathematical reasoning, scientific QA, code generation, and commonsense benchmarks, DiffMAS consistently improves both reasoning accuracy and decoding stability compared to single-agent inference, text-based multi-agent baselines, and prior latent communication methods. Specific results include 26.7% on AIME24 and 20.2% on GPQA-Diamond.
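To make the idea of latent communication through key-value caches concrete, here is a minimal NumPy sketch. It is an illustration of the general mechanism only, not the DiffMAS implementation: the hidden size, the shared projection weights, the single attention head, and the missing causal mask are all simplifying assumptions, and every name in it is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Single-head scaled dot-product attention (no causal mask, for brevity)."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores) @ values

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size

# Hypothetical projection weights, shared because both agents are assumed
# to run on the same base model.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Agent A processes 5 tokens and keeps its key-value cache.
hidden_a = rng.normal(size=(5, d))
kv_cache_a = (hidden_a @ W_k, hidden_a @ W_v)

# Agent B processes 3 tokens of its own.
hidden_b = rng.normal(size=(3, d))
q_b, k_b, v_b = hidden_b @ W_q, hidden_b @ W_k, hidden_b @ W_v

# Latent communication: B attends over A's cached keys and values in addition
# to its own, so A's "message" is never decoded into text.
keys = np.concatenate([kv_cache_a[0], k_b], axis=0)
values = np.concatenate([kv_cache_a[1], v_b], axis=0)
out_with_message = attend(q_b, keys, values)

# Baseline: B attends only over its own tokens.
out_alone = attend(q_b, k_b, v_b)

print(out_with_message.shape)  # (3, 8)
```

In this sketch the channel is fixed: agent B simply reads whatever agent A cached. The point of the paper, as summarized above, is that how agents encode and interpret such representations can instead be trained jointly with the reasoning task.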
Key facts
- 01 DiffMAS is a training framework that treats inter-agent latent communication as a jointly optimized, learnable component of multi-agent LLM systems.
- 02 Latent communication uses internal representations such as key-value caches instead of text-based protocols between agents.
- 03 DiffMAS applies parameter-efficient supervised training over multi-agent latent trajectories.
- 04 The framework is evaluated on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks.
- 05 DiffMAS achieves 26.7% on AIME24 and 20.2% on GPQA-Diamond.
- 06 DiffMAS outperforms single-agent inference, text-based multi-agent systems, and prior latent communication methods.
- 07 The paper, by Ye Yu, Heming Liu, and Haibo Jin, was published on arXiv on 2026-04-23.
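The summary does not say which parameter-efficient method DiffMAS uses. As one common example of the general idea, the sketch below trains a small low-rank adapter on top of a frozen weight matrix with plain gradient descent. The adapter form, the toy regression data, the sizes, and the learning rate are all assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 2  # hypothetical sizes

# Frozen base weight: never updated during training.
W_frozen = rng.normal(size=(d_in, d_out))
W_before = W_frozen.copy()

# Trainable low-rank adapter. B starts at zero so the adapter is initially a no-op.
A = rng.normal(scale=0.1, size=(d_in, rank))
B = np.zeros((rank, d_out))

# Toy supervised data (assumed): targets come from a low-rank shift of the base weight.
x = rng.normal(size=(64, d_in))
shift = rng.normal(scale=0.5, size=(d_in, rank)) @ rng.normal(scale=0.5, size=(rank, d_out))
y = x @ (W_frozen + shift)

def loss_fn(A, B):
    pred = x @ (W_frozen + A @ B)
    return float(np.mean((pred - y) ** 2))

initial_loss = loss_fn(A, B)
lr = 0.05
for _ in range(300):
    err = x @ (W_frozen + A @ B) - y
    # Gradient of the mean squared error w.r.t. the effective weight W_frozen + A @ B.
    grad_w = 2.0 * x.T @ err / err.size
    # Chain rule through the low-rank product; both gradients use the old A and B.
    grad_A = grad_w @ B.T
    grad_B = A.T @ grad_w
    A -= lr * grad_A
    B -= lr * grad_B
final_loss = loss_fn(A, B)

trainable = A.size + B.size
print(f"trainable parameters: {trainable} of {W_frozen.size + trainable}")
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Only A and B change here, which is what makes the approach parameter-efficient: the adapter holds 64 values against 256 in the frozen matrix, and the gap widens as the layer grows.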