ReTAS model tackles cognitive bias in multi-agent AI systems
Researchers Bobo Li, Rui Wu, and Zibo Ji identify a human-like cognitive bias called Actor-Observer Asymmetry in LLM agents and introduce ReTAS, a dialectically aligned model that enforces perspective-invariant reasoning to mitigate it.
Teams building multi-agent systems for code review, self-reflection, or automated debugging should be aware that role assignment alone can introduce systematic attribution bias, and that dialectical training methods like ReTAS offer a concrete path to more consistent fault diagnosis.
Bobo Li, Rui Wu, and Zibo Ji identify a previously underexplored failure mode in multi-agent LLM systems: Actor-Observer Asymmetry (AOA). LLM agents are increasingly deployed in specialized roles, some acting as self-reflective actors and others as mutual auditors, and the paper finds that this role-playing structure inadvertently imports a well-known human cognitive bias. An agent in the "actor" role tends to attribute its failures to external circumstances, while the same agent placed in an "observer" role attributes identical failures to internal faults of the agent being judged. The authors introduce the Ambiguous Failure Benchmark to measure this effect, and find that simply swapping an agent's perspective triggers AOA in over 20% of cases for most tested models.
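One way to picture the "over 20% of cases" figure is as a flip rate over paired attributions. The summary does not give the benchmark's exact metric, so the sketch below is an assumption: the labels `external` and `internal`, the `AttributionPair` type, and the `aoa_rate` function are hypothetical names chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical attribution labels; the benchmark's real label set is not given here.
EXTERNAL = "external"
INTERNAL = "internal"


@dataclass(frozen=True)
class AttributionPair:
    """Attributions one model gave for the same failure under two framings."""
    case_id: str
    as_actor: str     # label when the model is told it caused the failure
    as_observer: str  # label when the model audits another agent's failure


def is_aoa_flip(pair: AttributionPair) -> bool:
    """True when the pair shows the asymmetric pattern described above:
    external blame as actor, internal blame as observer."""
    return pair.as_actor == EXTERNAL and pair.as_observer == INTERNAL


def aoa_rate(pairs: list[AttributionPair]) -> float:
    """Fraction of paired cases in which swapping perspective flips the attribution."""
    if not pairs:
        raise ValueError("need at least one attribution pair")
    return sum(is_aoa_flip(p) for p in pairs) / len(pairs)


# Toy data for illustration only; these are not results from the paper.
toy_pairs = [
    AttributionPair("case-1", EXTERNAL, INTERNAL),  # flip
    AttributionPair("case-2", INTERNAL, INTERNAL),  # consistent
    AttributionPair("case-3", EXTERNAL, EXTERNAL),  # consistent
    AttributionPair("case-4", INTERNAL, EXTERNAL),  # inconsistent, but not the AOA pattern
]
toy_rate = aoa_rate(toy_pairs)
```

Under this assumed definition only the actor-external, observer-internal pattern counts as AOA. A broader "any disagreement" metric would also count `case-4`.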
To counteract this bias, the paper proposes ReTAS (Reasoning via Thesis-Antithesis-Synthesis), a model trained via dialectical alignment. ReTAS integrates dialectical chain-of-thought reasoning, structured around thesis, antithesis, and synthesis, with Group Relative Policy Optimization (GRPO) to guide agents toward synthesizing conflicting viewpoints into an objective consensus. The authors report that ReTAS reduces attribution inconsistency and improves fault resolution rates in ambiguous scenarios, offering a principled approach to more reliable multi-agent reasoning.
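For readers unfamiliar with Group Relative Policy Optimization, its core step scores each sampled response relative to the other responses drawn for the same prompt, so no learned value model is needed. The sketch below shows only that normalization step. It is not the ReTAS training code: the reward values are made up, the summary does not describe the reward design, and the use of population standard deviation and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), which is the
    group-relative baseline that GRPO uses in place of a learned critic.
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled reasoning traces on one prompt,
# e.g. 1.0 when a trace reaches the same attribution from both perspectives.
toy_rewards = [1.0, 0.0, 1.0, 0.0]
toy_advantages = group_relative_advantages(toy_rewards)
```

Traces that score above the group average receive positive advantages and are reinforced. When every trace in a group earns the same reward, all advantages are zero and that group contributes no learning signal.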
Key facts
- 01 Actor-Observer Asymmetry (AOA) causes LLM agents to attribute failures to external factors when acting as actors, but to internal faults when acting as observers.
- 02 The authors introduce the Ambiguous Failure Benchmark to quantify AOA in LLM agents.
- 03 Simply swapping an agent's perspective triggers the AOA effect in over 20% of cases for most models.
- 04 ReTAS (Reasoning via Thesis-Antithesis-Synthesis) is a new model trained through dialectical alignment to enforce perspective-invariant reasoning.
- 05 ReTAS combines dialectical chain-of-thought with Group Relative Policy Optimization.
- 06 Experiments show ReTAS mitigates attribution inconsistency and improves fault resolution rates in ambiguous scenarios.