Wiki: Agent coordination

Agent coordination is the set of mechanisms by which multiple autonomous LLM agents exchange information, delegate subtasks, and maintain consistent shared state. The first wave of multi-agent systems research, surveyed by Christopher Meiklejohn in his series, treated coordination primarily as a proof-of-concept. Papers like CAMEL, MetaGPT, and AutoGen demonstrated that agents could be made to collaborate, but shipped without concurrency control or escalation paths (Getting Up to Speed, Part 3).

The second wave turned empirical and found that failure is the norm. MAST, MAS-FIRE, and Silo-Bench measured production failure rates between 41 and 87 percent, with inter-agent reasoning failures proving structurally harder to fix than prompt-level issues (Part 4). These are not edge cases; they are the dominant outcome when coordination is left undesigned.

Coordination structure matters as much as model quality. Research on debate, on shared-notebook state, and on the CALM theorem all points to the same finding: the right interaction pattern depends on the task. Convergent debate suits tasks that need consensus; adversarial debate suits tasks that need error correction; shared mutable state suits tasks that need continuity across agents (Part 5). The CALM theorem, borrowed from distributed systems, formalizes when coordination is even necessary, a formalism the multi-agent systems (MAS) field has been slow to adopt.
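The CALM result can be illustrated with a small sketch. In its usual statement, a program has a consistent, coordination-free distributed implementation exactly when it is monotonic, meaning new input never retracts an earlier output. The example below is an illustration under assumptions, not code from the cited series: the function names and the "bug-N" findings are hypothetical. It compares a monotonic merge (set union) with a non-monotonic one (overwrite) across every possible delivery order of three agent updates.

```python
from itertools import permutations

def merge_findings(updates):
    """Monotonic merge: set union only grows, so delivery order is irrelevant."""
    state = frozenset()
    for update in updates:
        state = state | update
    return state

def last_writer_wins(updates):
    """Non-monotonic merge: each update overwrites the last, so order matters."""
    state = None
    for update in updates:
        state = update
    return state

# Hypothetical findings reported by three agents, in no guaranteed order.
agent_updates = [
    frozenset({"bug-1"}),
    frozenset({"bug-2"}),
    frozenset({"bug-1", "bug-3"}),
]

# Collect the distinct final states reachable under every delivery order.
union_results = {merge_findings(p) for p in permutations(agent_updates)}
overwrite_results = {last_writer_wins(p) for p in permutations(agent_updates)}

print(len(union_results))      # one final state regardless of order
print(len(overwrite_results))  # final state depends on which update arrived last
```

The union-based merge reaches a single final state without any ordering protocol, while the overwrite-based merge needs some form of coordination (locking, sequencing, or a leader) to give a deterministic answer.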

There is also a coordination tax. Stanford and Google/MIT research cited by Ben Dickson found that multi-agent orchestration can amplify errors up to 17x and cut tool-handling efficiency by 2 to 6x compared with single-agent baselines, making single-agent systems the rational default for most tasks (How to Choose). Meiklejohn’s open questions post identifies backpressure protocols, CRDTs for shared state, and topology-to-reliability mappings as the unsolved problems that would close this gap, arguing the field is rediscovering distributed systems without the vocabulary to name it (Part 8).
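For readers unfamiliar with CRDTs, the following is a minimal sketch of a grow-only counter (G-Counter), one of the standard textbook CRDTs. It is not taken from Meiklejohn's post; the class name, the agent identifiers, and the "steps completed" scenario are assumptions chosen for illustration. Each agent increments only its own slot, and replicas merge by taking the per-agent maximum, so merges can arrive in any order or be repeated without corrupting shared state.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per agent, merged by element-wise max."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.counts = {}  # agent id -> highest count seen for that agent

    def increment(self, n=1):
        # An agent only ever writes to its own slot.
        self.counts[self.agent_id] = self.counts.get(self.agent_id, 0) + n

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        for agent, count in other.counts.items():
            self.counts[agent] = max(self.counts.get(agent, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two hypothetical agents tracking completed steps without a shared lock.
planner = GCounter("planner")
coder = GCounter("coder")
planner.increment(2)
coder.increment(3)

planner.merge(coder)
coder.merge(planner)
coder.merge(planner)  # a duplicate merge is harmless

print(planner.value(), coder.value())
```

Both replicas converge to the same total without either agent waiting on the other, which is the property that makes CRDTs a candidate for shared agent state. Richer state, such as edits to a shared notebook, would need a more elaborate CRDT than this counter.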

Benchmarks have not caught up. HumanEval and SWE-bench were designed for single agents and cannot measure coordination quality, communication overhead, or failure recovery, which means published numbers systematically overstate how well multi-agent systems actually work (Part 7).