
Agent coordination in LLM-based systems is the set of mechanisms by which multiple agents divide tasks, exchange information, and arrive at consistent outputs. The field is still working out when coordination helps versus when it simply multiplies costs.

The empirical picture is sobering. Meiklejohn’s survey of MAST, MAS-FIRE, and Silo-Bench found that multi-agent LLM systems fail 41-87% of the time across 1,600 traced runs. Crucially, the dominant failure mode is not a breakdown in coordination but a failure of information synthesis: agents do not integrate results correctly even when the coordination scaffolding works as intended.

The coordination mechanisms themselves are varied. Meiklejohn’s follow-up covers convergent debate, adversarial debate, shared-notebook state management, and the CALM theorem from distributed systems. Its central argument is that coordination structure must match task structure, and that distributed systems theory already supplies the vocabulary the AI field keeps rediscovering.
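
To make the CALM idea concrete: the theorem says that a program has a consistent, coordination-free distributed implementation exactly when it is monotonic, meaning new input can only add to earlier conclusions and never retract them. The sketch below is an illustration under assumed names (`merge_findings`, `overwrite_answer`, and the sample updates are hypothetical, not taken from the surveyed work). It contrasts a grow-only shared notebook, where agents can write in any order, with a last-writer-wins answer slot, where the outcome depends on ordering and therefore needs coordination.

```python
from itertools import permutations


def merge_findings(notebook: frozenset, update: frozenset) -> frozenset:
    """Monotonic step: the notebook only grows, so arrival order is irrelevant."""
    return notebook | update


def overwrite_answer(current: frozenset, update: frozenset) -> frozenset:
    """Non-monotonic step: the last writer wins, so arrival order matters."""
    return update


def final_states(step, initial, updates):
    """Apply the updates in every possible order and collect distinct end states."""
    results = set()
    for order in permutations(updates):
        state = initial
        for update in order:
            state = step(state, update)
        results.add(state)
    return results


# Hypothetical updates from three agents writing to shared state.
agent_updates = [
    frozenset({"fact_a"}),
    frozenset({"fact_b"}),
    frozenset({"fact_a", "fact_c"}),
]

monotonic_outcomes = final_states(merge_findings, frozenset(), agent_updates)
overwrite_outcomes = final_states(overwrite_answer, frozenset(), agent_updates)

print(len(monotonic_outcomes))  # one end state, whatever the order
print(len(overwrite_outcomes))  # several end states, depending on the order
```

If a task can be phrased as the first kind of step, agents can proceed without locks or turn-taking; if it is the second kind, some ordering or agreement mechanism is part of the task structure.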

The cost of getting this wrong is quantified by Dickson’s survey of Stanford and Google/MIT research: multi-agent orchestration can amplify errors up to 17x and cut tool-handling efficiency by 2-6x relative to a single-agent baseline. The practical implication is that single-agent systems should be the default unless the task structure genuinely requires parallelism or specialization.
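
The single-agent default can be written down as a small routing rule. This is a minimal sketch, not a method from the cited research: `TaskProfile`, its fields, and `choose_architecture` are hypothetical names, and treating "more than one independent subtask" as evidence of genuine parallelism is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical description of a task; both fields are illustrative assumptions.
    independent_subtasks: int          # subtasks that can run without seeing each other's output
    needs_distinct_specialists: bool   # True if different tools or expertise are required


def choose_architecture(task: TaskProfile) -> str:
    """Return "multi-agent" only when the task structure calls for it."""
    if task.independent_subtasks > 1 or task.needs_distinct_specialists:
        return "multi-agent"
    return "single-agent"


print(choose_architecture(TaskProfile(independent_subtasks=1, needs_distinct_specialists=False)))
print(choose_architecture(TaskProfile(independent_subtasks=4, needs_distinct_specialists=False)))
```

The point of the rule is the direction of the burden of proof: the task has to justify the extra agents, not the other way around.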

When multi-agent architectures are warranted, observability becomes critical. openagentd is one example of a self-hosted agent runtime that builds in OpenTelemetry tracing alongside persistent memory and scheduling, treating coordination transparency as a first-class concern rather than an afterthought.
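
As a rough picture of what tracing buys in this setting, the sketch below records nested spans for a request handled by two agents. It is a standard-library stand-in written for illustration: it is not the OpenTelemetry API and not openagentd's interface, and the span and agent names are hypothetical. A real deployment would use an OpenTelemetry SDK and exporter instead.

```python
import time
from contextlib import contextmanager

SPANS = []    # finished spans, in completion order
_stack = []   # names of the spans that are currently open


@contextmanager
def span(name, **attributes):
    """Record how long a block took and which open span it was nested under."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        SPANS.append({
            "name": name,
            "parent": parent,
            "seconds": time.perf_counter() - start,
            "attributes": attributes,
        })


# Hypothetical request handled by two agents in sequence.
with span("handle_request", task="summarize"):
    with span("researcher.run", agent="researcher"):
        pass  # the agent's work would go here
    with span("synthesizer.run", agent="synthesizer"):
        pass

for record in SPANS:
    print(record["name"], "<-", record["parent"])
```

With parent links recorded, a failed run can be read back as a tree, which makes it possible to tell whether the fault lay in the hand-off between agents or in how the final agent combined their results.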

Related concepts