What is the difference between a workflow and an agent in multi-agent systems?

Anthropic draws the line by where control lives. A workflow orchestrates LLMs and tools through predefined code paths that you write, so the structure is fixed and inspectable. An agent is a system where the LLM dynamically directs its own process and tool usage. Most production multi-agent systems are actually workflows with one or two agentic steps inside them, which is usually the right balance of capability and predictability.

What is the difference between the orchestrator-worker and supervisor patterns?

Both have a coordinating LLM at the center, but the lifecycle differs. In orchestrator-worker, a lead agent dynamically decomposes a single task and spawns short-lived workers for the subtasks it just invented, then synthesizes their output. In the supervisor pattern, a router sits in front of a fixed roster of long-lived specialist agents and decides which one handles each turn. Orchestrator-worker fits one-shot decomposition; supervisor fits ongoing routing across known specialties.

Does a multi-agent system always beat a single agent?

No. Anthropic reported their multi-agent research system beat a single-agent baseline by 90.2% on an internal research eval, but it also used about 15x the tokens of a normal chat. Cognition argued the opposite for coherent, write-heavy tasks like coding, because parallel agents with separate context make conflicting decisions. The honest answer is that multi-agent wins for parallel, read-heavy breadth work and loses for stateful tasks that demand a single coherent thread of decisions.

Which framework should I use for multi-agent orchestration?

Match the framework to the topology you actually need. LangGraph gives you explicit graph control for supervisor, hierarchical, and swarm layouts. CrewAI is fast for role-based crews with sequential or hierarchical process. The OpenAI Agents SDK is clean for handoff-style coordination. Microsoft AutoGen and the newer Agent Framework suit conversational group chat. Pick the topology first, then the framework that expresses it with the least ceremony.

Multi-Agent Orchestration Patterns: A Field Guide

A team tells me they are "moving to a multi-agent architecture," and I have learned to ask one question back: which one? The blank pause that follows is the actual problem. Multi-agent has become a single word for a dozen different system shapes, and teams adopt the label long before they choose a topology. Then they wonder why the thing is slow, expensive, and quietly incoherent.

"Multi-agent" is not an architecture. It is a category that contains at least six distinct orchestration patterns, each with its own control flow, cost profile, and failure modes. Picking the wrong one is the most common reason these systems disappoint. A swarm where you needed a pipeline burns tokens and produces drift. A pipeline where you needed an orchestrator caps your ceiling on anything open-ended.

This is a field guide to those patterns. For each one I will cover what it is, when it wins, what it costs in tokens, how it fails, and which frameworks express it cleanly. I have built versions of most of these in production, and I have torn down more than a few. If you want the broader story of what breaks when agents hit real traffic, I wrote that up in my production agent architecture post. This post is narrower and more structural: it is about the shape of coordination itself.

The real decision is topology, not single versus multi

The popular debate is framed as single agent versus multi-agent, and it produces two camps that both sound right.

In one camp, Anthropic published how they built their multi-agent research system and reported that a Claude Opus orchestrator coordinating Claude Sonnet subagents beat a single-agent Opus baseline by 90.2% on their internal research evaluation. In the other camp, Cognition published Don't Build Multi-Agents, arguing that parallel agents fragment context and make conflicting decisions, which is why Devin runs a single-threaded linear agent instead.

Both are correct, because they are describing different topologies applied to different task shapes. Anthropic's win was on research: a broad, read-heavy, parallelizable task where subagents explore separate branches and never need to agree on a shared mutation. Cognition's warning is about coding: a coherent, write-heavy task where every action carries an implicit decision and two agents editing in parallel will contradict each other. The lesson is not "multi-agent good" or "multi-agent bad." It is that coordination shape has to match task shape.

Anthropic's own framing in Building Effective Agents is the right starting posture. They draw a hard line between workflows, where "LLMs and tools are orchestrated through predefined code paths," and agents, where "LLMs dynamically direct their own processes and tool usage." Their headline advice is to "find the simplest solution possible, and only increase complexity when needed. This might mean not building agentic systems at all." Every pattern below is a point on that complexity gradient. Read them as a ladder, not a menu, and stop climbing the moment a rung does the job.

Pattern 1: Sequential pipeline

A pipeline decomposes a task into a fixed sequence of steps where each LLM call processes the output of the previous one. Anthropic calls this prompt chaining, and it is the humblest pattern on the list. There is no agent making routing decisions. The control flow is code you wrote.

// pipeline.ts - each stage transforms the previous stage's output
const outline = await llm(outlinePrompt(brief));
// a programmatic check (gate) belongs here, before the outline feeds the draft
const draft = await llm(draftPrompt(outline));
const edited = await llm(editPrompt(draft));

It wins when a task naturally breaks into ordered stages and you want a gate between each one: draft then translate, extract then validate then load, plan then generate. Because every step is explicit, the pipeline is the easiest pattern to debug and the cheapest to run. Token cost is linear in the number of stages, and there is no orchestrator overhead.

It fails through error propagation. A bad outline poisons every downstream stage, so the value of the programmatic check between steps is not optional polish, it is the thing that keeps a small mistake from compounding. Reach for a pipeline far more often than instinct suggests. Most "agent" projects are a pipeline wearing a costume. CrewAI expresses this directly as Process.sequential, and in LangGraph it is a StateGraph with linear edges.
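To make the gate concrete, here is a minimal Python sketch of a two-stage pipeline with a programmatic check between the stages. Everything in it is an assumption for illustration: `fake_llm` is a canned stand-in for a real model call, and the "at least three numbered items" rule in `outline_gate` is an arbitrary example of a check, not a recommendation.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.startswith("OUTLINE:"):
        return "1. Intro\n2. Body\n3. Conclusion"
    if prompt.startswith("DRAFT:"):
        return "Draft covering: " + prompt[len("DRAFT:"):].replace("\n", " | ")
    return prompt


def outline_gate(outline: str) -> None:
    """Programmatic check: reject outlines with fewer than three numbered items."""
    items = [line for line in outline.splitlines() if line[:1].isdigit()]
    if len(items) < 3:
        raise ValueError(f"outline too thin: {len(items)} item(s)")


def run_pipeline(brief: str, llm: Callable[[str], str] = fake_llm) -> str:
    outline = llm("OUTLINE:" + brief)
    outline_gate(outline)  # fail here instead of drafting from a bad outline
    return llm("DRAFT:" + outline)
```

The point of the design is where the failure surfaces: a thin outline raises at the gate, before any downstream tokens are spent on it.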

Pattern 2: Parallel fan-out

Fan-out runs independent subtasks at the same time and then merges the results. Anthropic splits this into two variants. Sectioning partitions a task into independent chunks, like running extraction over many document pages at once. Voting runs the same task several times and picks or synthesizes the best answer.

// fan-out.ts - sectioning: independent chunks, one merge step
const sections = await Promise.all(
  pages.map((page) => llm(extractPrompt(page)))
);
const merged = await llm(combinePrompt(sections));

It wins on two axes. The first is latency: if order does not matter, parallel execution collapses wall-clock time. The second is reliability through consensus, which is what voting buys you when a single pass is too noisy. This is still a workflow, not an agent, because the orchestration lives entirely in your code.

The failure mode hides in the merge step. Independent branches produce locally sensible but globally inconsistent output, and a weak combine prompt lets those inconsistencies through. Fan-out is also where token cost starts to climb, since you pay for N branches even though they finish in the time of one. Use it when subtasks are genuinely independent. The moment a branch needs to know what another branch decided, you have outgrown fan-out and you are looking at the next pattern.
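The snippet above shows sectioning. The voting variant can be sketched in Python along these lines, under stated assumptions: each entry in `runs` is a hypothetical callable standing in for one independent model pass, and "strict majority or fail" is one possible merge rule among several (a synthesis prompt is another).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def vote(task: str, runs: Sequence[Callable[[str], str]]) -> str:
    """Run every callable on the same task in parallel; return the strict-majority answer."""
    with ThreadPoolExecutor(max_workers=len(runs)) as pool:
        answers = list(pool.map(lambda run: run(task), runs))
    winner, count = Counter(answers).most_common(1)[0]
    if count <= len(answers) // 2:
        # No answer won more than half the runs, so refuse to pick one.
        raise ValueError(f"no majority among {answers}")
    return winner


# Three hypothetical passes over the same input; two agree.
runs = [lambda t: "spam", lambda t: "spam", lambda t: "ham"]
verdict = vote("classify: free money now", runs)
```

Refusing to answer without a majority is a deliberate choice here: it turns a noisy result into an explicit failure instead of letting the merge step paper over it.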

Pattern 3: Orchestrator-worker

Here a central LLM dynamically breaks down a task, delegates the pieces to worker LLMs, and synthesizes their results. The difference from fan-out is that the subtasks are not known in advance. The orchestrator invents them at runtime based on the input. This is the pattern behind Anthropic's research system, and it is the one most production "multi-agent" systems actually want.

// orchestrator.ts - the lead agent decides the subtasks, then merges
const plan = await orchestrator(planPrompt(query)); // returns N subtasks
const results = await Promise.all(
  plan.subtasks.map((task) => worker(workerPrompt(task)))
);
const answer = await orchestrator(synthesizePrompt(query, results));

It wins when a task needs dynamic decomposition and central coordination at once: open-ended research, multi-file code changes, anything where you cannot pre-write the branch list. The orchestrator holds the global plan, which keeps the workers pointed in one direction. If you have configured subagents in Claude Code, you have run a hand-tuned version of this; I covered that setup in my Claude Code configuration deep-dive.

The costs are real. Anthropic reported their multi-agent system used about 15 times the tokens of a normal chat. The characteristic failure mode is worker redundancy: two workers read the same source, or follow slightly divergent plans, unless the orchestrator maintains tight shared state and avoids handing out overlapping assignments. Frameworks make this concrete: langgraph-supervisor packages it, CrewAI's Process.hierarchical adds a manager agent that delegates, and the OpenAI Agents SDK can express it through the agents-as-tools pattern, where a worker agent is exposed as a callable tool.
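One way to picture "tight shared state" is a small ledger the orchestrator consults before it hands out work. The Python sketch below is an illustration only: `Subtask`, `Assignments`, and the idea of tagging each subtask with the sources it intends to read are all hypothetical names and assumptions, not part of any framework mentioned here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Subtask:
    question: str
    sources: FrozenSet[str]  # sources this worker plans to read


@dataclass
class Assignments:
    """Shared state kept by the orchestrator: which sources are already claimed."""
    claimed: Set[str] = field(default_factory=set)
    accepted: List[Subtask] = field(default_factory=list)

    def offer(self, task: Subtask) -> bool:
        fresh = task.sources - self.claimed
        if not fresh:
            return False  # fully redundant with work already assigned
        # Trim the subtask to the sources nobody else is reading.
        self.accepted.append(Subtask(task.question, fresh))
        self.claimed |= fresh
        return True


ledger = Assignments()
plan = [
    Subtask("pricing", frozenset({"pricing-page", "docs"})),
    Subtask("pricing again", frozenset({"docs"})),
    Subtask("competitors", frozenset({"docs", "reviews"})),
]
outcomes = [ledger.offer(task) for task in plan]
```

Here the second proposal is dropped and the third is narrowed to the one source that is still unclaimed, so no two workers read the same material.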

Pattern 4: Supervisor and hierarchy

The supervisor pattern looks similar to orchestrator-worker but differs in lifecycle. Instead of spawning short-lived workers for invented subtasks, a router sits in front of a fixed roster of long-lived specialist agents and decides which one should handle each turn. Think of it as routing over a standing team rather than decomposing a one-shot job.

When one supervisor's span of control gets too wide, you nest it: a top-level supervisor coordinates mid-level supervisors, each of which owns a team of specialists. That is the hierarchical variant, and it is how you scale past the point where a single coordinator can reason about every agent at once.

# supervisor.py - LangGraph's standing-roster router
from langgraph_supervisor import create_supervisor

# researcher, coder, writer, and llm are assumed to be defined elsewhere
app = create_supervisor(
    agents=[researcher, coder, writer],   # long-lived specialists
    model=llm,
    prompt="Route each request to the right specialist.",
).compile()

It wins for systems with stable, distinct specialties that recur across many requests: a research agent, a coding agent, a writing agent, each invoked as the conversation demands. The supervisor gives you a single auditable place where routing decisions happen, which matters for debugging and for enforcing budget or safety policy. Token cost sits between fan-out and full orchestration, and the dominant failure mode is misrouting: a supervisor that sends work to the wrong specialist derails the whole turn. LangGraph documents this as one of its core multi-agent architectures, alongside network and custom layouts, all built on the same StateGraph primitive of nodes, edges, and shared state.
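The nesting and the audit trail can be sketched without any framework. The plain-Python version below is a toy under clear assumptions: `keyword_pick` and `team_pick` are hypothetical keyword routers standing in for an LLM routing call, and the specialists are lambdas that just tag the request.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


class Supervisor:
    """Routes each request to one entry of a fixed roster and records the decision."""

    def __init__(self, roster: Dict[str, Handler], pick: Callable[[str, List[str]], str]):
        self.roster = roster
        self.pick = pick
        self.decisions: List[Tuple[str, str]] = []  # audit log: (request, chosen name)

    def __call__(self, request: str) -> str:
        choice = self.pick(request, list(self.roster))
        if choice not in self.roster:
            raise KeyError(f"router chose unknown specialist {choice!r}")
        self.decisions.append((request, choice))
        return self.roster[choice](request)


def keyword_pick(request: str, names: List[str]) -> str:
    """Hypothetical router: first roster name found in the request, else the first name."""
    lowered = request.lower()
    return next((name for name in names if name in lowered), names[0])


def team_pick(request: str, names: List[str]) -> str:
    """Hypothetical top-level router choosing between two teams."""
    lowered = request.lower()
    return "engineering" if ("code" in lowered or "test" in lowered) else "content"


engineering = Supervisor(
    {"code": lambda r: "coder: " + r, "test": lambda r: "tester: " + r}, keyword_pick
)
content = Supervisor(
    {"write": lambda r: "writer: " + r, "research": lambda r: "researcher: " + r}, keyword_pick
)
# A Supervisor is itself callable, so it can sit in another Supervisor's roster.
top = Supervisor({"engineering": engineering, "content": content}, team_pick)
```

Because every hop appends to `decisions`, a misrouted turn can be traced to the exact level that chose wrong.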

Pattern 5: Blackboard

The blackboard pattern is older than LLMs by forty years. It comes from systems like Hearsay-II, the 1970s speech understanding project, and the structure is simple: a shared knowledge store sits at the center, multiple specialist modules read from it and write partial results back to it, and a control component decides which specialist to activate next based on the current state.

Map that onto agents and it reads almost unchanged. The blackboard becomes a shared state object or workspace. The knowledge sources become specialist agents. The control component becomes a scheduler or another LLM. Agents do not call each other directly. They contribute to and consume from a common workspace, opportunistically, as the state makes their skill relevant.
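That mapping can be sketched in a few lines of Python. This is a deliberately small illustration, not a faithful Hearsay-II design: the board is a plain dict, each hypothetical `Specialist` declares which keys it needs and which key it writes, and the control component simply activates the first specialist whose inputs are present. Each specialist writes once here, which is a simplification of real iterative work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Board = Dict[str, str]


@dataclass
class Specialist:
    name: str
    needs: Tuple[str, ...]        # keys that must be on the board first
    writes: str                   # key this specialist contributes
    work: Callable[[Board], str]  # stand-in for an agent call

    def ready(self, board: Board) -> bool:
        return self.writes not in board and all(k in board for k in self.needs)


def run_blackboard(board: Board, specialists: List[Specialist], max_steps: int = 10) -> List[str]:
    """Control component: activate the first ready specialist until none is ready."""
    order: List[str] = []
    for _ in range(max_steps):
        chosen = next((s for s in specialists if s.ready(board)), None)
        if chosen is None:
            break
        board[chosen.writes] = chosen.work(board)
        order.append(chosen.name)
    return order


# Listed out of order on purpose: the board state decides who acts, not the list.
specialists = [
    Specialist("builder", ("plan", "critique"), "artifact",
               lambda b: f"built from {b['plan']} after {b['critique']}"),
    Specialist("critic", ("plan",), "critique", lambda b: f"critique of {b['plan']}"),
    Specialist("planner", ("brief",), "plan", lambda b: f"plan for {b['brief']}"),
]
board: Board = {"brief": "landing page"}
order = run_blackboard(board, specialists)
```

No specialist calls another; the activation order falls out of what is on the board at each step.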

It wins for collaborative tasks with a loose, evolving order, where you cannot pre-script who acts when but every actor benefits from seeing the full shared picture. A design workflow where a planner, a critic, and a builder iterate on a shared artifact is a natural fit. This pattern also aligns well with Cognition's first principle, "share context, and share full agent traces, not just individual messages," because the blackboard is the canonical shared trace by construction.

The catch is the same one Walden Yan, the author of Cognition's post, warns about. In practice, context window limits mean each agent only sees a slice of the blackboard at any call, which quietly reintroduces the fragmentation the pattern was supposed to prevent. If your blackboard grows faster than you can summarize it, the shared store becomes shared in name only. In modern stacks, a LangGraph shared StateGraph behaves like a blackboard, and AutoGen's GroupChat is effectively a conversational blackboard where a GroupChatManager plays the control component.

Pattern 6: Swarm

A swarm is decentralized. There is no central orchestrator deciding every step. Instead, each agent can hand off control directly to a peer when it decides another specialist should take over. Routing is emergent: it lives in the agents' own decisions rather than in a supervisor's logic.

# swarm.py - OpenAI Agents SDK: agents hand off to peers
from agents import Agent
 
billing = Agent(name="Billing", instructions="Handle billing questions.")
triage = Agent(
    name="Triage",
    instructions="Hand off to the right specialist.",
    handoffs=[billing],   # control transfers, it does not return to a hub
)

It wins for loosely coupled, exploratory work where pre-specifying a global plan is hard and you want agents to pull themselves in when a situation matches their skill. It is also the lightest pattern to extend, since adding an agent means giving peers a new handoff target rather than editing central routing logic. The OpenAI Agents SDK, the production successor to the experimental Swarm project, implements this through handoffs, and langgraph-swarm packages the same idea.

The failure modes are the steepest of any pattern here. Decentralized routing is hard to debug because no single component has the full picture, and it is exactly the topology Cognition argues against for coherent tasks. Yan's second principle is that "actions carry implicit decisions, and conflicting decisions carry bad results," and a swarm maximizes the chance of conflicting decisions because control and context are both distributed. Use a swarm when you are searching across approaches and resilience matters more than strict coherence. Avoid it when the agents must agree on a shared mutation.

What every pattern costs

The slide decks show the boxes and arrows. They rarely show the bill. Token economics is the variable that most often decides whether a topology is viable, and it scales badly as you add coordination.

Anthropic's own numbers are the clearest public anchor. They reported that agents typically use about 4 times the tokens of a chat interaction, and that multi-agent systems use about 15 times the tokens of a chat. More striking, in their analysis "token usage by itself explains 80% of the variance" in performance on their BrowseComp evaluation. Read that carefully, because it cuts both ways. It justifies multi-agent for high-value breadth tasks where spending tokens buys real quality. It also implies that much of what looks like architectural magic is actually just a bigger token budget, which means a single agent with more room to think can close part of the gap without any of the coordination risk.
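The multipliers are easy to turn into back-of-envelope arithmetic. The Python sketch below uses only the 4x and 15x figures quoted above; the 2,000-token chat and the price of 10 USD per million tokens are made-up inputs for illustration, not real pricing.

```python
# Rough multipliers relative to a chat interaction, as quoted above.
MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimated_tokens(chat_tokens: int, mode: str) -> int:
    """Scale a typical chat's token count by the multiplier for the given mode."""
    return chat_tokens * MULTIPLIER[mode]


def estimated_cost(chat_tokens: int, mode: str, usd_per_million: float) -> float:
    """Convert the token estimate to dollars at a flat, hypothetical price."""
    return estimated_tokens(chat_tokens, mode) * usd_per_million / 1_000_000


# Hypothetical inputs: a 2,000-token chat at 10 USD per million tokens.
single = estimated_tokens(2_000, "single_agent")   # 8,000 tokens
multi = estimated_tokens(2_000, "multi_agent")     # 30,000 tokens
multi_cost = estimated_cost(2_000, "multi_agent", 10.0)
```

On these numbers the multi-agent run costs 3.75 times the single-agent run, which is the gap the quality gain has to justify.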

That is the core of Cognition's argument too. Before you commit to orchestration, ask whether a single linear agent with a larger budget and better context management would get you most of the way there. Cognition's answer for coding was yes, which is why they pair a single-threaded agent with a dedicated context-compression model rather than a swarm of parallel workers. The hidden costs beyond tokens are latency, which grows with coordination depth, and debuggability, which falls off a cliff once control is distributed.

How I choose

I run every candidate system through the same short ladder before I pick a topology. The questions are ordered, and I stop at the first one that resolves the design.

1. Is the control flow predictable? If you can write the steps in advance, you want a workflow, not an agent. Use a pipeline or fan-out and stop here. This resolves most projects.
2. Are the subtasks independent and read-heavy? If branches never need to know what other branches decided, fan-out or orchestrator-worker fits, and the token cost buys you breadth.
3. Is the task stateful and coherence-critical? If actions are writes that must agree with each other, keep a single coherent thread. A linear agent or a tight supervisor beats parallel subagents, exactly as Cognition argues.
4. Do you have an evaluation harness? If you cannot measure whether the multi-agent version actually beats a strong single-agent baseline, do not build it yet. Anthropic's 90.2% number only means something because they had the eval to prove it.
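The ladder above can be written down as a short function. The Python sketch below is one reading of it, with assumptions flagged: the `Task` fields are hypothetical yes/no answers to the four questions, and the final fallback string is an interpretation of what to do when no rung settles the design.

```python
from dataclasses import dataclass


@dataclass
class Task:
    predictable_flow: bool        # can the steps be written in advance?
    independent_read_heavy: bool  # branches never need each other's decisions
    coherence_critical: bool      # actions are writes that must agree
    has_eval: bool                # can a single-agent baseline be measured?


def choose_topology(task: Task) -> str:
    """Walk the questions in order and stop at the first one that resolves the design."""
    if task.predictable_flow:
        return "workflow: fan-out" if task.independent_read_heavy else "workflow: pipeline"
    if task.independent_read_heavy:
        return "orchestrator-worker"
    if task.coherence_critical:
        return "single linear agent or tight supervisor"
    if not task.has_eval:
        return "build the eval harness first"
    # Assumed fallback: nothing above settled it, so let measurement decide.
    return "compare candidates against a single-agent baseline"
```

The ordering is the design choice that matters: a predictable flow short-circuits everything else, which matches the claim that the first question resolves most projects.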

Here is the same logic as a reference table.

| Pattern | Control locus | Token cost | Best fit | Worst fit |
| --- | --- | --- | --- | --- |
| Sequential pipeline | Your code | Lowest | Ordered, gated stages | Open-ended tasks |
| Parallel fan-out | Your code | Medium | Independent chunks, voting | Interdependent branches |
| Orchestrator-worker | One lead LLM | High | Dynamic decomposition, research | Simple or routine tasks |
| Supervisor / hierarchy | Router LLM | Medium-high | Stable recurring specialties | One-shot decomposition |
| Blackboard | Control component | Variable | Shared evolving artifact | Tight context budgets |
| Swarm | Distributed | Variable | Exploratory, loosely coupled | Coherence-critical writes |

And the framework that expresses each topology with the least ceremony:

| Topology | Cleanest expression |
| --- | --- |
| Pipeline | CrewAI `Process.sequential`, LangGraph linear edges |
| Fan-out | LangGraph parallel nodes, plain `Promise.all` |
| Orchestrator-worker | `langgraph-supervisor`, CrewAI `Process.hierarchical`, Claude Code subagents |
| Supervisor / hierarchy | LangGraph supervisor and hierarchical layouts |
| Blackboard | LangGraph shared `StateGraph`, AutoGen `GroupChat` |
| Swarm | OpenAI Agents SDK handoffs, `langgraph-swarm` |

If you are still weighing which framework to commit to underneath all this, I compared the main options in my agent framework comparison, and the practical reality of running agents solo is in building an AI team as a solo developer.

The pattern behind the patterns

The mistake I see most is treating multi-agent as a level of sophistication to graduate into, as if more agents means a more advanced system. It is the opposite. Each step up this ladder is a trade you make reluctantly, paying in tokens, latency, and debuggability for a capability you genuinely cannot get from a simpler shape.

So I start at the bottom and climb only when forced. Most systems that call themselves multi-agent are a pipeline with one agentic step, and they are better for it. The teams that succeed are not the ones with the most elaborate topology. They are the ones who matched the shape of their coordination to the shape of their task, proved the upgrade with an eval, and resisted the urge to add an agent that the problem never asked for. Pick the topology on purpose, and the word "multi-agent" stops being a buzzword and starts being a decision.