The Coordination Problem

Synq Labs, April 2026

Preface

Today, AI-native companies are deploying hundreds of agents in production. They handle customer support tickets, write and ship code, manage knowledge bases, run internal workflows. The agents work. The problem is that they work alone.

At Anthropic, at Cursor, at Decagon, the same story plays out. A company's agents are technically capable of doing real things, but the moment two of them need to collaborate, someone has to build the plumbing by hand. Glue code, bespoke message passing, brittle APIs between systems that were never meant to talk. Every company building at scale is solving the same coordination problem from scratch, in isolation, with no shared substrate.

This is the gap we're building into. Orchestration with a pretty UI is still a bandage over a much deeper problem. The principles underneath (persistent identity, async messaging, sandboxed workspaces, and eventually cross-company routing) are what turn a company's agents from a collection of tools into something closer to an organization.

The coordination layer for agents is the most important piece of infrastructure that doesn't exist yet, and we're committed to realizing it.

Introduction

There's a pattern in infrastructure history worth understanding. When the internet was young, companies built their own networking logic. When email proliferated, every org had its own mail server. When mobile exploded, every app rolled its own notification stack. In each case, a shared substrate eventually emerged: TCP/IP, SMTP, APNs. The companies that came to own those layers became foundational.

We're watching the same dynamic begin to play out with agents.

Every serious AI-native company today is running multiple agents. Some are running dozens. A handful are running hundreds. And every single one of them has independently arrived at the same realization: agents don't talk to each other well. They don't have addresses. They can't find each other. They don't have a way to hand off work asynchronously. They don't have a shared protocol for establishing trust.

This coordination problem isn't new. It's the same problem distributed systems engineers have always faced: how do you get autonomous processes to work together reliably, at scale, without a god object at the center? What's new is that the actors are now independently intelligent, the stakes lie in commercial workflows, and nobody has built the right infrastructure for this yet.

We have a thesis: the right model for agent coordination is not top-down orchestration. It's not a conductor agent giving instructions to worker agents, because that just recreates the single point of failure and the bottleneck of centralized control.

The right model is much closer to how ant colonies work: agents emit signals proportional to the urgency of a task, and other agents respond to those signals. Coordination emerges from the protocol rather than from any central authority. No queen issuing orders, no leader. Pheromone trails, not org charts.

This is the architecture we're building toward, and we're starting with the concrete, immediate problem that AI-native companies have today.

The Current State

To understand what's broken, it helps to look at how multi-agent systems are actually built right now.

The dominant approach is framework-based orchestration. LangChain, AutoGen, CrewAI, and their descendants all solve the same problem in roughly the same way: they let you define agents as objects inside an application, wire up how they communicate, and run them in a single process or a closely coupled cluster. This works, but only up to a point.

The failures compound at scale. When the agent count goes from ten to a hundred, or when the agents are owned by different teams building on different stacks, or when you want an agent in one context to talk to an agent in a completely different one, frameworks fall apart. They're not designed for that. They're designed to be embedded in an application, not to serve as infrastructure.

The second dominant approach is custom glue. Companies build message queues, shared databases, internal APIs, hand-rolled routing logic. This also works, but it means every AI-native company is independently building coordination infrastructure that has nothing to do with its core product. It's expensive and it isn't defensible. It also doesn't scale past the team that built it.

What's missing is a layer that lives outside any single application, a substrate that agents can join, find each other on, communicate over, and trust. Something more analogous to how email works than to how a function call works.

The Problems

Agents Don't Have Addresses

The most fundamental issue is identity. An agent running in production today doesn't have a persistent address. It has a URL endpoint if it's an API, or a process if it's local, but nothing equivalent to an email address or a phone number. We need something stable, portable, and globally resolvable that other agents can route to independent of where the agent is running.

This makes discovery impossible in any principled way. If agent A needs to hand off a task to agent B, someone has to know where B lives, hard-code that dependency, and maintain it. As agent counts grow and infrastructure changes, this becomes untenable quickly. The addressing problem is foundational, and nothing else works well until it's solved.
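To make the idea of a stable address concrete, here is a minimal sketch of an address that outlives the endpoint behind it. Everything named here is hypothetical and chosen for illustration: the `AddressRegistry` class, the `support@acme.agents` address format, and the endpoint URLs are assumptions, not part of any existing system or proposal.

```python
from dataclasses import dataclass, field


@dataclass
class AddressRegistry:
    """Maps a stable agent address to whatever endpoint currently serves it."""

    _endpoints: dict = field(default_factory=dict)

    def bind(self, address: str, endpoint: str) -> None:
        # Re-binding an existing address models an agent being moved or redeployed.
        self._endpoints[address] = endpoint

    def resolve(self, address: str) -> str:
        # Callers only ever know the address; the endpoint is looked up at send time.
        try:
            return self._endpoints[address]
        except KeyError:
            raise LookupError(f"no endpoint bound to {address}") from None


registry = AddressRegistry()
registry.bind("support@acme.agents", "https://10.0.0.12:8443/support")
before_move = registry.resolve("support@acme.agents")

# The agent is redeployed elsewhere; senders keep using the same address.
registry.bind("support@acme.agents", "https://eu-west.internal/support-v2")
after_move = registry.resolve("support@acme.agents")
```

The point of the sketch is the indirection: agent A depends on B's address, never on where B happens to be running, so infrastructure changes do not ripple into every caller.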

ERC-8004 and similar proposals from the Ethereum ecosystem have tried to sketch what agent identity could look like, with an address tied not to a server but to an entity that's portable across contexts and usable for verification and routing.

Communication Has No Substrate

Related to the addressing problem is the messaging problem. Right now, when two agents need to communicate, they do it through whatever channel the developer set up (usually a shared database, a message queue, or a direct API call). These are fine for tightly-scoped workflows inside a single application. They break down when agents are decoupled, asynchronous, or running across organizational boundaries.

What agents need is closer to what email gave humans: an async, persistent, protocol-level messaging layer that doesn't require the sender and receiver to be connected at the same time, that preserves message history, and that can be monitored, debugged, and rate-limited independently of the application logic. None of the agent frameworks provide this. They assume synchronous, in-process communication, which is a fundamentally different thing.

The async requirement is particularly important. A lot of the work agents do takes anywhere from seconds to hours. Forcing that into synchronous request-response patterns creates bottlenecks and failure modes that don't need to exist.
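A rough sketch of what an email-like inbox could look like is below. It is an in-memory illustration only, under the assumption that durability, rate limiting, and transport are handled elsewhere; the `Message` and `Mailbox` names and the agent names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class Mailbox:
    """Append-only store: messages wait until the recipient asks for them."""

    _log: List[Message] = field(default_factory=list)
    # recipient -> number of that recipient's messages already delivered
    _cursor: Dict[str, int] = field(default_factory=dict)

    def send(self, msg: Message) -> None:
        # The sender returns immediately; the recipient may be offline.
        self._log.append(msg)

    def fetch(self, recipient: str) -> List[Message]:
        # Deliver only the messages this recipient has not seen yet.
        mine = [m for m in self._log if m.recipient == recipient]
        start = self._cursor.get(recipient, 0)
        self._cursor[recipient] = len(mine)
        return mine[start:]

    def history(self, recipient: str) -> List[Message]:
        # Delivered messages are kept, so the exchange stays inspectable.
        return [m for m in self._log if m.recipient == recipient]


box = Mailbox()
box.send(Message("billing", "support", "refund approved"))
box.send(Message("kb", "support", "article updated"))

first = box.fetch("support")    # both messages, sent while support was idle
second = box.fetch("support")   # nothing new since the last fetch
box.send(Message("billing", "support", "refund settled"))
third = box.fetch("support")    # only the newly arrived message
```

Neither side blocks on the other: `send` never waits for a reader, and `fetch` picks up whatever accumulated, which is the property synchronous request-response cannot offer for long-running work.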

Coordination Breaks at Organizational Boundaries

The version of this problem that most companies are facing today is intra-company. But the version that matters in three years is inter-company.

Right now, when your customer support agent needs to do something that your internal knowledge agent knows how to do, at least you can build a solution where you control both agents. But when your sales agent needs to coordinate with a vendor's fulfillment agent, or when your HR agent needs to communicate with a recruiting platform's sourcing agent, you have no shared protocol. Every integration is a custom bilateral agreement, built by engineers, maintained in perpetuity.

This is the same problem that made SMTP matter. Before email had a protocol, you could send messages within your company's system, but getting a message from one company's system to another's required custom work on both sides. SMTP gave everyone a neutral protocol that made cross-system communication a solved problem. Agent communication at organizational scale needs the same thing.

A2A (Google's Agent-to-Agent protocol) and MCP (Anthropic's Model Context Protocol) are early moves in this direction. They establish neutral, open protocols that any agent can implement. But protocols alone aren't infrastructure. Someone needs to run a network that makes those protocols work reliably in production with uptime guarantees, routing intelligence, and monitoring.

Trust Is Unsolved

When two agents communicate, there's an implicit trust question underneath the transaction: is this agent who it says it is? Is it authorized to make this request? Should I execute the thing it's asking me to do?

In frameworks like LangChain, trust is handled by the application developer: the agents are running in your process, so you trust them by definition. But as soon as you're operating in a multi-tenant, cross-company, or open network environment, trust needs to be established cryptographically, not by assumption.

The internet solved this with certificates. Payments solved it with keys and signatures. Agent networks need something equivalent: a way for an agent to prove its identity and authority, and for the receiving agent to verify that proof without calling home to a central authority on every request.
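As a simplified illustration of local verification, the sketch below signs a request and checks it without contacting any third party. It deliberately uses HMAC with a pre-shared key from the Python standard library as a stand-in: this is an assumption made to keep the example dependency-free, and it is weaker than what the paragraph above describes, since anyone holding the key can also forge requests. A cross-company setting would more plausibly use public-key signatures. The function names, address, and action strings are hypothetical.

```python
import hashlib
import hmac
import json


def sign_request(key: bytes, sender: str, action: str) -> dict:
    # Serialize deterministically so signer and verifier hash identical bytes.
    payload = json.dumps({"sender": sender, "action": action}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_request(key: bytes, request: dict) -> bool:
    # Recompute the tag locally; no call to a central authority is needed.
    expected = hmac.new(key, request["payload"].encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, request["tag"])


shared_key = b"demo-key-exchanged-out-of-band"  # assumption: provisioned beforehand
request = sign_request(shared_key, "sales@acme.agents", "issue_refund")
accepted = verify_request(shared_key, request)

# An intermediary rewrites the action but cannot produce a matching tag.
tampered = dict(request, payload=request["payload"].replace("issue_refund", "delete_account"))
rejected = not verify_request(shared_key, tampered)
```

The sketch covers only integrity and origin for a single message. Authorization (whether this sender may request this action) is a separate check that would sit on top.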

This isn't a product feature. It's infrastructure. And it has to be there before the cross-company coordination use case becomes viable.

Discovery Doesn't Exist

The final piece of the problem is discovery. Even if agents have addresses and a messaging substrate and a trust layer, they still need a way to find each other. What agents are available to help with a given task? What are their capabilities? What's the right one to route to?

Right now, the answer is: whatever was hardcoded by an engineer six months ago. There's no equivalent of DNS for agents, no registry of what's available, no routing intelligence, no capability-based matching. As agent counts grow and the network expands, this becomes a genuine bottleneck.

The ant colony is still relevant here. Ants don't have a central directory of which ants are doing what. They use pheromone signals: a task in progress emits a signal proportional to how urgently it needs help, and nearby ants respond. The coordination emerges from the signal itself instead of from a dispatcher. Agent discovery needs to move from hardcoded routing to signal-based, emergent matching.
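One way to picture signal-based matching is the toy sketch below, in which each idle agent independently picks the strongest signal it is able to act on. It is an illustration under simplifying assumptions, not a description of a real routing algorithm: the `TaskSignal` fields, the `pick_task` function, and the capability labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TaskSignal:
    task_id: str
    capability: str  # the kind of agent that can help
    urgency: float   # signal strength; higher means more urgent


def pick_task(capabilities: Set[str], signals: List[TaskSignal]) -> Optional[TaskSignal]:
    """An idle agent responds to the strongest signal it is able to act on."""
    candidates = [s for s in signals if s.capability in capabilities]
    # No dispatcher is involved: each agent makes this choice for itself.
    return max(candidates, key=lambda s: s.urgency, default=None)


signals = [
    TaskSignal("t1", "billing", 0.9),
    TaskSignal("t2", "search", 0.4),
    TaskSignal("t3", "search", 0.7),
]

search_choice = pick_task({"search"}, signals)            # strongest search signal
generalist_choice = pick_task({"search", "billing"}, signals)  # strongest overall
no_match = pick_task({"legal"}, signals)                  # nothing this agent can do
```

Nothing in the sketch names a specific agent as the target of a task. Matching falls out of what each task emits and what each agent can do, which is the shift from hardcoded routing that the paragraph above argues for.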

Why This Problem Is Hard

It's tempting to look at multi-agent coordination and think it's a prompt engineering problem or a UX problem. If agents just communicate in natural language and we add a nice interface on top, it's fine.

This is wrong, and it's wrong in an important way.

Natural language communication between agents is brittle at scale. It's slow. It's expensive in tokens. It doesn't give you the observability or the reliability guarantees you need for production workflows. And it doesn't solve any of the structural problems (addressing, async messaging, trust, discovery) that make coordination hard.

The useful comparison is "ad-hoc versus protocol," not "natural language versus structured communication."

The internet didn't scale because people got better at writing emails. It scaled because TCP/IP gave every node on the network a reliable, standardized way to exchange data with every other node, regardless of what was running on either end. The web didn't scale because HTML got more expressive. It scaled because HTTP gave every browser and every server a protocol they could both implement.

Agent coordination will scale the same way: through machine-native protocols with production-grade reliability, not just better prompting.

The Opportunity

The coordination layer for agents is underbuilt, and it's underbuilt because the need for it has only become visible in the last eighteen months. Before agents were capable of doing real work, coordination was a research problem. Now it has shifted towards being an operational one.

The wedge is clear: AI-native companies in production today. These are the companies (Cursor, Decagon, Cognition, and their peers) running enough agents that coordination is a genuine pain. They have this issue right now and they're solving it with custom glue. They're the right first customer because they'll tell you exactly what's broken, they'll pay for something that works, and the word-of-mouth in that community is tight.

Per-team pricing gets us to meaningful ACV quickly. The architecture (persistent agent addressing, async inboxes, sandboxed workspaces) is validated. The design partner conversations that matter are already in scope.

The real opportunity is what happens when cross-company agent communication is a solved problem.

Once agents have stable addresses and a neutral protocol for communicating across organizational lines, the network becomes the product. A personal AI assistant that wants to book a flight doesn't browse Expedia by itself. Instead, it talks directly to the airline's booking agent. A procurement agent doesn't fill out a vendor portal. It negotiates directly with the vendor's fulfillment agent. The entire category of human-mediated B2B and B2C interaction starts to run agent-to-agent, and the coordination layer that makes that possible becomes essential infrastructure for every transaction that touches it.

This is why we think about the long-term positioning as something closer to the internet than to an enterprise software product. The structural dynamic is the same. Once the network is large enough and the protocol is entrenched enough, the value of being on the network is self-reinforcing. Agents join because other agents are there. Companies connect because their counterparties are already connected. The switching cost is the address.

Existing Players

No one is building exactly this. The closest adjacent work is in agent frameworks: LangChain, AutoGen, CrewAI. These are well-adopted and genuinely useful, but they solve coordination inside an application, not across applications or organizations. Their architecture makes it hard to extend to multi-tenant or cross-company use cases. They're also focusing on frameworks, which means their business model doesn't require owning a coordination layer.

There are simulation companies using multi-agent architectures for research: Synthetic Users, Aaru, Sakana AI. The underlying technology is similar, but these companies are solving a radically different problem. Their agents aren't doing production commercial work; they're generating synthetic data or running research experiments. The requirements are different, and they're not trying to become infrastructure.

The hyperscaler question is the one that comes up most often. Why won't Microsoft or Google or Anthropic build this?

Three reasons, all of which we believe hold up.

First, model companies monetize tokens, not coordination. Building a coordination layer would mean competing with the application companies that are their customers and diverting engineering toward infrastructure that doesn't drive token usage. That's not how their business models work.

Second, neutrality matters. The companies that need coordination infrastructure most are running heterogeneous stacks (multiple LLM providers, multiple tooling vendors, multiple internal systems). They don't want their coordination layer owned by one of their model vendors. This is why Twilio existed alongside AWS. Telcos didn't want to route through Amazon.

Third, the protocol companies (Anthropic with MCP, Google with A2A) have explicitly said they want neutral infrastructure companies to run these protocols in production. They can fund standards without trying to own the CDN layer. The same pattern emerges.

Why Now

Three things converged in roughly the same window that make this the right moment.

Agents crossed a capability threshold. Twelve months ago, agents could do demos. Now they're doing real work in production at companies with real revenue. The coordination problem only becomes urgent when agents are capable enough to be worth coordinating. We're there.

Neutral protocols arrived. A2A and MCP give the industry a shared language for agent communication that doesn't belong to any single vendor. Building coordination infrastructure on top of neutral protocols is qualitatively different from building on a proprietary stack. It means we can be Switzerland. We can freely work with every model provider, every framework, and every company, which is the only viable position for infrastructure that needs to be trusted across organizational lines.

The agent count hit the inflection point. There's a threshold somewhere around fifty to a hundred agents per company where custom coordination solutions stop working well. Below that, glue code is fine. Above it, you need real infrastructure. A meaningful number of AI-native companies have crossed that threshold in the last six months. The early ones are starting to feel it.

Eighteen months ago, this product had no real users because no one had enough agents to need it. Eighteen months from now, the hyperscalers will ship bundled answers that work for companies already in their ecosystems. Right now is the window where the category gets defined and the early moat gets built.

Risks

Two risks are real, and it's worth being direct about both.

Market timing. The case for Synq depends on agent proliferation continuing. If the current wave of AI-native companies stalls (regulatory pressure, model capability plateaus, enterprise adoption slower than expected), the coordination problem stays a research problem rather than an operational one. We think this risk is lower than it looks because the companies that already have the pain are real companies in production today, not hypothetical future customers. But it's not zero.

Protocol consolidation. The emergence of A2A and MCP is good news for the category, but if one of those protocols gets abandoned or if the standards fragment badly, building on top of them becomes harder. The mitigation is to own the infrastructure layer rather than the protocol layer. We're agnostic to which protocols win in the same way that cloud providers are agnostic to which languages customers use. But protocol bets are real bets.

The risk we're less worried about is hyperscaler competition. Not because they won't eventually ship something (they will) but because their answer will be for their ecosystem. The companies that need neutral coordination infrastructure most are specifically the ones that don't want to be locked into one vendor's ecosystem. By the time Microsoft ships a bundled agent coordination feature in 2027, the AI-native segment will have picked its infrastructure provider. We want that to be us.

Closing

The premise of Synq is simple: coordination as infrastructure.

Every major transition in computing has required a new coordination layer. The internet required TCP/IP. The web required HTTP. Enterprise software required databases. Mobile required push notification infrastructure. What's different now is that the actors being coordinated are AI agents, the timescale is faster, and the window between "this is a research problem" and "this is a solved problem with dominant players" is shorter than it's ever been.

The coordination layer for the agent economy is going to be built. We intend to build it.

— Shubham & Dorsa