Every Problem CNCF Solved for Microservices Is Back, Harder, for AI Agents
As web applications grew complex, we extracted layers. Persistence moved to databases, hot reads to caches, asynchronous work to queues; each time, the infrastructure had to be rebuilt around the new pattern. Monoliths became microservices, and what followed was a decade of cloud native infrastructure closing gaps the application pattern had opened.
Most AI today operates at the edge: chatbots, copilots, and research assistants, where a human absorbs the ambiguity and decides what to trust. That pattern taps knowledge and light reasoning, but leaves harder problems unaddressed. The deeper shift begins when reasoning moves inside the backend, when AI stops serving a person and starts serving other software. These are services that evaluate, decide, and act within the runtime, and every solution we have built rests on an assumption they violate: that workloads are deterministic.
Failure Without Symptoms
Our monitoring infrastructure assumes that when something goes wrong, it looks wrong: a 500 status code, a timeout, a breaker trip.
When an agent reasons badly, none of that happens. It produces a confident, plausible, wrong output. The next agent in the chain consumes that output as valid input and reasons on top of it. The error compounds silently while dashboards show green. The system did not fail in any way the infrastructure recognizes; it decided badly.
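The gap is easy to see in code. Here is a minimal sketch of why transport-level monitoring cannot see a semantic failure; the names (`AgentOutput`, `is_healthy`) are illustrative, not a real API:

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    status_code: int   # what the infrastructure observes
    confidence: float  # what the agent self-reports
    claim: str         # what the agent concluded
    correct: bool      # ground truth, unknowable at runtime

def is_healthy(out: AgentOutput) -> bool:
    # Everything conventional monitoring can check: the call succeeded.
    return out.status_code == 200

# A confident, plausible, wrong answer: green on every dashboard.
out = AgentOutput(status_code=200, confidence=0.97,
                  claim="Supplier B has stock; reroute all orders.",
                  correct=False)

assert is_healthy(out)   # infrastructure: fine
assert not out.correct   # reality: the system decided badly
```

The health check and the truth value are independent, which is precisely the failure mode dashboards were never built to surface.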
A thousand agents making individually rational decisions can produce collectively irrational outcomes, because no individual agent sees the portfolio-level correlation risk that emerges from their collective behavior.
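A toy simulation makes the correlation risk concrete. Assuming each agent independently picks the cheapest provider from the same shared price list (the provider names and prices are hypothetical), every locally optimal choice lands on the same target:

```python
# Hypothetical shared view of the world that every agent reads.
prices = {"provider_a": 0.9, "provider_b": 1.0, "provider_c": 1.1}

def rational_choice(prices: dict) -> str:
    # Individually rational: always take the cheapest option.
    return min(prices, key=prices.get)

choices = [rational_choice(prices) for _ in range(1000)]
share = choices.count("provider_a") / len(choices)
# Every agent made the locally optimal call, so 100% of the load
# concentrates on one provider -- a risk no single agent can see.
```

No agent misbehaved; the concentration emerges only at the portfolio level, which is exactly where nothing is watching.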
The End of Bounded Execution
Every timing contract in cloud native infrastructure assumes bounded, predictable execution. Circuit breakers trip on timeouts, retry logic assumes idempotency, and rate limiters assume roughly uniform cost per request.
Agents violate all three. A reasoning step that would have been a 200-millisecond API call now takes seconds to minutes, because it iterates through possibilities rather than executing a fixed path. Circuit breakers that interpret slow responses as failures will kill reasoning mid-thought. And retrying a reasoning step does not produce the same answer, because idempotency does not apply to judgment: the model weighs context differently on each pass, so "try again" means "think again," and a retry yields a different conclusion rather than replaying a deterministic function.
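The idempotency point can be sketched in a few lines. Assuming a stochastic stand-in for a reasoning step (`reason_about` is illustrative, with a seed standing in for sampling temperature), a retry is a new deliberation, not a replay:

```python
import random

def reason_about(question: str, seed: int) -> str:
    # The seed stands in for the sampling randomness of a real model.
    rng = random.Random(seed)
    options = ["approve", "escalate", "reject"]
    return rng.choice(options)

first = reason_about("refund request #4411", seed=1)
retry = reason_about("refund request #4411", seed=2)  # same input, new "thought"
# A retry policy built for idempotent handlers would treat these as
# interchangeable results of the same call. They are not guaranteed to match.
```

Retry logic that blindly replays such a step is not recovering from a failure; it is soliciting a second opinion and discarding the first without recording that it did so.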
Accountability Beyond Access Control
When a microservice acts, you verify its identity at the point of action and log what it did. Identity and observability are separate concerns, handled by separate infrastructure. For reasoning systems, they collapse into a single question: can you prove who authorized this decision, what the agent considered, and why it chose this path?
Current observability captures what happened, but reasoning systems also need to capture what was thought: which factors were weighed, which alternatives were discarded, what confidence the agent assigned to its conclusion. A trace follows a call graph; a deliberation weighs competing considerations, and the relationships between them matter more than their sequence.
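One way to picture the difference is a deliberation record sitting alongside a conventional trace span. This is a sketch only; there is no standard schema for this yet, and every field name below is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Consideration:
    factor: str
    weight: float
    discarded: bool = False  # alternatives rejected during deliberation

@dataclass
class DeliberationRecord:
    decision: str
    authorized_by: str   # who authorized this decision (the identity question)
    considered: list     # what the agent weighed (the observability question)
    confidence: float    # self-assessed, not ground truth

record = DeliberationRecord(
    decision="reroute_orders",
    authorized_by="svc:fulfillment-planner",
    considered=[
        Consideration("supplier_b_stock", 0.6),
        Consideration("shipping_cost", 0.3),
        Consideration("wait_for_restock", 0.1, discarded=True),
    ],
    confidence=0.82,
)
discarded = [c.factor for c in record.considered if c.discarded]
```

Note what a call-graph trace would drop: the discarded alternative and the relative weights, which are exactly the fields an auditor of the decision would ask for.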
Governance at Decision Speed
Cloud native governance is typically layered on after deployment: review what happened, correct it before the next cycle. That contract breaks when agents make thousands of decisions per minute. The volume overwhelms human review and consequences propagate before any review could occur. Governance has to become a runtime constraint, built into the infrastructure so that it shapes what agents can decide before they decide it.
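In its simplest form, runtime governance is a policy gate evaluated before an action executes rather than a report reviewed after. A minimal sketch, with hypothetical policy names and limits:

```python
# Hypothetical policy set, enforced in the request path, not in review.
POLICIES = {
    "max_spend_usd": 500.0,
    "blocked_actions": {"delete_customer_data"},
}

def authorize(action: str, spend_usd: float) -> bool:
    # Evaluated before the decision executes: the constraint shapes
    # what the agent can decide, instead of auditing what it decided.
    if action in POLICIES["blocked_actions"]:
        return False
    return spend_usd <= POLICIES["max_spend_usd"]

assert authorize("reorder_stock", spend_usd=120.0)
assert not authorize("reorder_stock", spend_usd=5000.0)
assert not authorize("delete_customer_data", spend_usd=0.0)
```

The hard engineering problem is keeping such a gate fast enough to sit in the path of thousands of decisions per minute, which is what makes this an infrastructure concern rather than a process one.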
This shift also reveals the need for primitives with no equivalent in today's stack. Caches have TTLs that tell you when data expires; decisions have no equivalent concept, even though they need one more urgently. A routing decision made at 2pm based on current inventory may be obsolete by 3pm, but nothing marks it as stale.
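What a decision TTL might look like, mirroring the cache TTL it borrows from; the `Decision` type and its fields are illustrative, since no such primitive exists in today's stack:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    made_at: float      # epoch-style seconds
    ttl_seconds: float  # how long the inputs behind the decision can be trusted

    def is_stale(self, now: float) -> bool:
        return now - self.made_at > self.ttl_seconds

# Routing decision made at 2pm based on current inventory, valid for an hour.
d = Decision(action="route_via_warehouse_7",
             made_at=14 * 3600, ttl_seconds=3600)

assert not d.is_stale(now=14.5 * 3600)  # 2:30pm: still fresh
assert d.is_stale(now=15.1 * 3600)      # 3:06pm: inputs have drifted
```

A cache serves stale data until eviction; a system acting on a stale decision compounds it, which is why the expiry needs to be attached to the decision itself rather than inferred afterward.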
What Follows
When reasoning moves from the edge into the runtime, the infrastructure must follow. Frameworks are useful for prototyping a single agent, but a fleet of agents making decisions across a production backend, coordinating and delegating at machine speed, is not a framework problem. It is an infrastructure problem, and the cloud native community has built this kind of infrastructure before. The difference is that the workloads now reason, and every assumption we built on was designed for a world where software did what it was told.