Cognition shares hard lessons from building cloud agents
Cognition's team details the deep infrastructure challenges — from VM-level isolation to hypervisor-level snapshotting — they encountered over two years of building Devin as a production cloud agent.
Teams evaluating whether to build their own cloud agent infrastructure should weigh that Cognition spent over a year on hypervisor engineering alone — before tackling orchestration, governance, and integrations — suggesting the build-vs-buy calculus is far more demanding than high-profile posts from companies like Stripe imply.
The Cognition Team's blog post frames cloud agents as the emerging future of enterprise software engineering, but argues that the infrastructure required to run them reliably is routinely underestimated. The natural starting point, containerizing a CLI agent and connecting it to repos and tooling, quickly surfaces three serious problems. First, containers share a kernel, which creates a real attack surface for kernel-level escapes when agents run arbitrary code. Second, containers cannot snapshot and restore full working state across the async gaps (minutes, hours, or days) that define real engineering work, such as waiting on CI or code review. Third, scaling to hundreds of concurrent sessions demands orchestration, governance, and integration work, each of which is a multi-quarter engineering project on its own.
Cognition's solutions to the first two problems required over a year of hypervisor engineering. Each Devin session runs in a dedicated microVM with its own kernel, fully isolated storage, networking, and compute — the industry-consensus approach for untrusted code execution. To handle async persistence, the team snapshots full machine state at the hypervisor level (memory, process trees, and filesystem), shuts down compute while the agent is idle, and restores the session exactly when a trigger like a CI result or review comment arrives. The post describes making this work reliably across thousands of concurrent sessions as the hardest infrastructure problem the team has faced.
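The suspend-and-restore lifecycle described above can be pictured as a small state machine. The following is a minimal Python sketch under stated assumptions, not Cognition's implementation: the `Session` class, its `suspend` and `resume` methods, and the plain dictionary standing in for hypervisor-level machine state (memory, process trees, filesystem) are all hypothetical names chosen for illustration.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


@dataclass
class Session:
    """Hypothetical model of one agent session (illustrative only)."""
    session_id: str
    state: State = State.RUNNING
    # Stand-in for live machine state; a real system would hold this in a VM.
    working_state: dict = field(default_factory=dict)
    # Stand-in for a hypervisor-level snapshot of that machine state.
    snapshot: Optional[dict] = None

    def suspend(self) -> None:
        """Capture the full state, then release compute while the agent is idle."""
        if self.state is not State.RUNNING:
            raise RuntimeError("only a running session can be suspended")
        self.snapshot = copy.deepcopy(self.working_state)
        self.working_state = {}  # compute is shut down: nothing stays live
        self.state = State.SUSPENDED

    def resume(self, trigger: str) -> None:
        """Restore the captured state when an external trigger arrives."""
        if self.state is not State.SUSPENDED or self.snapshot is None:
            raise RuntimeError("only a suspended session can be resumed")
        self.working_state = copy.deepcopy(self.snapshot)
        self.working_state["last_trigger"] = trigger
        self.snapshot = None
        self.state = State.RUNNING


session = Session("s1")
session.working_state["branch"] = "fix/login"
session.suspend()                      # agent is waiting on CI; compute released
session.resume("ci_result")            # CI finished; session picks up where it left off
```

The sketch only captures the ordering of the steps (snapshot, release, restore on trigger). The hard part the post describes, doing this for real memory and process state across thousands of concurrent sessions, is deliberately out of scope here.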
On the orchestration and governance side, the post outlines the challenges of provisioning the right environment per session, routing correctly, maintaining warm VM pools as codebases change daily, chaining engineer identity and permissions across every system an agent touches, and maintaining tamper-evident audit logging at enterprise scale. As a concrete cautionary example, the post references a "leading cloud data platform company" that attempted to build this infrastructure internally and moved on after the project scope overwhelmed their infrastructure team. The source text is truncated before the integrations section is completed.
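The post does not say how its tamper-evident audit logging works. One common general technique is a hash chain, in which each record's hash covers the previous record's hash, so editing any earlier record invalidates everything after it. The sketch below illustrates that technique only; the function names `append` and `verify`, the record layout, and the `GENESIS` constant are assumptions made for this example.

```python
import hashlib
import json

# Assumed starting value for the chain; any fixed, agreed-upon string would do.
GENESIS = "0" * 64


def _digest(prev_hash: str, entry: dict) -> str:
    """Hash the previous record's hash together with this entry's content."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append(log: list, actor: str, action: str) -> None:
    """Add a record whose hash depends on every record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"actor": actor, "action": action}
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edited record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


audit_log: list = []
append(audit_log, "engineer@example.com", "agent opened pull request")
append(audit_log, "engineer@example.com", "agent read CI result")
chain_ok = verify(audit_log)
```

A chain like this only detects tampering if the latest hash is also stored somewhere the writer cannot rewrite; otherwise an attacker could rebuild the whole chain. That anchoring step is omitted from the sketch.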
Key facts
- 01 The post is authored by The Cognition Team and draws on over two years of building Devin as a cloud agent.
- 02 Containerized agents share a kernel, making kernel-level escape a real security threat when agents run arbitrary code.
- 03 The industry consensus for untrusted code execution is VM-level isolation, giving each workload its own dedicated kernel.