Apr 15, 2026 · 1 min read · Applications & Use Cases

Notion's Custom Agents journey: evals, org design, and agentic work

Sarah Sachs and Simon Last of Notion join the Latent Space podcast to discuss how Notion built Custom Agents over multiple years and rebuilds, covering evals philosophy, org structure, and a long-term vision for "software factories" made of agents.

YouTube: Latent Space·Latent Space

Composite score: 6.2 out of 10

Weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Teams building agentic products can apply Notion's hard-won lessons — on eval design, roadmap timing relative to model capabilities, and org structure — to avoid the same multi-year rebuild cycles Notion experienced.

Summary: our read of the original

Sarah Sachs and Simon Last of Notion joined the Latent Space podcast (hosted by Alessio of Kernel Labs and Swyx, editor of Latent Space) to go deep on how Notion built its Custom Agents feature, a process that took years and multiple rebuilds. The launch was described as Notion's most successful ever in terms of free trials and conversions. Early agent experiments dating to 2022 failed due to the absence of a tool-calling standard, short context windows, unreliable models, and too much complexity being exposed directly to the model. The guests articulate an "Agent Lab" thesis for application companies: rather than simply wrapping a frontier model, the real work is understanding how people collaborate and constructing the right product system around model capabilities. Notion's roadmap philosophy is to avoid swimming upstream against model limitations while still building early enough that the product is ready when models catch up.

The conversation also covers Notion's AI engineering culture: Sarah describes objective-setting over idea ownership, low-ego teams comfortable deleting their own work, and a structure designed to swarm around fast-changing opportunities. Notion organizes its AI work across core AI capabilities and infrastructure, product packaging teams, and a company-wide mandate that every product surface must increasingly serve both humans and agents. On evals, Notion runs three kinds: regression tests, launch-quality evals, and "frontier/headroom" evals that intentionally pass only ~30% of the time, which lets the team track where model capabilities are heading. A dedicated "Model Behavior Engineer" role treats eval writing and failure analysis as a first-class discipline. The guests also discuss MCP support, noting that MCP is well suited to narrow, lightweight, tightly permissioned agent use cases, and a long-term vision for "software factories" composed of agents that spec, code, test, debug, review, and maintain codebases together.
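To make the three eval tiers concrete, here is a minimal Python sketch of how a team might check each suite's pass rate against a tier-specific target band. It is an illustration under assumptions, not Notion's implementation: the names `Tier`, `SuiteResult`, `TARGET_BANDS`, and `assess` are hypothetical, and the only number taken from the summary is the roughly 30% pass rate for frontier/headroom evals. The other thresholds are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    REGRESSION = "regression"
    LAUNCH = "launch"
    FRONTIER = "frontier"


# Hypothetical (min, max) pass-rate bands per tier. Only the ~30% frontier
# figure comes from the summary; the other values are assumed for illustration.
TARGET_BANDS = {
    Tier.REGRESSION: (1.0, 1.0),   # assumed: any failure is a regression
    Tier.LAUNCH: (0.9, 1.0),       # assumed launch-quality bar
    Tier.FRONTIER: (0.2, 0.4),     # assumed band around ~30%
}


@dataclass
class SuiteResult:
    name: str
    tier: Tier
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # Treat an empty suite as a 0% pass rate rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0


def assess(result: SuiteResult) -> str:
    """Classify a suite result relative to its tier's target band."""
    low, high = TARGET_BANDS[result.tier]
    rate = result.pass_rate
    if low <= rate <= high:
        return "ok"
    if result.tier is Tier.FRONTIER:
        # Above the band, the suite no longer measures headroom;
        # below it, the tasks may be too hard to give a useful signal.
        return "saturated" if rate > high else "below_band"
    # Regression and launch suites only fail by falling below their bar.
    return "blocking"


if __name__ == "__main__":
    results = [
        SuiteResult("search-regressions", Tier.REGRESSION, 99, 100),
        SuiteResult("custom-agent-launch", Tier.LAUNCH, 95, 100),
        SuiteResult("long-horizon-tasks", Tier.FRONTIER, 30, 100),
    ]
    for r in results:
        print(f"{r.name}: {r.pass_rate:.0%} -> {assess(r)}")
```

The design point this sketch tries to capture is that a high pass rate is not always good news: a frontier suite that starts passing most of the time has stopped measuring headroom and probably needs harder cases.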

Key facts

01. Notion's Custom Agents launch was the company's most successful ever in terms of free trials and conversions.
02. Early agent experiments in 2022 failed due to no tool-calling standard, short context windows, unreliable models, and too much complexity exposed to the model.
03. Notion's 'Agent Lab' thesis: application companies must build the right product system around frontier capabilities, not just wrap a model.
04. Notion runs 'frontier/headroom' evals that intentionally pass only ~30% of the time to track where model capabilities are heading.
05. Notion has a dedicated 'Model Behavior Engineer' role that treats eval writing and failure analysis as a first-class discipline.
06. Notion supports MCP, viewing it as well-suited for narrow, lightweight, tightly permissioned agent use cases.
07. Notion's long-term vision includes 'software factories': groups of agents that spec, code, test, debug, review, and maintain codebases together.

Topics

#agent-framework #evals #mcp #enterprise-agents #product-strategy

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 21, 2026 · 18:16 UTC.
