Notion's Custom Agents journey: evals, org design, and agentic work
Sarah Sachs and Simon Last of Notion join the Latent Space podcast to discuss how Notion built Custom Agents over multiple years and rebuilds, covering evals philosophy, org structure, and a long-term vision for "software factories" made of agents.
Teams building agentic products can apply Notion's hard-won lessons — on eval design, roadmap timing relative to model capabilities, and org structure — to avoid the same multi-year rebuild cycles Notion experienced.
Sarah Sachs and Simon Last of Notion joined the Latent Space podcast (hosted by Alessio of Kernel Labs and Swyx, editor of Latent Space) to go deep on how Notion built its Custom Agents feature, a process that took years and multiple rebuilds. The launch was described as Notion's most successful ever in terms of free trials and conversions. Early agent experiments dating to 2022 failed due to the absence of a tool-calling standard, short context windows, unreliable models, and too much complexity being exposed directly to the model. The guests articulate an "Agent Lab" thesis for application companies: rather than simply wrapping a frontier model, the real work is understanding how people collaborate and constructing the right product system around model capabilities. Notion's roadmap philosophy is to avoid swimming upstream against model limitations while still building early enough that the product is ready when models catch up.
The conversation covers Notion's approach to AI engineering culture — Sarah describes objective-setting over idea ownership, low-ego teams comfortable deleting their own work, and a structure designed to swarm around fast-changing opportunities. Notion organizes its AI work across core AI capabilities and infrastructure, product packaging teams, and a company-wide mandate that every product surface must increasingly serve both humans and agents. On evals, Notion runs regression tests, launch-quality evals, and "frontier/headroom" evals that intentionally pass only ~30% of the time, allowing the team to track where model capabilities are heading. A dedicated "Model Behavior Engineer" role treats eval writing and failure analysis as a first-class discipline. The guests also discuss MCP support — noting that MCP is well-suited for narrow, lightweight, tightly permissioned agent use cases — and a long-term vision for "software factories" composed of agents that spec, code, test, debug, review, and maintain codebases together.
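The three eval tiers described above can be illustrated with a small sketch. This is not Notion's implementation: the tier names, the `EvalTier` type, the `assess` function, and every threshold are hypothetical, chosen only so that a headroom suite passing around 30% of the time counts as healthy while a regression suite at the same rate counts as failing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalTier:
    """A band of acceptable pass rates for one kind of eval suite (hypothetical)."""
    name: str
    min_pass_rate: float  # below this, the suite is considered failing
    max_pass_rate: float  # above this, the suite no longer measures headroom


# All thresholds are illustrative assumptions. Only the idea that headroom
# evals pass roughly 30% of the time comes from the summary above.
TIERS = {
    "regression": EvalTier("regression", 0.99, 1.0),
    "launch_quality": EvalTier("launch_quality", 0.90, 1.0),
    "headroom": EvalTier("headroom", 0.0, 0.50),
}


def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


def assess(tier_name: str, results: list[bool]) -> str:
    """Classify a suite run as 'failing', 'ok', or 'saturated' for its tier."""
    tier = TIERS[tier_name]
    rate = pass_rate(results)
    if rate < tier.min_pass_rate:
        return "failing"
    if rate > tier.max_pass_rate:
        return "saturated"  # time to write harder cases
    return "ok"


if __name__ == "__main__":
    # A headroom suite where 3 of 10 cases pass is behaving as intended.
    print(assess("headroom", [True] * 3 + [False] * 7))
    # The same pass rate on a regression suite signals a problem.
    print(assess("regression", [True] * 3 + [False] * 7))
```

The point of the sketch is that a low pass rate means different things in different tiers: it is a regression in one suite and a deliberate measure of remaining headroom in another, and a headroom suite that starts passing most of the time has stopped being useful for tracking where models are heading.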
Key facts
- 01 Notion's Custom Agents launch was the company's most successful ever in terms of free trials and conversions.
- 02 Early agent experiments in 2022 failed due to no tool-calling standard, short context windows, unreliable models, and too much complexity exposed to the model.
- 03 Notion's 'Agent Lab' thesis: application companies must build the right product system around frontier capabilities, not just wrap a model.
- 04