SAGE framework treats prompt optimization as black-box search
Researchers introduce SPO (Stochastic Prompt Optimization) and its most advanced variant SAGE, a multi-agent pipeline that treats automatic prompt optimization as black-box search rather than gradient-based tuning.
The work demonstrates that agentic, multi-agent prompt optimization can compound noisy real-world A/B test cycles into statistically robust improvements, offering a practical alternative to gradient-based prompt tuning for open-ended task-oriented dialogue systems.
The paper frames context engineering — improving AI systems without parameter updates — as a black-box search problem, motivated by prior work showing that textual gradients do not behave like real gradients. To address this, the authors introduce SPO (Stochastic Prompt Optimization), a framework that explores prompt space stochastically, and evaluate three strategies of increasing sophistication: error-informed random search, a genetic algorithm with evolutionary operators, and SAGE (SPO via Agent-Guided Exploration), a multi-agent pipeline that incorporates diagnostic code execution.
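To make the black-box framing concrete, here is a minimal sketch of error-informed random search, the simplest of the three strategies. It is not the paper's implementation: the toy task, the `evaluate` scorer, the `propose` edit rule, and all names are assumptions made for illustration. In a real system an LLM would presumably read the failed cases and write the edit; here a lookup table stands in for that step.

```python
import random

# Hypothetical toy task: each eval case passes only if one specific
# instruction is present in the prompt (a list of instruction strings).
CASES = {"greet": "Greet the user.", "cite": "Cite sources.", "brief": "Be brief."}
DISTRACTORS = ["Use emojis.", "Write in all caps."]
POOL = list(CASES.values()) + DISTRACTORS

def evaluate(prompt):
    """Black-box scorer: returns (accuracy, names of failed cases). No gradients."""
    failed = [name for name, needed in CASES.items() if needed not in prompt]
    return 1 - len(failed) / len(CASES), failed

def propose(prompt, failed, rng, informed=True):
    """Propose one edit. Informed: target a failed case. Uninformed: any instruction."""
    if informed and failed:
        return prompt + [CASES[rng.choice(failed)]]
    return prompt + [rng.choice(POOL)]

def random_search(steps, seed=0, informed=True):
    """Keep a candidate only if the black-box score strictly improves."""
    rng = random.Random(seed)
    best = []
    best_score, failed = evaluate(best)
    for _ in range(steps):
        cand = propose(best, failed, rng, informed)
        score, cand_failed = evaluate(cand)
        if score > best_score:
            best, best_score, failed = cand, score, cand_failed
    return best, best_score
```

The search only ever sees scores and error lists, never a gradient, which is the sense in which the optimization is black-box. Passing `informed=False` gives a plain random-search baseline for comparison.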
Benchmarking across three tasks reveals that no single strategy universally dominates; the relative effectiveness of each approach depends on the interaction between the prompt landscape's structure and the type of errors encountered. Beyond benchmarks, the paper deploys SAGE in a real-world continuous optimization setting on a mental-health chatbot, where it compounds eight cycles of individually noisy A/B tests into a statistically robust gain in next-day retention. The authors argue that coupling qualitative diagnosis with quantitative validation is the key ingredient that makes agentic optimization effective for open-ended, task-oriented dialogue.
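The summary does not say which statistical procedure the paper uses to compound the A/B cycles. As one possible illustration of how several individually inconclusive tests can add up to a robust result, the sketch below combines per-cycle two-proportion z-scores with Stouffer's method. The retention counts are hypothetical numbers invented for the example, not data from the paper, and the function names are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical per-cycle counts, NOT from the paper:
# (control retained, control users, variant retained, variant users)
CYCLES = [
    (400, 1000, 420, 1000),
    (395, 1000, 420, 1000),
    (405, 1000, 420, 1000),
    (400, 1000, 430, 1000),
    (398, 1000, 423, 1000),
    (402, 1000, 422, 1000),
    (400, 1000, 430, 1000),
    (401, 1000, 426, 1000),
]

def z_score(c_ret, c_n, v_ret, v_n):
    """Two-proportion z statistic; positive when the variant retains more users."""
    pooled = (c_ret + v_ret) / (c_n + v_n)
    se = sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
    return (v_ret / v_n - c_ret / c_n) / se

def one_sided_p(z):
    """P(Z >= z) under the standard normal null."""
    return 1 - NormalDist().cdf(z)

zs = [z_score(*cycle) for cycle in CYCLES]
per_cycle_p = [one_sided_p(z) for z in zs]

# Stouffer's method with equal weights: sum of z-scores over sqrt(k).
combined_z = sum(zs) / sqrt(len(zs))
combined_p = one_sided_p(combined_z)

print([round(p, 3) for p in per_cycle_p])
print(round(combined_z, 2), combined_p)
```

With these made-up counts, no single cycle clears a conventional 0.05 threshold on its own, while the combined statistic does so comfortably. The combined test addresses the null hypothesis that none of the cycles helped, which is a weaker statement than a direct estimate of cumulative retention gain.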
Key facts
- 01 The paper introduces SPO (Stochastic Prompt Optimization), a framework for stochastic search over prompt space.
- 02 Motivation comes from prior work showing textual gradients do not function as real gradients, justifying a black-box search approach.
- 03 Three strategies are compared: error-informed random search, a genetic algorithm with evolutionary operators, and SAGE.
- 04 SAGE (SPO via Agent-Guided Exploration) is a multi-agent pipeline with diagnostic code execution.
- 05 Across three benchmarks, no single strategy dominates; effectiveness depends on landscape structure and error type.
- 06 SAGE was deployed on a mental-health chatbot under a continuous optimization paradigm.
- 07 Eight cycles of individually noisy A/B tests were compounded into a statistically robust gain in next-day retention.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.