Jun 3, 2026 · 1 min read · Research Papers

SePO self-optimizes the prompt agent's own system prompt

Wangcheng Tao, Han Wu, and Weng-Fai Wong propose SePO, a self-referential prompt optimization framework that evolves both task agents' system prompts and the prompt agent's own system prompt, outperforming Manual-CoT by 4.49 accuracy points across five benchmarks.

arXiv · Wangcheng Tao, Han Wu, Weng-Fai Wong

Composite score: 6.2 out of 10

Score weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

SePO demonstrates that the prompt agent itself — not just the tasks it serves — can be a target of automated optimization, removing a hand-engineered bottleneck that prior prompt optimization methods left unaddressed.

Summary: our read of the original

Wangcheng Tao, Han Wu, and Weng-Fai Wong introduce Self-Evolving Prompt Optimization (SePO), a framework that closes a gap left open by existing prompt optimization approaches. Prior methods deploy a dedicated prompt agent to iteratively refine the system prompts of task agents, but treat the prompt agent's own system prompt as a static, hand-engineered artifact. SePO reframes this as a joint optimization problem: a single prompt agent simultaneously evolves task agents' system prompts and its own, using a self-referential loop.
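The self-referential idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PromptAgent` class, its method names, and the `toy_llm` placeholder are all hypothetical, and the placeholder simply appends the feedback to the prompt instead of calling a real model. The only point shown is that the same rewriting routine is applied both to a task agent's system prompt and to the prompt agent's own system prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM call: (system_prompt, user_message) -> text.
LLM = Callable[[str, str], str]


@dataclass
class PromptAgent:
    system_prompt: str  # the agent's own instructions, itself an optimization target
    llm: LLM

    def rewrite(self, target_prompt: str, feedback: str) -> str:
        """Propose a revised version of any system prompt, guided by feedback."""
        request = f"Current prompt:\n{target_prompt}\n\nFeedback:\n{feedback}"
        return self.llm(self.system_prompt, request)

    def improve_task_prompt(self, task_prompt: str, feedback: str) -> str:
        """Ordinary use: revise a task agent's system prompt."""
        return self.rewrite(task_prompt, feedback)

    def improve_self(self, feedback: str) -> "PromptAgent":
        """Self-referential use: apply the same rewriting skill to its own prompt."""
        return PromptAgent(self.rewrite(self.system_prompt, feedback), self.llm)


def toy_llm(system_prompt: str, user_message: str) -> str:
    """Deterministic placeholder, NOT a real model: appends the feedback to the prompt."""
    body, feedback = user_message.rsplit("\n\nFeedback:\n", 1)
    current = body[len("Current prompt:\n"):]
    return f"{current} {feedback}".strip()


agent = PromptAgent("You improve prompts.", toy_llm)
task_prompt = agent.improve_task_prompt("Solve the puzzle.", "Show your steps.")
next_agent = agent.improve_self("Be specific.")
```

In a real system the feedback would presumably come from evaluating task agents on benchmark examples; that part is left out here because the summary does not describe it.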

The optimization proceeds through an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones, avoiding premature convergence. Training is split into two stages: a pre-training phase that evolves the prompt agent across a multi-task pool, followed by a fine-tuning phase that specializes it for a target task. Crucially, the paper reports that the prompt optimization skill acquired during pre-training generalizes to tasks outside the pre-training mixture, rather than simply memorizing per-task prompts.
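As a rough illustration of an archive-based search followed by a two-stage schedule, here is a toy Python sketch. Everything in it is an assumption made for illustration: the uniform parent sampling, the `toy_mutate` function that appends a canned hint, and the keyword-counting scores all stand in for components the summary does not specify. The candidates are plain strings; in SePO they would include the prompt agent's own system prompt.

```python
import random
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (prompt text, score)


def evolve(seed_prompt: str,
           mutate: Callable[[str, random.Random], str],
           score: Callable[[str], float],
           steps: int,
           rng: random.Random) -> List[Candidate]:
    """Archive-based search: every evaluated candidate stays available as a parent."""
    archive: List[Candidate] = [(seed_prompt, score(seed_prompt))]
    for _ in range(steps):
        parent, _parent_score = rng.choice(archive)  # assumption: uniform parent sampling
        child = mutate(parent, rng)
        archive.append((child, score(child)))  # kept even if it scores below its parent
    return archive


def best(archive: List[Candidate]) -> Candidate:
    return max(archive, key=lambda c: c[1])


# Toy stand-ins (hypothetical): a real system would use an LLM to mutate prompts
# and benchmark accuracy to score them.
HINTS = ["Think step by step.", "Check your answer.",
         "Write tests first.", "State the rules."]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    return f"{prompt} {rng.choice(HINTS)}"


def keyword_score(keywords: List[str]) -> Callable[[str], float]:
    def score(prompt: str) -> float:
        return sum(k in prompt for k in keywords) / len(keywords)
    return score


pool_score = keyword_score(["step by step", "Check your answer"])   # multi-task proxy
target_score = keyword_score(["Write tests", "Check your answer"])  # target-task proxy

rng = random.Random(0)
# Stage 1: search against the multi-task pool score.
pretrain_archive = evolve("You are a helpful solver.", toy_mutate, pool_score, 20, rng)
pretrained_prompt, _ = best(pretrain_archive)
# Stage 2: continue from the stage-1 result, now scored on the target task.
finetune_archive = evolve(pretrained_prompt, toy_mutate, target_score, 20, rng)
final_prompt, final_score = best(finetune_archive)
```

Keeping lower-scoring children in the archive is what lets them serve as stepping stones: a later mutation of a weak candidate can still end up ahead of the current best, which a greedy keep-only-the-best loop would rule out.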

Evaluated across five benchmarks — math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku) — SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO. The average accuracy improvement over Manual-CoT is 4.49 points. Because the approach optimizes human-readable, model-agnostic instructions without modifying the underlying model weights, it remains broadly applicable across different model backends.

Key facts

01 SePO treats the prompt agent's own system prompt as an optimization target, not just the task agents' system prompts.
02 A self-referential design lets a single prompt agent improve both its own system prompt and those of task agents.
03 Optimization uses an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones.
04 Training has two stages: pre-training on a multi-task pool, then fine-tuning on a target task.
05 SePO outperforms Manual-CoT, TextGrad, and MetaSPO across all five benchmarks tested.
06 Benchmarks span AIME'25, ARC-AGI-1, GPQA, MBPP, and Sudoku.
07 Average accuracy improvement over Manual-CoT is 4.49 points, and the skill generalizes beyond the pre-training task mixture.

Topics

#prompt-engineering #agent-framework #benchmarks #reasoning #code-generation

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.
