VESTA framework auto-generates 1,072 safety scenarios for LLM agents
Researchers introduce VESTA, a fully automated framework that generates and evaluates safety scenarios for LLM agents, finding an average attack success rate of 47.1% across 12 tested agents.
Score breakdown
The study demonstrates that current LLM agents face substantial behavioral safety risks during task execution — with an average ASR of 47.1% and some models exceeding 70% — underscoring the inadequacy of static, output-only evaluation methods for agents operating with memory, tools, and environmental access.
- 01VESTA is a fully automated scenario generation and safety evaluation framework for LLM agents.
- 02The framework covers five risk dimensions and produces 1,072 measurable evaluation scenarios.
- 0312 LLM agents were evaluated under two authority contexts using VESTA's automated pipeline.
Lu Jia, Haibo Tong, and Feifei Zhao present VESTA, a fully automated framework designed to address gaps in how LLM agent safety is currently evaluated. As LLM agents grow beyond simple text interaction to encompass memory, tool use, external environment access, and autonomous task execution, the safety risks they face have become more varied and harder to capture with traditional evaluation methods. VESTA addresses this by automatically generating diverse, real-world-grounded evaluation scenarios rather than relying on manually written prompts or static test sets.
The framework organizes safety risks along five dimensions and translates them into 1,072 concrete, measurable scenarios.
The framework organizes safety risks along five dimensions and translates them into 1,072 concrete, measurable scenarios. Using an automated evaluation pipeline, the authors tested 12 LLM agents under two authority contexts. Results revealed that current agents carry significant behavioral safety risks during task execution, with an average ASR of 47.1% and several models surpassing 70%. The authors argue these findings underscore the need for executable, process-level evaluation — rather than final-output-only judgment — to properly understand and improve LLM agent safety.
Key facts
- 01VESTA is a fully automated scenario generation and safety evaluation framework for LLM agents.
- 02The framework covers five risk dimensions and produces 1,072 measurable evaluation scenarios.
- 0312 LLM agents were evaluated under two authority contexts using VESTA's automated pipeline.
- 04The average attack success rate (ASR) across tested agents was 47.1%.
- 05Several models exceeded an ASR of 70%.
- 06Existing evaluations rely on manually written scenarios, static prompts, or final-output judgments, which VESTA aims to replace.
- 07The paper argues for executable, process-level evaluation to better understand LLM agent safety.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC. How this works →