Runtime Skill Audit catches malicious agent skills that fool static checks
Researchers Tu Lan and Chaowei Xiao introduce Runtime Skill Audit (RSA), a dynamic analysis method that detects malicious LLM agent skills at runtime, achieving 90.0% accuracy and outperforming the best static baseline by 13.0 percentage points.
Score breakdown
RSA demonstrates that dynamic, context-targeted auditing catches malicious agent skills that static detectors miss and remain robust under self-evolving adversarial attacks where static methods collapse.
- 01RSA is a dynamic analysis method that audits LLM agent skills under targeted runtime conditions rather than static code review.
- 02Agent skills can appear benign in documentation but become harmful only when invoked with specific user requests, local assets, persistent state, or multi-step tool interactions.
- 03RSA profiles risk-relevant interfaces, prepares execution context, and assigns security labels from behavioral trace evidence.
Tu Lan and Chaowei Xiao identify a security blind spot in LLM agent ecosystems: agent skills — reusable packages of instructions, resources, tools, and workflows — can conceal malicious behavior that only manifests when invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. Because this context-dependent harm is invisible to static code or documentation review, purely static vetting is brittle against sophisticated threats.
To address this, the authors present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by examining what a skill-mediated agent actually does under targeted runtime conditions.
To address this, the authors present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by examining what a skill-mediated agent actually does under targeted runtime conditions. RSA profiles risk-relevant interfaces for each skill, prepares the specific execution context needed to exercise those interfaces, and derives security labels from the resulting behavioral traces — rather than applying the same generic tasks to every skill.
The authors instantiate RSA on OpenClaw and evaluate it against 100 skills and representative static baselines. RSA achieves 90.0% accuracy with an 88.0% true positive rate and an 8.0% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. The robustness advantage is especially pronounced under self-evolving attacks: static detectors collapse after just one or two adversarial rounds, while RSA continues to detect 19–20 out of 20 malicious skills across rounds.
Key facts
- 01RSA is a dynamic analysis method that audits LLM agent skills under targeted runtime conditions rather than static code review.
- 02Agent skills can appear benign in documentation but become harmful only when invoked with specific user requests, local assets, persistent state, or multi-step tool interactions.
- 03RSA profiles risk-relevant interfaces, prepares execution context, and assigns security labels from behavioral trace evidence.
- 04RSA is instantiated on OpenClaw and evaluated on 100 skills against static baselines.
- 05RSA achieves 90.0% accuracy, an 88.0% true positive rate, and an 8.0% false positive rate.
- 06RSA improves accuracy by 13.0 percentage points over the best static baseline.
- 07Under self-evolving attacks, static detectors collapse after one or two rounds; RSA detects 19–20 out of 20 malicious skills across rounds.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC. How this works →