SkillVetBench uses LLM-as-Judge to catch agent skill threats static scanners miss
Researchers introduce SKILLVETBENCH, a live Hugging Face leaderboard that uses an LLM-as-Judge and a new five-dimensional risk metric (SARS) to vet open-source LLM agent skills, achieving zero false negatives across 78 confirmed-malicious skills where the best static baseline still misses 15%.
Score breakdown
Existing code-layer scanners miss between 89% and 100% of instruction-layer threats like Prompt Injection and Memory Poisoning in LLM agent skills, and SKILLVETBENCH's LLM-as-Judge approach closes that gap with zero false negatives across 78 confirmed-malicious skills in benchmark testing.
- 01SKILLVETBENCH is a live public leaderboard on Hugging Face that uses an LLM-as-Judge to vet open-source LLM agent skills.
- 02The paper introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems.
- 03Full CVSS v4.0 vector decomposition is integrated, along with a ClawHub dual-view comparing LLM-generated reviews with official marketplace verdicts.
Ismail Hossain, Sai Puppala, and Md Jahangir Alam present SKILLVETBENCH to address a critical gap in open-source LLM agent security: existing scanners operate at the code layer and are structurally blind to instruction-layer and multi-agent risks, including natural-language directives that hijack agents, exfiltrate data through encoded side channels, or chain harm across pipelines. The paper argues that what is needed is a semantic, multi-dimensional vetting system rather than another signature matcher, and delivers this as a live public leaderboard on Hugging Face using an LLM-as-Judge to vet agent skills.
The system introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula designed for instruction-following systems.
The system introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula designed for instruction-following systems. It also integrates full CVSS v4.0 vector decomposition and a ClawHub dual-view that places the LLM-generated review alongside the official marketplace verdict. Drawing on a companion benchmark paper, the LLM-as-Judge stage achieves zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls. By contrast, the best static baseline (SKILLSIEVE) still misses 15% of threats, and for instruction-layer categories such as Prompt Injection and Memory Poisoning, conventional tools miss between 89% and 100% — with CODEBERT detecting none of nine memory-poisoning skills. Detection rates vary from 35% to 95% across four LLM evaluators, which the authors cite as motivation for ensemble scoring in production deployments.
Key facts
- 01SKILLVETBENCH is a live public leaderboard on Hugging Face that uses an LLM-as-Judge to vet open-source LLM agent skills.
- 02The paper introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems.
- 03Full CVSS v4.0 vector decomposition is integrated, along with a ClawHub dual-view comparing LLM-generated reviews with official marketplace verdicts.
- 04The LLM-as-Judge stage achieves zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls.
- 05The best static baseline, SKILLSIEVE, still misses 15% of threats; conventional tools miss between 89% and 100% of instruction-layer threats like Prompt Injection and Memory Poisoning.
- 06CODEBERT detects none of nine memory-poisoning skills.
- 07Detection rates vary from 35% to 95% across four LLM evaluators, motivating ensemble scoring in production deployments.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC. How this works →