Should you install this skill?
Type a skill name. We'll show you whether it measurably helps the agent — and whether it triggers exploits in a runtime sandbox.
227 audited · 41 unsafe · 93 confirmed exploits · 4,256 judge items
30-second tour — how SkillAudit evaluates a skill before you install it.
Riskiest skills you should know about.
93 confirmed exploits across 41 skills
Twelve hand-picked findings spanning five exploit classes. Click any card to inspect that skill.
How we score a skill
5-step pipeline · four independent axes
Every skill goes through the same pipeline. The same execution pass produces four independent axes — utility, efficiency, cost, and safety — never combined into a single score.
SKILL.md, scripts, and dependencies. Each finding gets an existence_confidence ∈ [0, 1].
→ static_scan.json
scenarios/U*.yaml
fs.diff · net.log
existence × exploitability against the runtime trace.
→ judges/*.json
utility = mean PRG over valid pairs
efficiency = mean (t_wo − t_wi) / t_wo
cost = mean (q̃_wo − q̃_wi) / q̃_wo, where q̃ = input − cache
safety = max(10, 100 − Σ base × existence × exploit)
→ skill_report.json
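The four axis formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the class and field names (Pair, Finding, score_skill) are hypothetical, "two" and "twi" are read as t with wo / wi (without-skill / with-skill) subscripts, t stands for whatever effort measure the pipeline records, PRG is taken as a given per-pair number, and the pairs passed in are assumed to be already filtered to valid ones.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pair:
    """One paired run of a scenario: without the skill (wo) and with it (wi).

    Field names are hypothetical; they only mirror the symbols in the formulas.
    """
    prg: float       # PRG for this pair, taken as given
    t_wo: float      # effort measure without the skill
    t_wi: float      # effort measure with the skill
    input_wo: float  # input tokens without the skill
    cache_wo: float  # cached tokens without the skill
    input_wi: float  # input tokens with the skill
    cache_wi: float  # cached tokens with the skill


@dataclass
class Finding:
    """One safety finding (hypothetical shape)."""
    base: float       # base severity weight
    existence: float  # existence confidence in [0, 1]
    exploit: float    # exploitability judged against the runtime trace


def score_skill(pairs, findings):
    """Return the four axes as separate values; they are never combined."""
    utility = mean(p.prg for p in pairs)
    efficiency = mean((p.t_wo - p.t_wi) / p.t_wo for p in pairs)

    def q(inp, cache):
        # q-tilde = input - cache
        return inp - cache

    cost = mean(
        (q(p.input_wo, p.cache_wo) - q(p.input_wi, p.cache_wi))
        / q(p.input_wo, p.cache_wo)
        for p in pairs
    )
    penalty = sum(f.base * f.existence * f.exploit for f in findings)
    safety = max(10, 100 - penalty)  # floored at 10
    return {
        "utility": utility,
        "efficiency": efficiency,
        "cost": cost,
        "safety": safety,
    }
```

Because of the max(10, ...) floor, the safety axis never drops below 10 however many findings accumulate, and a skill with no findings scores 100.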
Browser extension
Released v0.3.0 · load unpacked from browser_extension/
The Chromium MV3 extension recognizes any GitHub repository whose root contains a
SKILL.md — and the four major skill marketplaces
(clawhub.ai, skills.sh, skillsmp.com,
ai-skills.io) — and renders the same verdict directly on the page,
at the moment someone is deciding whether to install.
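The recognition rule described above can be sketched as a small URL classifier. This is an illustrative Python sketch, not the extension's actual code (an MV3 extension would be written in JavaScript): the function name classify_page, its return labels, and the root_files parameter (a listing of the repository root, obtained by whatever means) are all assumptions.

```python
from urllib.parse import urlparse

# The four marketplaces named in the text.
MARKETPLACE_HOSTS = {"clawhub.ai", "skills.sh", "skillsmp.com", "ai-skills.io"}


def classify_page(url, root_files=()):
    """Return a label for pages where a verdict would be shown, else None.

    root_files is a hypothetical input: the file names at the repository
    root, supplied by the caller.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host in MARKETPLACE_HOSTS:
        return "marketplace"

    if host == "github.com":
        # A repository URL has at least /owner/repo in its path.
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and "SKILL.md" in root_files:
            return "github_skill_repo"

    return None
```

For example, classify_page("https://github.com/owner/repo", ["README.md", "SKILL.md"]) yields "github_skill_repo", while the same URL without a root SKILL.md yields None.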
// captured during precomputed run · 2026-05-04 · run d4f8c2
Cite + download
BibTeX · benchmark.json (4.5 MB) for reproducibility
BibTeX
@misc{skillaudit2026,
  title         = {SkillAudit: From Task-First Evaluation to Skill-Centered Assessment of Agent Skill Packages},
  author        = {SkillAudit Contributors},
  year          = {2026},
  eprint        = {TBD},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  note          = {Project page: \url{https://skillaudit.github.io/}}
}
Reproduce
The skill_report.json for each of the 227 evaluated skills is bundled, including
judge items, finding rationale, paired wi / wo numbers, and severity weighting.