Skill Benchmark

Score a SKILL.md file across quality and compatibility dimensions, then return a ranked fix list for improvement before publishing.

What Does It Check?

The skill performs purely static analysis; no agents are installed or called. It reads the target SKILL.md, applies the scoring rubric from references/rubric.md, checks the skill against agent compatibility signals from references/agent-profiles.md, and produces the report using references/report-template.md.

In scope:

Any SKILL.md in the local project or at a specified path
All 5 scoring dimensions (trigger quality, instruction clarity, agent-agnostic design, self-containment, output definition)
Compatibility with 54 named AI agents across 7 profile types
Auto-fix of the top-ranked issue on request

Out of scope:

Runtime testing of the skill against a live agent (this is static analysis only)
Scoring SKILL.md files for frameworks other than the Rifteo skills format

How It Works

Step 0: Locate the Target Skill

Search the current directory, then .claude/skills/, .agents/skills/, .cline/skills/, and global skill paths. If multiple files are found, list them and ask the user to confirm.

Step 1: Intake

Extract the skill name, description, body line count, presence of bundled resources (scripts/, references/), and any explicit agent target mentions.

Step 2: Score Each Dimension

Apply the rubric from references/rubric.md. For each of the 5 dimensions, record the score, a one-sentence rationale, and the single most impactful fix available.

Step 3: Run Compatibility Matrix

Check the skill against each of the 7 agent profiles for hard-fail and soft-fail signals. Assign the worst result across all applicable profiles per named agent (FAIL overrides WARN overrides PASS).

Step 4: Generate the Report

Fill in the full report template: total score, per-dimension breakdown, compatibility matrix (54 agents grouped into Compatible / Partial / Broken), and ranked fix list.

Step 5: Offer Next Steps

After presenting the report, offer three options: auto-fix the top-ranked issue and re-score; deep-dive into any dimension with full evidence; or export the report as skill-benchmark-report.md.
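The worst-result rule from Step 3 and the grouping from Step 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the profile names, agent names, and agent-to-profile mapping are hypothetical placeholders, since the real ones are defined in references/agent-profiles.md.

```python
from enum import IntEnum


class Result(IntEnum):
    """Ordered so that max() picks the worst outcome (FAIL > WARN > PASS)."""
    PASS = 0
    WARN = 1
    FAIL = 2


# Report group for each result, using the group names from Step 4.
GROUPS = {
    Result.PASS: "Compatible",
    Result.WARN: "Partial",
    Result.FAIL: "Broken",
}


def worst_result(profile_results, agent_profiles):
    """Return the worst result across the profiles that apply to one agent."""
    return max(profile_results[profile] for profile in agent_profiles)


def group_agents(profile_results, agents):
    """Bucket every agent into Compatible / Partial / Broken."""
    grouped = {name: [] for name in GROUPS.values()}
    for agent, profiles in agents.items():
        result = worst_result(profile_results, profiles)
        grouped[GROUPS[result]].append(agent)
    return grouped


# Hypothetical data: these profile and agent names are placeholders only.
profile_results = {
    "profile-a": Result.PASS,
    "profile-b": Result.WARN,
    "profile-c": Result.FAIL,
}
agents = {
    "agent-1": ["profile-a"],
    "agent-2": ["profile-a", "profile-b"],
    "agent-3": ["profile-b", "profile-c"],
}

grouped = group_agents(profile_results, agents)
print(grouped)
```

Modelling the results as an ordered enum keeps the override rule in one place: an agent covered by both a WARN and a FAIL profile lands in Broken without any special-case branching.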

Output

Score	Meaning
70–100	Publishable: the skill is ready for the community repo
50–69	Needs work: some agents will degrade or skip steps
< 50	Not publishable: will silently fail on most agents
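
The bands in the table translate directly into a threshold check. A minimal Python sketch, in which the function name score_band is a hypothetical choice for illustration:

```python
def score_band(total):
    """Map a total score (0 to 100) to its band from the table."""
    if not 0 <= total <= 100:
        raise ValueError(f"total score must be within 0..100, got {total}")
    if total >= 70:
        return "Publishable"
    if total >= 50:
        return "Needs work"
    return "Not publishable"


for total in (74, 69, 49):
    print(total, score_band(total))
```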

Example output structure:

Skill: my-skill
Total score: 74/100

Dimension scores:
 Trigger quality:       20/25
 Instruction clarity:   22/25
 Agent-agnostic design: 14/20
 Self-containment:      10/15
 Output definition:      8/15

Compatibility:
 Compatible (38 agents): Claude Code, Cursor, ...
 Partial (12 agents):  Windsurf, ...
 Likely broken (4 agents): ...

Top fixes (ranked):
 1. [Output definition] Add an explicit output format section: ...
 2. [Self-containment] Reference is missing: scripts/cvss-scorer.py ...
 3. [Agent-agnostic] Remove IDE-specific path assumption ...
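
The total in the example is the plain sum of the five dimension scores. The sketch below reproduces that arithmetic in Python. The maximum points per dimension are read off the example output above, while the names MAX_POINTS and total_score are hypothetical and not part of the skill.

```python
# Maximum points per dimension, as shown in the example output (sums to 100).
MAX_POINTS = {
    "Trigger quality": 25,
    "Instruction clarity": 25,
    "Agent-agnostic design": 20,
    "Self-containment": 15,
    "Output definition": 15,
}


def total_score(scores):
    """Sum per-dimension scores after checking each against its maximum."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the five dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {value} is outside 0..{MAX_POINTS[name]}")
    return sum(scores.values())


# Dimension scores copied from the example output.
example = {
    "Trigger quality": 20,
    "Instruction clarity": 22,
    "Agent-agnostic design": 14,
    "Self-containment": 10,
    "Output definition": 8,
}

total = total_score(example)
print(f"Total score: {total}/{sum(MAX_POINTS.values())}")  # Total score: 74/100
```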

Known Limitations

Scoring is strict: 70 is the publishable threshold, not 60 or 65
Compatibility results reflect static signals; a skill rated Partial may still work on some agents depending on their current version
The auto-fix option applies only the single top-ranked fix per run; re-run to address the next issue

Related skills

find-skills

finding-writer

compliance-gap-analyzer
