Skill quality — install & operations¶
One page on running the Phase 3 quality scorer: install the Stop hook, seed the sidecars, and verify the data flows into the wiki and the knowledge graph.
What it does¶
Every installed skill and agent gets a continuous quality score in
[0.0, 1.0] plus an A/B/C/D/F letter grade, derived from four signals:
| Signal | Weight | What it measures |
|---|---|---|
| telemetry | 0.40 | Load count, recency, freshness in skill-events.jsonl |
| intake | 0.20 | Live re-run of the six install-time structural checks |
| graph | 0.25 | Degree + average edge weight in the wiki graph |
| routing | 0.15 | Router hit-rate (neutral prior below 3 observations) |
Two hard floors override the weighted sum:
intake_fail— any structural check is currently failing → grade F. Grade F is only produced by this hard floor; it is never returned by the score-to-grade mapping function alone. A score of 0.0 maps to D.never_loaded_stale— no load events ever → grade capped at D.
The score is mirrored to three on-disk sinks so every consumer sees the same number:
~/.claude/skill-quality/<slug>.json— canonical machine-readable form.- Wiki entity frontmatter —
quality_score,quality_grade,quality_updated_at. - Wiki body — a
## Qualityblock between<!-- quality:begin -->markers.
The knowledge-graph node attribute is a separate consumer path, not a
write path from persist_quality: wiki_graphify reads the sidecar JSON
produced by sink 1 and attaches quality_score / quality_grade to each
node on its next build.
Register the Stop hook¶
The hook runs once per session-end. It reads skill-events.jsonl since
its last run, collects every slug that appeared, and calls
skill_quality.py recompute --slugs <comma-list> — so scoring is
incremental (touched skills only), not a full 2,000-page sweep.
Edit ~/.claude/settings.json and add, replacing <REPO> with the
absolute path to this checkout (use forward slashes on Windows):
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "python <REPO>/hooks/quality_on_session_end.py"
}
]
}
]
}
}
The hook always exits 0: a scoring error will not block session shutdown.
Seed the sidecars (first run only)¶
Run once after install so every installed skill has a baseline score:
This walks ~/.claude/skills/*/SKILL.md and ~/.claude/agents/*.md,
scores each, and writes the three on-disk sinks. Expect ~15–30s depending on
corpus size and disk.
CLI reference¶
# Full recompute (use sparingly; the Stop hook handles incrementals).
ctx-skill-quality recompute --all
# One slug.
ctx-skill-quality recompute --slug python-testing
# Show the most recent score.
ctx-skill-quality show python-testing
# Signal-by-signal breakdown with evidence.
ctx-skill-quality explain python-testing
# List every slug with its grade, filtered.
ctx-skill-quality list --grade D
All verbs accept --json for piping into other tools.
Graph integration¶
wiki_graphify.py reads the sidecar directory automatically and
attaches quality_score and quality_grade to every matching node. The
Obsidian graph view can then color nodes by grade — configure the
quality_grade property in Obsidian's graph settings.
Nodes without a sidecar get quality_score: null and quality_grade:
null so downstream consumers can always read the attribute safely.
Configuration¶
All knobs live in src/config.json under the top-level quality key:
{
"quality": {
"weights": {
"telemetry": 0.40, "intake": 0.20, "graph": 0.25, "routing": 0.15
},
"agent_weights": {
"telemetry": 0.15, "intake": 0.30, "graph": 0.35, "routing": 0.20
},
"grade_thresholds": {"A": 0.80, "B": 0.60, "C": 0.40},
"stale_threshold_days": 30.0,
"recent_window_days": 14.0,
"min_body_chars": 120,
"paths": {
"sidecar_dir": "~/.claude/skill-quality",
"router_trace": "~/.claude/router-trace.jsonl"
}
}
}
ctx_config.Config exposes this through cfg.get("quality", {}). User
overrides in ~/.claude/skill-system-config.json deep-merge over the
repo defaults, so you can pin only the keys you want to change.
Both weight vectors must sum to 1.0 (±0.01) and grade thresholds must
satisfy 0 ≤ C ≤ B ≤ A ≤ 1 — QualityConfig.__post_init__ will raise
on bad values, catching typos before they pollute sidecars.
Why two weight vectors¶
Skills and agents differ in how they're invoked. Skills are loaded automatically by the router, so telemetry (load counts, recency) is the strongest post-install quality signal — hence 0.40 weight.
Agents are invoked via the Agent tool, deliberately and rarely. A
seldom-used agent isn't stale, it's specialized. The agent weights
shift mass onto graph connectedness (0.35) and intake structure
(0.30) so agents aren't penalized for having an empty telemetry stream.
The never_loaded_stale hard floor, which caps skills at D when they
have zero load events, does not apply to agents for the same
reason.
Troubleshooting¶
- Every skill grades D. Telemetry hasn't accumulated enough load events yet. This is expected on a fresh install; the stop-hook will pick up real usage over the next few sessions.
- A recently-edited skill now grades F. Open the sidecar and look
at
signals.intake.evidence.checks— one of the six structural checks is failing. Fix the file and rerunrecompute --slug <name>. - Wiki page has two
## Qualitysections. Shouldn't happen —persist_qualityis idempotent via the HTML-comment markers. If it does, delete both blocks and rerunrecompute; the first pass will re-emit exactly one. - Graph view shows no color. Run
python -m wiki_graphify --graph-onlyto rebuild; it reads sidecars fresh on every build.