ctx — Skill, Agent, MCP & Harness Recommendations¶

ctx is not an Amazon-style catalog of skills, MCPs, agents, tools, or harnesses. It is a recommendation layer. Point it at your organization's own tools, or use the pre-built graph, and ctx recommends the smallest useful bundle for the current development window. The goal is to load the right skills, agents, MCP servers, and optional harness at the right moment so hosted LLMs burn fewer tokens and local models waste less CPU/GPU work.

Watches what you develop, walks a knowledge graph of 68,494 skill pages, 467 agents, 10,790 MCP servers, and 207 cataloged harnesses, and recommends the right execution bundle on the fly. The live execution bundle is skills, agents, and MCP servers only; custom/API/local model users and external loop adapters get separate harness recommendations after explicit user-owned model consent, ranked by model choice and task goal. You decide what to load, install, or adopt. Powered by a Karpathy LLM wiki with persistent memory that gets smarter every session.

Install

pip install claude-ctx
ctx-init --graph --model-mode skip

Optional extras: pip install "claude-ctx[embeddings]" for the semantic backend, pip install "claude-ctx[harness]" for local/API model harness runs, pip install "claude-ctx[dev]" for the pytest/mypy/ruff toolchain. After install the ctx, ctx-scan-repo, ctx-skill-quality, ctx-skill-health, and ctx-toolbox console scripts are on PATH; python -m ctx --help reaches the same run/resume/sessions CLI as ctx. ctx-init --graph installs the fast pre-built runtime graph that powers recommendations and harness dry-runs; source checkouts use graph/wiki-graph-runtime.tar.gz, while pip installs download the matching GitHub release asset. Use ctx-init --graph --graph-install-mode full when you want the full packed LLM-wiki installed locally.

Custom-model users can run ctx-init --model-mode custom --model <provider/model> --goal "<task>" to record the model profile and surface harness recommendations.

Before pushing

scripts/no_mistakes_run.sh fast --profile smoke
scripts/no_mistakes_run.sh fast
scripts/no_mistakes_run.sh gate --intent "narrow task statement for this branch"

scripts/no_mistakes_run.sh fast --profile smoke is the quick first pass: it keeps cheap invariants, no-test policy, ruff, and public docs tracker checks while deferring slow unit, package, graph, browser, similarity, telemetry, and strict docs lanes. scripts/no_mistakes_run.sh fast is the full fast front door before no-mistakes or PR: it selects the same PR checks, splits independent work into isolated temporary worktree lanes, and runs them in parallel against committed branch history. The wrapper writes lane timing evidence to .gate/local-fast.json by default, and lane filters support fast reruns such as scripts/no_mistakes_run.sh fast --lane static --lane unit. Unit-family checks run as separate unit, canary, contract, and clean-host lanes so the broad coverage pass no longer serializes the canary and clean-host smoke checks. scripts/no_mistakes_run.sh gate --intent ... refuses implicit/stale intent, runs smoke + full local-fast first, then starts no-mistakes with the explicit branch objective. The serial preflight/no-mistakes path remains available when you need to inspect individual checks. Preflight uses the same changed-file classifier as GitHub Actions and runs the matching local gates before you open a PR: stats, ruff format/check, mypy, pip check, unit coverage, canaries, package build, twine, docs, graph validation, browser, and similarity checks as needed. When graph artifacts are still Git LFS pointers, preflight hydrates only the required tarballs, verifies their pointer SHA-256 and size caps, then validates the artifacts. Use --profile full before release work to force the source/package gates even for docs-only or graph-only changes. Docs changes run public docs tracker checks before the strict MkDocs build, including bug-smoke, feature, dashboard, and toolbox coverage. Always pass an explicit narrow no-mistakes intent so review/test/doc agents validate this branch instead of inferring a stale broader goal from local transcripts. Public docs surfaces are release-tracked: when mkdocs.yml adds, removes, or moves a nav .md page, or public linked assets under docs/assets/javascripts/, docs/services/, or docs/toolbox/templates/ change, update the relevant supporting ledger (docs/qa/feature-user-story-status.csv or docs/qa/dashboard-user-story-status.csv) and the canonical qa/feature_status.csv with the exact path in entrypoint_or_route. Bug-smoke audit rows live in qa/bug_smoke_status.csv and are validated by the same public docs tracker; Retested Pass rows must include PASS: retest evidence and a closed next_action starting with Closed;.

Why this exists¶

Claude Code skills, agents, MCP servers, and model harness profiles are powerful, but at scale they become unmanageable:

Discovery problem — with 68,494 skill pages, 467 agents, 10,790 MCP servers, and 207 harnesses, how do you know which ones exist and which are relevant to your current project?
Context budget — loading every installable entity wastes tokens and degrades quality. You need exactly the right skills, agents, and MCP servers per session, plus a harness recommendation only when you choose a custom/API/local model path.
Hidden connections — a FastAPI skill is useful, but you also need the Pydantic skill, the async Python patterns skill, and the Docker skill, plus possibly a matching MCP server. If you are not using Claude Code, ctx separately suggests the model harness most likely to fit your goal. Nobody tells you that.
Entity rot — skills, agents, MCP servers, and harness records you added months ago and never used are cluttering your context. Stale ones should be flagged and archived.

ctx solves all of these by treating your ctx inventory as a knowledge graph with persistent memory, not a flat directory.

What this is¶

ctx is not a collection of scripts. It is an agent with persistent memory and a knowledge graph.

The core idea comes from Andrej Karpathy's LLM-wiki pattern: instead of re-loading everything from scratch each session, an LLM maintains a wiki it can read, write, and query. The wiki becomes the agent's long-term memory.

ctx applies that pattern to entity management — and extends it with graph-based discovery:

A Karpathy 3-layer wiki at ~/.claude/skill-wiki/ is the single source of truth.
79,958 graph nodes for the shipped skill/agent/MCP/harness inventory, including 68,494 skill pages and 207 harness pages under entities/harnesses/. Each page tracks tags, status, provenance, and usage where it applies.
A knowledge graph (79,958 nodes, 1,778,069 edges) built from a 12,934-node core plus 67,024 body-backed skill nodes. The graph has 52 Louvain communities and blends semantic cosine, tag overlap, and slug-token overlap; 67,024 skill bodies are shipped as installable SKILL.md files. Entries over the configured line threshold are converted to gated micro-skill orchestrators. Full source bodies were used for semantic graphing before packaging; SKILL.md.original backups are not shipped in the tarball.
52 Louvain communities group related entities into named communities (e.g., AI + Devops + Frontend, Python + API).
PostToolUse and Stop hooks update the wiki automatically during each Claude Code session.
Hydrated skills over 180 lines are converted to gated micro-skill pipelines so the router can load them incrementally.
At session start, the skill-router scans your project and recommends the best-matching skills, agents, and MCP servers.
Mid-session, the context monitor watches every tool call, detects new stack signals, walks the graph, and recommends relevant skills, agents, and MCP servers in real time — nothing loads or installs without your approval.
Recommendation calls can suppress already selected, rejected, active, or baseline context and can filter local/no-key or language-mismatched rows before they enter a plan.
During custom/API/local model onboarding and LoopFlow/agent-loop adapter calls with explicit user-owned model consent, ctx-init, ctx-harness-install, and python -m ctx.adapters.loopflow use the same graph to recommend harnesses above the configured harness match floor.

The result: you always know what skills, agents, and MCP servers are available for your current task, and which harness fits when you choose your own model. The graph reveals hidden connections. The wiki learns from your usage. Stale ones are flagged. New ones self-ingest.

Explore the docs¶

Knowledge graph

79,958 shipped graph nodes: 12,934 curated skill/agent/MCP/harness nodes plus 67,024 body-backed skill nodes. The graph has 1,778,069 weighted edges and 52 Louvain communities. Ships pre-built in graph/wiki-graph.tar.gz and powers the graph-aware recommendations + the pre-ship ctx-dedup-check gate.

Knowledge graph
Entity onboarding

Step-by-step commands for adding a skill, agent, MCP server, or harness to the wiki and graph. Includes the text-to-cad harness pattern for custom-model users.

Entity onboarding
Dashboard

ctx-monitor serve opens a local HTTP dashboard with live graph, skill grades + four-signal scores, session timelines, one-click load/unload for skills, agents, and MCP servers, selectable recommendations, runtime token history, plus harness wiki and graph browsing. It is served by stdlib http.server and renders repo docs with MkDocs-compatible Markdown extensions.

Dashboard reference
Toolbox

Curated councils of skills and agents that fire at session-start, file-save, pre-commit, and session-end. Blocks git commit on HIGH/CRITICAL findings. Five starter toolboxes ship out of the box.

Toolbox overview · Starter toolboxes · Verdicts & guardrails
Skill router

Scans the active repo, detects the stack from file signatures, walks the stack matrix, loads exactly the skills that apply, and can recommend supporting agents and MCP servers. Loop adapters can call the same recommender before each plan.

Router overview · Stack signatures · Skill-stack matrix
Health & quality

Structural health checks (missing frontmatter, orphan manifest entries, line-count drift) plus the four-signal quality score (telemetry · intake · graph · routing) that grades every skill A/B/C/D/F.

Skill health · Memory anchoring · Lifecycle dashboard
Source snapshot

Current main is v1.0.21 — MIT, CI-matrixed (Ubuntu 3.12 plus Windows/macOS 3.11/3.12), 4,641 test inventory. Adds enterprise OpenTelemetry-ready telemetry and ships console scripts including ctx-init, ctx-monitor (local dashboard with graph + wiki + load/unload for skills, agents, and MCP servers, plus Harness Setup for user-owned LLMs), ctx-incremental-attach, ctx-incremental-shadow, ctx-dedup-check (pre-ship near-duplicate gate), and ctx-tag-backfill (entity hygiene), plus a fast runtime graph artifact and the full ~281 MiB wiki tarball with 79,958 nodes / 1,778,069 edges / 52 Louvain communities.

CHANGELOG · Repository

Principles¶

Single source of truth. The wiki and graph drive Claude Code recommendations, custom-model harness recommendations, dashboard views, and entity update reviews.
Explicit approval. ctx can recommend, review, install, update, unload, or uninstall, but it does not mutate live skills, agents, MCP servers, or harness installs without a command or approval path.
Configurable gates. Recommendation floors, semantic edge thresholds, micro-skill line limits, and harness match floors live in config so teams can tune behavior without forking the code.
Evidence over opinion. Suggestions cite real usage data plus knowledge-graph edges. No black-box prompts.
Token discipline. Every council run honors max_tokens / max_seconds budgets.