This directory contains end-user documentation for skillgym.
skillgym is a benchmarking tool for testing whether coding-agent skills are selected and used correctly during real agent execution.
The tool runs real sessions against supported CLIs, captures session artifacts, normalizes them into a shared report format, and executes user-provided JavaScript assertions against that report.
Workspace behavior is documented in workspaces.md, including shared runs, isolated runs, suite-level workspace exports, template directories, and bootstrap commands.
Supported in V1:
- OpenCode CLI
- Codex CLI
- Claude Code CLI
- Cursor Agent CLI
- Real execution only
- JavaScript assertions only
- One benchmark metric: success or failure
- Session telemetry is preserved for debugging
- Execution limits are best-effort; raw artifacts are preserved even when a run fails
- TypeScript implementation
- Node.js-compatible APIs only
Assertion authoring and the built-in grouped assert API are documented in assertions.md, including:
- standard strict-assert usage like assert.equal and assert.match
- assert.skills.*
- assert.commands.*
- assert.fileReads.*
- assert.toolCalls.*
- assert.output.*
- test-cases.md: test suite and test case authoring
- assertions.md: assertion reference and matcher semantics
- workspaces.md: shared and isolated workspace behavior
- reporters.md: reporter lifecycle, loading, and standard reporter behavior
- session-report.md: normalized report schema
- snapshot.md: token regression snapshots and baseline updates
- skill-detection.md: how skill selection is observed