test(cli): screenshot tests asserting each command loads the right skill #682

Open
sarahxsanders wants to merge 6 commits into main from posthog-code/cli-screenshot-tests

@sarahxsanders sarahxsanders commented Jun 17, 2026

What

Catches command → intro-screen wiring regressions: a command falling through to the default flow, routing to the wrong skill, or rendering nothing. For each command in the surface, the harness (scripts/cli-screenshots.mjs):

  1. runs the real built binary in a fixed-size pseudo-terminal (node-pty),
  2. replays the captured bytes through a headless terminal emulator (@xterm/headless) and snapshots the settled screen — an actual screenshot, colour + layout,
  3. asserts that screenshot contains a marker: the program/skill id the command's own source wires into its intro row (Program ✔ … / Skill ✔ …).

Run it with pnpm screens:cli. Verified locally (macOS) and in a Linux container mirroring CI — 12/12 green.
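
Step 3 can be sketched as a scan over the rendered screen rows. This is a minimal sketch, not the harness's code: `findIntroIds` is a hypothetical helper, and the exact row shape (a `Program` or `Skill` label, a check mark, then the id) is an assumption based on the `Program ✔ … / Skill ✔ …` rows described above. The skill id in the usage note is a placeholder.

```javascript
// Hypothetical helper: pull the program/skill id out of intro rows that look
// like "Program ✔ <id>" or "Skill ✔ <id>". The row shape is an assumption.
const INTRO_ROW = /\b(Program|Skill)\s+✔\s+(\S+)/u;

function findIntroIds(screenRows) {
  const found = {};
  for (const row of screenRows) {
    const match = INTRO_ROW.exec(row);
    // Key by lower-cased label so callers can read found.program / found.skill.
    if (match) found[match[1].toLowerCase()] = match[2];
  }
  return found;
}
```

Given the text rows of a settled screen, `findIntroIds(rows).skill` could then be compared with the id the command is expected to wire in.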

Decisions & gotchas worth your attention

Why a native dependency (node-pty) instead of the script command

Ink asks the terminal "how many rows/columns do you have?" before it draws — no size, blank screen. node-pty lets us set that size (100×40); script only inherits it from an attached terminal, and there's none when spawned headlessly or in CI, so Ink paints nothing (this is exactly the empty-capture dead end we hit first).

The deeper reason: allocating a pty is an OS syscall (openpty/forkpty) — there's no pure-JS way to do it. Your only options are a native addon that exposes size control (node-pty) or shelling out to a native binary that doesn't (script). So even on Linux, script wouldn't have worked. The native dep buys the one thing Ink requires.
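
To make the size control concrete, here is a sketch of the options a harness might hand to a pty. `buildPtyOptions` is a hypothetical name. Reading 100×40 as 100 columns by 40 rows is an assumption. The `TERM` and `FORCE_COLOR` values are the ones this description pins under "Ink workarounds".

```javascript
// Hypothetical option builder for a node-pty style spawn call.
// Assumption: "100×40" means 100 columns by 40 rows.
const COLS = 100;
const ROWS = 40;

function buildPtyOptions(baseEnv = {}) {
  return {
    name: 'xterm-256color', // terminal type reported to the child
    cols: COLS,             // fixed width, so Ink always gets a size
    rows: ROWS,             // fixed height
    // Pin colour handling so the render does not vary by environment.
    env: { ...baseEnv, TERM: 'xterm-256color', FORCE_COLOR: '3' },
  };
}
```

An object like this could be passed as the third argument to node-pty's `spawn(file, args, options)`. The `script` command has no equivalent way to set the size when no terminal is attached.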

We assert a string match, not a full-screenshot match

Tempting to diff the whole rendered frame against a committed golden — but several screens animate (spinners) or wait on an async fetch, and some intros are project-dependent (run in this repo, revenue-analytics hits a "no Stripe SDK" preflight block instead of its normal intro). Whole-frame matching flakes on all of that.

So we still read the rendered screenshot — we just assert the one line that proves the routing (the source-wired program/skill id), which is stable across spinners, ephemeral ports, and project state. Each marker carries a src field pointing at the wizard source it comes from, so it stays a deliberate assertion, not an eyeballed string. There are no committed goldens — markers live in the harness; rendered frames are gitignored CI artifacts.
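
A marker entry with a `src` field might look roughly like the sketch below. The field names other than `src`, the helper `checkScreen`, and the example values in the usage note are assumptions for illustration, not the harness's actual table.

```javascript
// Hypothetical check: an entry names the command, the string that must appear
// on the settled screen, and the source file that string is wired in.
function checkScreen(entry, screenText) {
  if (!entry.src) {
    // Refuse markers with no source pointer, so none is an eyeballed string.
    throw new Error(`marker for "${entry.command}" has no src`);
  }
  const ok = screenText.includes(entry.marker);
  const message = ok
    ? `${entry.command}: found "${entry.marker}"`
    : `${entry.command}: expected "${entry.marker}" (wired in ${entry.src}) on the settled screen`;
  return { ok, message };
}
```

For example, `checkScreen({ command: 'example', marker: 'example-skill', src: 'src/commands/example.ts' }, screen)` returns a failure message that points the reader at the file to inspect.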

Two markers are intentionally weaker, and the code says so:

  • revenue-analytics / upload-source-maps hit a preflight block when run in the wizard's own repo, so they never reach the skill-id intro. Their marker proves the command routed to the right flow, not the exact skill id.
  • slack add starts OAuth immediately, so its settled screen is the auth wait — the marker proves it didn't fall through to the default flow.

Ink workarounds

  • Fixed pty size (cols/rows) — see above; without it Ink renders blank.
  • FORCE_COLOR=3 + TERM=xterm-256color pinned so colour output is identical local ↔ CI (otherwise the render differs by environment).
  • Render the settled frame — we wait for output to go quiet (SETTLE_MS), with a hard cap (MAX_CAPTURE_MS) for screens that never settle (live spinners). The emulator then collapses all the intermediate spinner/wait frames into the final screen.
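
The settle rule in the last bullet can be written as a small pure decision that a capture loop polls. This is a sketch only: `captureState` is a hypothetical name, and the default values for `SETTLE_MS` and `MAX_CAPTURE_MS` are assumptions, since this description does not state them.

```javascript
// Assumed values; the real harness constants may differ.
const SETTLE_MS = 500;
const MAX_CAPTURE_MS = 8000;

// Decide whether to keep capturing. All times are millisecond timestamps.
function captureState(
  { startedAt, lastOutputAt, now },
  { settleMs = SETTLE_MS, maxCaptureMs = MAX_CAPTURE_MS } = {},
) {
  // Hard cap first: a live spinner never goes quiet, so stop anyway.
  if (now - startedAt >= maxCaptureMs) return 'capped';
  // Settled: some output has arrived and nothing new for settleMs.
  if (lastOutputAt !== null && now - lastOutputAt >= settleMs) return 'settled';
  return 'waiting';
}
```

Both `'settled'` and `'capped'` end the capture. The buffered bytes are then replayed through the emulator, which leaves only the final screen.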

node-pty packaging

  • pnpm's extraction drops the executable bit on node-pty's prebuilt spawn-helper, so pty.spawn dies with posix_spawnp failed on a fresh install. The harness re-adds +x itself (self-heals; no-op once fixed upstream). Needs node-pty in pnpm.onlyBuiltDependencies.
  • node-pty ships macOS prebuilds only — on Linux it compiles from source at install (CI runners have the build tools, confirmed working).
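
The self-heal in the first bullet amounts to re-adding the execute bits if they are missing. Here is a minimal sketch using Node's `fs` module. `ensureExecutable` and `withExecBits` are hypothetical names, and the helper's path is left to the caller because its location inside `node_modules` is not stated here.

```javascript
import { chmodSync, existsSync, statSync } from 'node:fs';

// Add the execute bits for user, group, and other to a permission mode.
function withExecBits(mode) {
  return (mode | 0o111) & 0o7777;
}

// Returns true if the file needed fixing, false if it was missing or already fine.
function ensureExecutable(path) {
  if (!existsSync(path)) return false;
  const mode = statSync(path).mode & 0o7777; // keep only the permission bits
  const wanted = withExecBits(mode);
  if (wanted === mode) return false; // no-op once the bit survives install
  chmodSync(path, wanted);
  return true;
}
```

Calling this on the spawn-helper before the first `pty.spawn` keeps a fresh install working, and costs one `stat` once the packaging issue is fixed upstream.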

CI (cli-screenshots.yml)

  • pull_request (path-filtered) — pre-merge gate, runs only when files that decide wiring change (bin.ts, src/commands/**, src/ui/tui/**, src/lib/programs/**, the harness).
  • schedule (daily) — drift monitor: this is partly an integration test against context-mill's live skill-menu.json, so an upstream change can break a command with no wizard PR. Slack-notifies on scheduled failure (reuses SLACK_WEBHOOK_WIZARD_CHANNEL); PR failures self-announce via the red ❌.
  • workflow_dispatch — manual runs.

🤖 Generated with Claude Code

Adds scripts/cli-screenshots.mjs (pnpm screens:cli): for each command it runs
the real binary in a pseudo-terminal (via `script`, since Ink needs a TTY),
captures the raw ANSI output, and diffs it against a committed golden dump under
scripts/__screenshots__/. Catches regressions where a command falls through to
the default flow, renders the wrong intro screen, or doesn't render at all —
the class of bug we hit this cycle.

Goldens are raw ANSI ("jank screenshots"); normalize() strips volatile
cursor/spinner sequences before comparing so a live TUI doesn't flake.
Complements the component-level scripts/check-screens.tsx.

Includes a workflow_dispatch CI job. NOT yet wired to gate PRs: the goldens
need seeding once in a real terminal/CI (`pnpm build && pnpm screens:cli
--update`, commit scripts/__screenshots__/), then enable the pull_request/push
triggers. (Couldn't seed here — the sandbox has no pty.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci error-tracking-upload-source-maps
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci error-tracking-upload-source-maps/android
  • /wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
  • /wizard-ci error-tracking-upload-source-maps/flutter
  • /wizard-ci error-tracking-upload-source-maps/ios
  • /wizard-ci error-tracking-upload-source-maps/next
  • /wizard-ci error-tracking-upload-source-maps/next-no-posthog
  • /wizard-ci error-tracking-upload-source-maps/node-raw
  • /wizard-ci error-tracking-upload-source-maps/node-rollup
  • /wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
  • /wizard-ci error-tracking-upload-source-maps/node-webpack
  • /wizard-ci error-tracking-upload-source-maps/nuxt-3-6
  • /wizard-ci error-tracking-upload-source-maps/nuxt-4-3
  • /wizard-ci error-tracking-upload-source-maps/react-native
  • /wizard-ci error-tracking-upload-source-maps/react-vite
  • /wizard-ci error-tracking-upload-source-maps/rust
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

sarahxsanders and others added 3 commits June 17, 2026 17:54
SIGTERM to `script` left the inner wizard running (script doesn't forward
signals), and a stray wizard holds resources like the OAuth-callback port — so
the first capture worked and every one after it came back empty. Spawn detached
and SIGINT the process group (Ink then restores the terminal; the inner node
dies), with a SIGKILL fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
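
The kill strategy in the commit message above (spawn detached, SIGINT the process group, fall back to SIGKILL) might look roughly like this with Node's `child_process`. It is a POSIX-only sketch: `spawnInOwnGroup`, `stopGroup`, and the 2000 ms grace period are assumptions, not the commit's code.

```javascript
import { spawn } from 'node:child_process';

// Start a command as the leader of its own process group (POSIX only).
function spawnInOwnGroup(command, args) {
  return spawn(command, args, { detached: true, stdio: 'ignore' });
}

// Signal the whole group: SIGINT first, SIGKILL if it outlives graceMs.
function stopGroup(child, graceMs = 2000) {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({ code: child.exitCode, signal: child.signalCode });
      return;
    }
    const signalGroup = (signal) => {
      try {
        process.kill(-child.pid, signal); // a negative pid targets the group
      } catch {
        // The group is already gone; nothing left to signal.
      }
    };
    const fallback = setTimeout(() => signalGroup('SIGKILL'), graceMs);
    child.once('exit', (code, signal) => {
      clearTimeout(fallback);
      resolve({ code, signal });
    });
    signalGroup('SIGINT');
  });
}
```

Signalling the group rather than the direct child is the point: a wrapper that does not forward signals cannot leave the inner process holding a port.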
…byte-exact)

Replaces the `script`-based capture, which painted blank frames because the pty
had no window size when detached. node-pty gives a real pty with a fixed size
(COLS×ROWS), forced colour (stable local↔CI), raw-byte capture (encoding: null),
and a clean kill. Capture waits for the screen to settle, then snapshots; goldens
are raw binary ANSI dumps compared byte-for-byte.

Adds node-pty as a devDependency (not bundled into the shipped wizard) and to
pnpm.onlyBuiltDependencies so its native build runs on install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eenshot

Run each command in a sized pty (node-pty), render the settled screen
through a headless emulator (@xterm/headless), and assert it contains a
source-grounded marker — the program/skill id the command wires into its
intro screen. Catches command -> screen wiring regressions.

- marker match, not byte-for-byte golden: survives spinners, async fetches,
  and project-dependent intros that make whole-frame matching flake
- self-heal node-pty's spawn-helper +x bit (pnpm drops it on install)
- CI: path-filtered PR gate + daily drift monitor, Slack on scheduled fail

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sarahxsanders sarahxsanders changed the title from "test(cli): ANSI screenshot tests for command → intro-screen wiring" to "test(cli): screenshot tests asserting each command loads the right skill" Jun 18, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sarahxsanders sarahxsanders requested review from a team and removed request for a team June 18, 2026 14:35
The integration (default) and slack flows render different screens
depending on the directory, detection speed, and login state — cold CI
catches `default` still detecting, and logged-out CI shows slack's connect
intro rather than the login wait. A single static marker can't match all of
them. Allow `marker` to be an array (pass if any appears); every accepted
marker is still flow-specific, so a fall-through to the wrong flow fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
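
The "pass if any appears" rule from the commit message above can be sketched as a one-step normalisation. `matchesAnyMarker` is a hypothetical name, and the marker strings in the usage note are placeholders.

```javascript
// Accept either a single marker string or an array of acceptable markers.
function matchesAnyMarker(screenText, marker) {
  const accepted = Array.isArray(marker) ? marker : [marker];
  // Passes if at least one accepted marker is on the screen; an empty list fails.
  return accepted.some((candidate) => screenText.includes(candidate));
}
```

For instance, `matchesAnyMarker(screen, ['connect intro text', 'login wait text'])` passes on either slack screen. A fall-through to the default flow shows neither string and still fails.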
@sarahxsanders sarahxsanders marked this pull request as ready for review June 18, 2026 14:43
@sarahxsanders sarahxsanders requested a review from a team June 18, 2026 14:47

2 participants