Skip to content

feat(skills): add api-version-upgrade skill for dated-package bumps#2243

Open
jeffredodd wants to merge 3 commits into
mainfrom
skill/api-version-upgrade
Open

feat(skills): add api-version-upgrade skill for dated-package bumps#2243
jeffredodd wants to merge 3 commits into
mainfrom
skill/api-version-upgrade

Conversation

@jeffredodd

Copy link
Copy Markdown
Contributor

Summary

Codifies the @gusto/embedded-api-v-<YYYY-MM-DD> upgrade playbook into a reusable skill at .claude/skills/api-version-upgrade/. The skill runs the full 6-phase workflow autonomously: side-by-side package diff, codemod across import paths + cache-key + bare-date sites, type-level verification, E2E scaffolding, and draft base PR creation with a breaking-change matrix in the description.

Born from the live v2025-11-15 → v2026-02-01 execution (PR #2233). Three iterations of evals against three upgrade scenarios validated and refined it. The most important finding: the skill caught a silent-runtime-failure bug in PR #2233 that the live execution had missed.

What's in the skill

.claude/skills/api-version-upgrade/
├── SKILL.md                              ← 6-phase autonomous playbook
├── evals/
│   └── evals.json                        ← 3 test cases, 18 assertions
├── references/
│   ├── diff-methodology.md               ← how to diff each package layer
│   ├── codemod-commands.md               ← sed commands, cache-key + bare-date sweeps
│   ├── e2e-patterns.md                   ← spec patterns + brittleness lessons
│   ├── pr-template.md                    ← PR description template w/ matrix
│   └── known-pitfalls.md                 ← lessons from prior upgrades
└── scripts/
    ├── install-old-package.sh            ← side-by-side install for diffing
    └── diff-packages.sh                  ← runs the full diff and emits findings

Also adds dictionary entries (codemod, worktrees, networkidle, the http-verb prefixes used by embedded-api operation names) to cspell.json.

The bug it caught in PR #2233

src/contexts/ApiProvider/apiVersionHook.ts:3 had const CURRENT_API_VERSION = '2025-11-15' as a bare date string — not part of any package path. The codemod's package-name sweep didn't match it. Without the skill's reference catching this, our v2026-02-01 upgrade would have shipped as a runtime no-op: types claim v2026-02-01, server answers with v2025-11-15 schemas because the header still says v2025-11-15. Typecheck passes, unit tests pass, e2e passes. Partners hit production with the wrong API contract.

The skill's references/codemod-commands.md § "CRITICAL — the bare-date X-Gusto-API-Version sweep" now flags this as a mandatory step. PR #2233 commit 9a9b79e70 fixed it before merge.

Validation: 3 iterations of evals

Eval Scenario Iter 3 result
1 v0.13.0 → v2025-11-15 (the original migration) 6/6 assertions pass — uses boundary-anchored regex (no bare→dated substring corruption)
2 v2025-11-15 → v2026-02-01 (replay of PR #2233) 6/6 assertions pass — avoids migration_blocker conflation, single base PR, includes bare-date sweep
3 v2026-02-01 → v2026-06-15 (pre-flight) 6/6 assertions pass — identifies off-cycle reimbursement as 🔴 with mandatory E2E, classifies helpers.test.ts refactor as ⚠️ optional

18/18 total assertions pass on iteration 3. Eval outputs preserved under .claude/skills/api-version-upgrade-workspace/iteration-3/ locally (gitignored — they're throwaway artifacts).

Confidence notes

  • High confidence in the skill's analysis capability (3× validated, caught real bugs)
  • Lower confidence in fully autonomous live execution — every eval was --dry-run analysis-only. The autonomous end-to-end path (open real PRs, run real codemods) has not been live-tested yet. Recommend exercising it on the v2026-02-01 → v2026-06-15 upgrade as the first real-world run.

Test plan

🤖 Generated with Claude Code

jeffredodd and others added 3 commits June 24, 2026 09:57
Codifies the @gusto/embedded-api-v-<YYYY-MM-DD> upgrade playbook learned
from the v0.13.0 → v2025-11-15 and v2025-11-15 → v2026-02-01 migrations.
The skill runs the full 6-phase workflow autonomously: side-by-side
package diff, codemod across import paths + cache-key + bare-date sites,
type-level verification, E2E scaffolding, and draft base PR creation
with a breaking-change matrix in the description.

Three references encode the lessons we paid for:

- references/codemod-commands.md flags the bare-date X-Gusto-API-Version
  sweep at apiVersionHook.ts as the highest-risk silent-failure site.
  This was missed in the live v2025-11-15 → v2026-02-01 execution and
  only caught by the skill's iteration-2 self-test.
- references/known-pitfalls.md documents the migration_blocker
  conflation, the off-cycle reimbursement subtlety for v2026-06-15, and
  the boundary-anchored regex required when migrating from the bare
  package.
- references/e2e-patterns.md captures the selector lessons (label-text
  click for switches; assertCompletedOverview anchor for company flows).

Validated across three iterations of evals against three scenarios.
Iteration 3 achieves 100% pass on all 18 assertions; the skill steers
fresh agents through every prior bug including the bare-date
X-Gusto-API-Version sweep that was missed in the live execution.

Adds dictionary entries (codemod, worktrees, networkidle, http-verb
prefixes used by the embedded-api operation names) to cspell.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A class of API breaking changes turns out to be invisible to the SDK
in practice: server-side validation tightening where the type contract
is unchanged but the server now rejects input it used to accept (e.g.,
'X required if Y=true'). The SDK already has a complete error-display
pipeline (normalizeToSDKError → fieldErrors[] → inline UI render) that
handles these cases automatically — same UX as a client-side gate,
just one extra HTTP round-trip. This matches the pattern the SDK
already uses for pay-schedule date constraints, frequency rules, etc.

Adding an E2E for these cases tests the SDK's error pipeline, not the
upgrade. The rule encodes when NOT to write an E2E:

- Server validation tightens on a field whose type contract is
  unchanged → ✅ trust the error pipeline (no E2E)

Carve-outs explicitly named:

- SDK has a custom Zod refinement mirroring the server rule and the
  two disagree → 🔴 fix the refinement
- The validation silently strips/transforms data instead of returning
  422 → 🔴 mandatory E2E (the off-cycle reimbursement case)
- The error-key the server returns doesn't bind to a form field name
  → ⚠️ verify

Updated four files:
- SKILL.md decision-rules and Phase 5 guidance
- references/known-pitfalls.md adds 'Trust the error pipeline' section
  with full reasoning and concrete examples
- references/e2e-patterns.md adds the matching anti-pattern
- evals/evals.json updates Eval 2's self-onboarding-email assertion

Iteration 4 evals confirm the rule sticks: agents correctly classify
self-onboarding email as ✅ (cite the decision-rule language verbatim,
check the exception clause, confirm SDK custom refinement agrees with
server rule) and correctly distinguish the off-cycle reimbursement
case as 🔴 (silently strips data — the named carve-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI format check flagged inconsistent spacing in the breaking-change
matrix table. Prettier auto-fixed the column alignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeffredodd jeffredodd force-pushed the skill/api-version-upgrade branch from eee1c7d to d5edd69 Compare June 24, 2026 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant