test(conformance): pin conformance 0.2.0-alpha.1 and baseline the draft-spec suites #2227
LGTM — test/CI-only change: pins conformance to 0.2.0-alpha.1, adds the draft server suite step, and baselines the currently-failing draft scenarios; no SDK code is touched and both conformance jobs remain advisory.
Extended reasoning...
Overview
The PR touches only conformance test infrastructure: .github/workflows/conformance.yml (comments plus a new draft-suite step), test/conformance/package.json (exact pin bump 0.1.15 → 0.2.0-alpha.1 and a new test:conformance:server:draft script), pnpm-lock.yaml (lockfile churn confined to the conformance dependency subtree), test/conformance/README.md (a note about the exact-pin policy), and test/conformance/expected-failures.yaml (the new baseline). No package under packages/ is modified.
Verification
The new workflow step invokes pnpm --filter @modelcontextprotocol/test-conformance run test:conformance:server:draft, which exists in test/conformance/package.json and routes through scripts/run-server-conformance.sh, passing --suite draft --expected-failures ./expected-failures.yaml straight to the conformance CLI — consistent with the existing server:all script. The lockfile changes match the manifest pin (conformance 0.2.0-alpha.1 now depending on sdk 1.29.0 / zod 4 / undici 7.25.0); zod 3.25.76 is dropped only because nothing in the conformance subtree references it anymore. Per the repo CI/GitHub Actions guidance, I did not flag any speculative workflow-behavior issues; the referenced scripts and filters all resolve.
Security risks
None. This is dev-dependency and CI-config churn for a private test workspace package; no runtime SDK code, auth, or transport logic is affected. The dependency bumps are within the conformance tooling lockfile subtree.
Level of scrutiny
Low. Both conformance jobs keep continue-on-error: true, so even an inaccurate baseline cannot block CI; the baseline contents themselves are an empirical snapshot that the stale-entry mechanism self-corrects over time. The only judgment call — enabling the draft suite as an advisory burn-down gate — is low-risk and reversible.
Other factors
The bug-hunting system found no issues, and the only timeline activity is the changeset bot (no changeset is needed for a private test package). The PR is part of a stack but is self-contained at the CI/test layer.
87f0810 to 285a0c0
Review follow-up: removed |
# Both jobs are gating: the expected-failures baseline determines pass/fail.
# A scenario failing outside the baseline, or a stale baseline entry (scenario
# now passing but still listed), fails the job.
🟡 The PR description's Additional context says the conformance jobs "keep continue-on-error for now" and that flipping conformance to a required gate is a separate decision, but this PR's head commit (285a0c0) removes continue-on-error: true from both jobs and the new header comment states "Both jobs are gating." Please either update the description to reflect that conformance becomes a required gate in this PR, or split the gating flip into its own PR per the stated plan.
Extended reasoning...
What's inconsistent
The PR description's Additional context states: "The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision once the baseline mechanism has bedded in." The diff, however, ships the opposite: continue-on-error: true is removed from both the client-conformance and server-conformance jobs in .github/workflows/conformance.yml, and the new header comment (lines 8–10) explicitly documents "Both jobs are gating: the expected-failures baseline determines pass/fail."
How this happened
The git history makes the sequence clear: the baseline/pin work landed first (423463b, c69139a, 9093cfa), and the gating flip was added in the head commit 285a0c0 "ci(conformance): make conformance jobs gating" (this matches the latest commit reported by changeset-bot). The description was written for the earlier state of the branch and was never updated after the gating commit was layered on.
Why it matters
The gating-vs-advisory question is exactly the CI-policy decision the description claims is being deferred. A reviewer who reads the description would approve this PR believing conformance stays non-blocking and that making it required will be discussed separately — when in fact merging this PR makes both conformance jobs required gates immediately. That changes the review calculus: a flaky draft-suite scenario or an incomplete expected-failures.yaml baseline now blocks unrelated PRs rather than just producing a yellow check.
Step-by-step trace
- Description, Additional context, bullet 3: "The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision…"
- Diff hunk at .github/workflows/conformance.yml line 25–28 (old line 19): - continue-on-error: true removed under client-conformance.
- Diff hunk at line 38+ (old line 38): - continue-on-error: true removed under server-conformance.
- New header comment, lines 8–10: "Both jobs are gating: the expected-failures baseline determines pass/fail. A scenario failing outside the baseline, or a stale baseline entry … fails the job."
- git log on the workflow file shows the removal came from head commit 285a0c0 "ci(conformance): make conformance jobs gating", after the description was written.
What's not wrong
The code itself is internally consistent — the workflow comments accurately describe the shipped behavior, and the baseline mechanism (stale-entry enforcement, exit-0-on-baselined-failures) supports running as a gate. This is purely a description-staleness/process issue, not a code defect.
How to fix
Either (a) update the PR description's Additional context to say conformance becomes a required gate in this PR (and note that branch-protection/required-checks settings may need updating to match), or (b) drop commit 285a0c0 from this PR and land the gating flip as the separate stacked PR the description already proposes.
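The advisory-versus-gating distinction discussed in this comment comes down to whether a job carries continue-on-error: true. A minimal sketch of a check that lists the jobs still marked advisory is shown below. The function name advisoryJobs, the sample workflow text, and the two-space indentation it expects are assumptions made for illustration; it is a line-based scan, not a YAML parser, and it is not part of this PR.

```typescript
// Hypothetical helper: list jobs in a workflow file that set
// `continue-on-error: true` at job level (i.e. advisory, non-blocking jobs).
// Assumes 2-space indentation: job ids at 2 spaces, job keys at 4 spaces.
function advisoryJobs(workflow: string): string[] {
  const advisory: string[] = [];
  let inJobs = false;
  let currentJob: string | null = null;

  for (const line of workflow.split(/\r?\n/)) {
    // Skip blank lines and comment-only lines.
    if (/^\s*(#.*)?$/.test(line)) continue;
    // The top-level `jobs:` key opens the block we care about.
    if (/^jobs:\s*$/.test(line)) {
      inJobs = true;
      currentJob = null;
      continue;
    }
    // Any other top-level key closes the jobs block.
    if (/^\S/.test(line)) {
      inJobs = false;
      currentJob = null;
      continue;
    }
    if (!inJobs) continue;

    // A key at exactly 2 spaces starts a new job.
    const jobHeader = /^ {2}([A-Za-z0-9_-]+):\s*$/.exec(line);
    if (jobHeader) {
      currentJob = jobHeader[1] ?? null;
      continue;
    }
    // A job-level (4-space) continue-on-error marks the job as advisory;
    // step-level settings are indented deeper and are ignored here.
    if (
      currentJob !== null &&
      /^ {4}continue-on-error:\s*true\s*$/.test(line) &&
      advisory.indexOf(currentJob) === -1
    ) {
      advisory.push(currentJob);
    }
  }
  return advisory;
}

// Illustrative workflow text (not the real conformance.yml).
const sampleWorkflow = [
  "name: Conformance",
  "jobs:",
  "  client-conformance:",
  "    runs-on: ubuntu-latest",
  "    continue-on-error: true",
  "  server-conformance:",
  "    runs-on: ubuntu-latest",
].join("\n");

console.log(advisoryJobs(sampleWorkflow)); // prints [ 'client-conformance' ]
```

An empty result would correspond to the "both jobs are gating" state that the header comment describes; a non-empty result corresponds to the advisory state that the PR description claims.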
285a0c0 to ae4bde3
Updated per maintainer review: workflow steps reduced to bare |
b0ce6ce to 2c5acb2
Per review: dropped the workflow header comment, and the draft-suite step now uses a root script alias ( |
5109d18 to 6784af5
8c7ecc1 to 37150a7
… suites
Bump @modelcontextprotocol/conformance from 0.1.15 to the published 0.2.0-alpha.1 release, which adds the draft-spec scenario suites (SEP-2575, SEP-2322, SEP-2243, SEP-2549, SEP-2468, ...) and new checks on existing scenarios (SEP-837 application_type during DCR).
The dependency is pinned to an exact version (no ^ range): the conformance framework keeps evolving alongside the draft spec, so new releases are adopted by deliberate version bumps together with expected-failures.yaml updates in the same change.
Add a test:conformance:server:draft script that runs the draft-spec server suite against the expected-failures baseline.
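The exact-pin policy in this commit message can be expressed as a small check. The sketch below is illustrative only: isExactPin and findUnpinned are hypothetical helpers, and the manifest object stands in for the dependency map of test/conformance/package.json rather than reproducing it.

```typescript
// Hypothetical helper: true for an exact version such as "0.2.0-alpha.1",
// false for ranges such as "^0.2.0", "~0.1.15", ">=0.1.0", or "*".
function isExactPin(spec: string): boolean {
  // major.minor.patch, optional prerelease tag, optional build metadata.
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/.test(spec);
}

// Hypothetical helper: names from `required` that are missing from the
// dependency map or are declared with a range instead of an exact version.
function findUnpinned(deps: Record<string, string>, required: string[]): string[] {
  return required.filter((name) => {
    const spec = deps[name];
    return spec === undefined || !isExactPin(spec);
  });
}

// Stand-in for the manifest's dependency map (assumed shape).
const manifestDeps: Record<string, string> = {
  "@modelcontextprotocol/conformance": "0.2.0-alpha.1",
};

console.log(findUnpinned(manifestDeps, ["@modelcontextprotocol/conformance"])); // prints []
```

A check of this kind would flag an accidental switch back to a caret range, which is the situation the pin is meant to prevent: a new conformance release arriving without a matching expected-failures.yaml update.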
Record every scenario that currently fails against conformance 0.2.0-alpha.1 in expected-failures.yaml so the client (--suite all) and server (active + draft) runs exit 0, while unexpected regressions and stale entries still fail. Adds a server: section and groups entries by SEP:
- client: SEP-2575, SEP-2322, SEP-2243, SEP-2106, SEP-2468, SEP-2352 draft scenarios, plus pre-existing auth scenarios that now fail the new SEP-837 application_type check and the SEP-990 enterprise-managed-auth extension.
- server: SEP-2575 stateless, SEP-2322 input-required-result, SEP-2549 caching, SEP-2243 header validation, plus three WARNING-only entries (SEP-2164 / SEP-2322 SHOULD-level checks).
Each entry must be removed as the corresponding feature lands (the runner fails on stale entries), so the baseline burns down per milestone.
Add a workflow step that runs the test:conformance:server:draft script after the existing active-suite step. The client job already covers the draft client scenarios via --suite all; add comments documenting that and the exact-version conformance pin. Both jobs intentionally stay continue-on-error: true for now; the comment in the workflow recommends removing it once the expected-failures baseline has proven stable so that unexpected failures and stale entries actually block.
The expected-failures baseline is the failure-control mechanism: failures in the baseline pass, unexpected failures and stale baseline entries fail. With that in place, continue-on-error made regressions invisible.
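The pass/fail rule stated here can be sketched in a few lines. This is a minimal model of the rule as described in this conversation, not the conformance runner's actual code: the type names, the evaluateBaseline function, and the scenario names are all hypothetical.

```typescript
type Outcome = "pass" | "fail";

interface BaselineReport {
  baselined: string[];  // failed and listed in the baseline: tolerated
  unexpected: string[]; // failed but not listed: a regression
  stale: string[];      // listed but now passing: entry must be removed
  ok: boolean;          // true only when there are no unexpected or stale entries
}

// Classify each scenario result against the expected-failures baseline.
function evaluateBaseline(
  results: Record<string, Outcome>,
  expectedFailures: string[],
): BaselineReport {
  const listed = new Set(expectedFailures);
  const report: BaselineReport = { baselined: [], unexpected: [], stale: [], ok: true };

  for (const scenario of Object.keys(results)) {
    const failed = results[scenario] === "fail";
    if (failed && listed.has(scenario)) report.baselined.push(scenario);
    else if (failed) report.unexpected.push(scenario);
    else if (listed.has(scenario)) report.stale.push(scenario);
  }

  report.ok = report.unexpected.length === 0 && report.stale.length === 0;
  return report;
}

// Illustrative run: one tolerated failure, one regression, one stale entry.
const report = evaluateBaseline(
  { "scenario-a": "pass", "scenario-b": "fail", "scenario-c": "fail", "scenario-d": "pass" },
  ["scenario-b", "scenario-d"],
);
console.log(report.ok); // prints false
```

Under this rule a run is green only when every failure is listed and every listed scenario still fails, which is why the baseline can act as the gate on its own once continue-on-error is removed.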
…tep names or comments)
…flow header comment
37150a7 to e9e6a0f
Un-stacked: this PR has no dependency on #2226 (zero file overlap). It is now based directly on main and is mergeable independently.
Pins the conformance suite to @modelcontextprotocol/conformance@0.2.0-alpha.1, enables the new draft-spec suites, and captures the current failure baseline in expected-failures.yaml so conformance can run as a burn-down gate while the 2026 draft spec gets implemented.
Motivation and Context
conformance 0.2.0-alpha.1 ships scenario suites for the draft (2026) spec: SEP-2575 stateless, SEP-2322 input-required/MRTR, SEP-2243 headers, and updated auth checks. The SDK previously pinned 0.1.15, which predates all of them. This PR:
- pins conformance to 0.2.0-alpha.1 (exact pin; future conformance releases are adopted by deliberate version bumps + baseline updates),
- runs the draft suite in the server conformance run alongside active,
- records the current failures in expected-failures.yaml, making the file the authoritative todo list that shrinks as the 2026 implementation lands (tracked in Implement SEP-2575: Make MCP Stateless #2184 / Implement SEP-2567: Sessionless MCP via Explicit State Handles #2185 / Implement SEP-2260: Require Server requests to be associated with a Client request #2186 / Implement SEP-2322: Multi Round-Trip Requests #2187).
How Has This Been Tested?
All conformance runs exit 0 against the baseline:
- client, --suite all (40 scenarios): 12 pass / 28 baselined (most baselined entries are auth-SEP checks: SEP-837, SEP-2468)
- server, active (30 scenarios): 30 pass / 0 fail
- server, draft (19 scenarios): 2 pass / 17 baselined (12 of the 17 are SEP-2322 MRTR scenarios)
Stale-entry enforcement is on: a baselined scenario that starts passing fails the run until removed from the baseline.
Breaking Changes
None. Test/CI-only; no SDK code touched.
Types of changes
Checklist
Additional context
- The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision once the baseline mechanism has bedded in.