
test(conformance): pin conformance 0.2.0-alpha.1 and baseline the draft-spec suites#2227

Merged
felixweinberger merged 8 commits into main from fweinberger/m0-conformance-draft-baseline on Jun 1, 2026

@felixweinberger
Contributor

Stack 2/3 — based on #2226 (manifest annotations). Review/merge bottom-up.

Pins the conformance suite to @modelcontextprotocol/conformance@0.2.0-alpha.1, enables the new draft-spec suites, and captures the current failure baseline in expected-failures.yaml so conformance can run as a burn-down gate while the 2026 draft spec gets implemented.

Motivation and Context

conformance 0.2.0-alpha.1 ships scenario suites for the draft (2026) spec — SEP-2575 stateless, SEP-2322 input-required/MRTR, SEP-2243 headers, and updated auth checks. The SDK previously pinned 0.1.15, which predates all of them. This PR:

  1. bumps the pin to the published 0.2.0-alpha.1 (exact pin; future conformance releases are adopted by deliberate version bumps + baseline updates),
  2. enables the draft suite in the server conformance run alongside active,
  3. records every currently-failing draft scenario in expected-failures.yaml, making the file the authoritative to-do list that shrinks as the 2026 implementation lands (tracked in #2184 "Implement SEP-2575: Make MCP Stateless", #2185 "Implement SEP-2567: Sessionless MCP via Explicit State Handles", #2186 "Implement SEP-2260: Require Server requests to be associated with a Client request", and #2187 "Implement SEP-2322: Multi Round-Trip Requests").
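The exact-pin policy in item 1 can also be checked mechanically. The following is a minimal sketch, not part of this PR: `isExactPin` and `findUnpinned` are hypothetical helpers that accept only a bare semver version (with optional prerelease/build suffixes, validated approximately) and reject range specifiers such as `^`, `~`, `>=`, or `x` wildcards.

```typescript
// Hypothetical helper (not part of this PR): true only when a package.json
// version spec is a single exact semver version such as "0.2.0-alpha.1",
// false for ranges such as "^0.1.15", "~0.2.0", ">=0.2.0" or "0.2.x".
const EXACT_SEMVER =
  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$/;

function isExactPin(spec: string): boolean {
  return EXACT_SEMVER.test(spec.trim());
}

// Hypothetical usage: list every dependency in a manifest that is not exactly pinned.
function findUnpinned(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !isExactPin(spec))
    .map(([name]) => name);
}
```

Run over the devDependencies of test/conformance/package.json, such a check would flag a `^0.2.0-alpha.1` range while accepting the exact `0.2.0-alpha.1` pin described above.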

How Has This Been Tested?

All conformance runs exit 0 against the baseline:

  • client --suite all (40 scenarios): 12 pass / 28 baselined (most baselined entries are auth-SEP checks: SEP-837, SEP-2468)
  • server active (30 scenarios): 30 pass / 0 fail
  • server draft (19 scenarios): 2 pass / 17 baselined (12 of the 17 are SEP-2322 MRTR scenarios)

Stale-entry enforcement is on: a baselined scenario that starts passing fails the run until it is removed from the baseline.
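The pass/fail rule (baselined failures pass; unexpected failures and stale entries fail) fits in a few lines. This is a minimal sketch of the decision logic only, assuming per-scenario pass/fail results and a set of baselined scenario names; `evaluateAgainstBaseline` and the scenario names below are hypothetical and do not reflect the conformance CLI's actual implementation or output.

```typescript
type Outcome = "pass" | "fail";

interface Verdict {
  ok: boolean;                  // true when the run should exit 0
  unexpectedFailures: string[]; // failing scenarios missing from the baseline
  staleEntries: string[];       // baselined scenarios that now pass
}

function evaluateAgainstBaseline(
  results: Map<string, Outcome>,
  baseline: Set<string>,
): Verdict {
  const unexpectedFailures: string[] = [];
  const staleEntries: string[] = [];
  for (const [scenario, outcome] of results) {
    const baselined = baseline.has(scenario);
    if (outcome === "fail" && !baselined) unexpectedFailures.push(scenario);
    if (outcome === "pass" && baselined) staleEntries.push(scenario);
    // A baselined failure is expected and does not affect `ok`.
  }
  return {
    ok: unexpectedFailures.length === 0 && staleEntries.length === 0,
    unexpectedFailures,
    staleEntries,
  };
}

// Hypothetical scenario names, for illustration only.
const baseline = new Set(["draft-a", "draft-b"]);
const results = new Map<string, Outcome>([
  ["active-x", "pass"],
  ["draft-a", "fail"], // expected failure: tolerated
  ["draft-b", "pass"], // now passing but still baselined: stale
]);
const verdict = evaluateAgainstBaseline(results, baseline);
console.log(verdict.ok, verdict.staleEntries); // false [ 'draft-b' ]
```

The sketch ignores baseline entries for scenarios that were not executed at all; how the real runner treats those is not stated here.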

Breaking Changes

None. Test/CI-only; no SDK code touched.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

  • Most server draft scenarios currently fail early with "Session ID required" because the conformance fixture server is stateful-only — they will start reaching their actual SEP assertions once the stateless request path lands (M5/M6 of the implementation plan).
  • Several client draft failures are caused by missing fixture handlers (reported as "NotObserved") rather than by SDK behavior gaps; they will be addressed as the relevant features land.
  • The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision once the baseline mechanism has bedded in.
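The first bullet above describes draft scenarios being rejected before any SEP-specific logic runs. The sketch below models that ordering under stated assumptions: `statefulGate` is a hypothetical, simplified stand-in for a stateful-only fixture server, not the SDK's or the fixture's actual code. It assumes the `Mcp-Session-Id` header used by the Streamable HTTP transport and reuses the error text quoted above.

```typescript
interface IncomingRequest {
  method: string; // JSON-RPC method, e.g. "initialize" or "tools/list"
  headers: Record<string, string | undefined>; // header names lower-cased
}

interface GateResult {
  status: number;
  error?: string;
}

// Hypothetical stateful-only gate: apart from `initialize`, every request must
// carry a session ID. A stateless (draft-spec) client sends none, so it is
// rejected here and never reaches the handler where SEP assertions would apply.
function statefulGate(req: IncomingRequest): GateResult {
  if (req.method === "initialize") return { status: 200 };
  if (!req.headers["mcp-session-id"]) {
    return { status: 400, error: "Session ID required" };
  }
  return { status: 200 };
}
```

Once a stateless request path exists, requests without a session header would pass this point and the scenarios would start exercising their actual SEP checks.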

@changeset-bot

changeset-bot Bot commented Jun 1, 2026

⚠️ No Changeset found

Latest commit: 3d35f42

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@pkg-pr-new

pkg-pr-new Bot commented Jun 1, 2026


@modelcontextprotocol/client

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/client@2227

@modelcontextprotocol/codemod

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/codemod@2227

@modelcontextprotocol/server

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server@2227

@modelcontextprotocol/server-legacy

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server-legacy@2227

@modelcontextprotocol/express

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/express@2227

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/fastify@2227

@modelcontextprotocol/hono

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/hono@2227

@modelcontextprotocol/node

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/node@2227

commit: 3d35f42


@claude (bot) left a comment


LGTM — test/CI-only change: pins conformance to 0.2.0-alpha.1, adds the draft server suite step, and baselines the currently-failing draft scenarios; no SDK code is touched and both conformance jobs remain advisory.

Extended reasoning...

Overview

The PR touches only conformance test infrastructure: .github/workflows/conformance.yml (comments plus a new draft-suite step), test/conformance/package.json (exact pin bump 0.1.15 → 0.2.0-alpha.1 and a new test:conformance:server:draft script), pnpm-lock.yaml (lockfile churn confined to the conformance dependency subtree), test/conformance/README.md (a note about the exact-pin policy), and test/conformance/expected-failures.yaml (the new baseline). No package under packages/ is modified.

Verification

The new workflow step invokes pnpm --filter @modelcontextprotocol/test-conformance run test:conformance:server:draft, which exists in test/conformance/package.json and routes through scripts/run-server-conformance.sh, passing --suite draft --expected-failures ./expected-failures.yaml straight to the conformance CLI — consistent with the existing server:all script. The lockfile changes match the manifest pin (conformance 0.2.0-alpha.1 now depending on sdk 1.29.0 / zod 4 / undici 7.25.0); zod 3.25.76 is dropped only because nothing in the conformance subtree references it anymore. Per the repo CI/GitHub Actions guidance, I did not flag any speculative workflow-behavior issues; the referenced scripts and filters all resolve.

Security risks

None. This is dev-dependency and CI-config churn for a private test workspace package; no runtime SDK code, auth, or transport logic is affected. The dependency bumps are within the conformance tooling lockfile subtree.

Level of scrutiny

Low. Both conformance jobs keep continue-on-error: true, so even an inaccurate baseline cannot block CI; the baseline contents themselves are an empirical snapshot that the stale-entry mechanism self-corrects over time. The only judgment call — enabling the draft suite as an advisory burn-down gate — is low-risk and reversible.

Other factors

The bug-hunting system found no issues, and the only timeline activity is the changeset bot (no changeset is needed for a private test package). The PR is part of a stack but is self-contained at the CI/test layer.

@felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 87f0810 to 285a0c0 on June 1, 2026 12:41
@felixweinberger
Contributor Author

Review follow-up: removed continue-on-error from both conformance jobs. The expected-failures baseline is the failure-control mechanism (baselined failures pass; unexpected failures and stale baseline entries fail) — keeping the jobs advisory on top of that would have made regressions invisible.
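The reasoning in this follow-up rests on what job-level `continue-on-error` does: per the GitHub Actions documentation, it lets a workflow run pass even when that job fails. A simplified model of that rule, ignoring job dependencies, cancellation, and branch-protection settings (`workflowRunFails` is hypothetical; the job name is taken from this workflow):

```typescript
interface JobResult {
  name: string;
  failed: boolean;          // did the job's steps fail?
  continueOnError: boolean; // job-level `continue-on-error: true`
}

// Simplified model: a failing job only fails the workflow run when it is not
// marked continue-on-error.
function workflowRunFails(jobs: JobResult[]): boolean {
  return jobs.some((job) => job.failed && !job.continueOnError);
}

// The same regression (job failed) under advisory vs. gating configuration.
const advisory = [{ name: "server-conformance", failed: true, continueOnError: true }];
const gating = [{ name: "server-conformance", failed: true, continueOnError: false }];
console.log(workflowRunFails(advisory), workflowRunFails(gating)); // false true
```

Because the baseline already absorbs expected failures, a failed job can only mean an unexpected failure or a stale entry, which is exactly the signal the advisory setting would have hidden.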

Comment thread .github/workflows/conformance.yml Outdated
Comment on lines +8 to +10
# Both jobs are gating: the expected-failures baseline determines pass/fail.
# A scenario failing outside the baseline, or a stale baseline entry (scenario
# now passing but still listed), fails the job.

🟡 The PR description's Additional context says the conformance jobs "keep continue-on-error for now" and that flipping conformance to a required gate is a separate decision, but this PR's head commit (285a0c0) removes continue-on-error: true from both jobs and the new header comment states "Both jobs are gating." Please either update the description to reflect that conformance becomes a required gate in this PR, or split the gating flip into its own PR per the stated plan.

Extended reasoning...

What's inconsistent

The PR description's Additional context states: "The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision once the baseline mechanism has bedded in." The diff, however, ships the opposite: continue-on-error: true is removed from both the client-conformance and server-conformance jobs in .github/workflows/conformance.yml, and the new header comment (lines 8–10) explicitly documents "Both jobs are gating: the expected-failures baseline determines pass/fail."

How this happened

The git history makes the sequence clear: the baseline/pin work landed first (423463b, c69139a, 9093cfa), and the gating flip was added in the head commit 285a0c0 "ci(conformance): make conformance jobs gating" (this matches the latest commit reported by changeset-bot). The description was written for the earlier state of the branch and was never updated after the gating commit was layered on.

Why it matters

The gating-vs-advisory question is exactly the CI-policy decision the description claims is being deferred. A reviewer who reads the description would approve this PR believing conformance stays non-blocking and that making it required will be discussed separately — when in fact merging this PR makes both conformance jobs required gates immediately. That changes the review calculus: a flaky draft-suite scenario or an incomplete expected-failures.yaml baseline now blocks unrelated PRs rather than just producing a yellow check.

Step-by-step trace

  1. Description, Additional context, bullet 3: "The conformance CI jobs keep continue-on-error for now; flipping conformance to a required gate is a separate decision…"
  2. Diff hunk at .github/workflows/conformance.yml lines 25–28 (old line 19): - continue-on-error: true removed under client-conformance.
  3. Diff hunk at line 38+ (old line 38): - continue-on-error: true removed under server-conformance.
  4. New header comment, lines 8–10: "Both jobs are gating: the expected-failures baseline determines pass/fail. A scenario failing outside the baseline, or a stale baseline entry … fails the job."
  5. git log on the workflow file shows the removal came from head commit 285a0c0 "ci(conformance): make conformance jobs gating", after the description was written.

What's not wrong

The code itself is internally consistent — the workflow comments accurately describe the shipped behavior, and the baseline mechanism (stale-entry enforcement, exit-0-on-baselined-failures) supports running as a gate. This is purely a description-staleness/process issue, not a code defect.

How to fix

Either (a) update the PR description's Additional context to say conformance becomes a required gate in this PR (and note that branch-protection/required-checks settings may need updating to match), or (b) drop commit 285a0c0 from this PR and land the gating flip as the separate stacked PR the description already proposes.

@felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 285a0c0 to ae4bde3 on June 1, 2026 13:37
@felixweinberger
Contributor Author

Updated per maintainer review: workflow steps reduced to bare run commands (no step names or comments beyond what was asked for). The gating change (no continue-on-error) stands.

@felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch 2 times, most recently from b0ce6ce to 2c5acb2 on June 1, 2026 13:59
@felixweinberger
Contributor Author

Per review: dropped the workflow header comment, and the draft-suite step now uses a root script alias (pnpm run test:conformance:server:draft) matching the existing conformance script pattern — the --filter form was an inconsistency (the root alias was missing), not a necessity.

@felixweinberger force-pushed the fweinberger/m0-manifest-annotations branch from 5109d18 to 6784af5 on June 1, 2026 14:19
@felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 8c7ecc1 to 37150a7 on June 1, 2026 14:23
… suites

Bump @modelcontextprotocol/conformance from 0.1.15 to the published
0.2.0-alpha.1 release, which adds the draft-spec scenario suites (SEP-2575,
SEP-2322, SEP-2243, SEP-2549, SEP-2468, ...) and new checks on existing
scenarios (SEP-837 application_type during DCR).

The dependency is pinned to an exact version (no ^ range): the conformance
framework keeps evolving alongside the draft spec, so new releases are
adopted by deliberate version bumps together with expected-failures.yaml
updates in the same change.

Add a test:conformance:server:draft script that runs the draft-spec server
suite against the expected-failures baseline.

Record every scenario that currently fails against conformance 0.2.0-alpha.1
in expected-failures.yaml so the client (--suite all) and server (active +
draft) runs exit 0, while unexpected regressions and stale entries still fail.

Adds a server: section and groups entries by SEP:
- client: SEP-2575, SEP-2322, SEP-2243, SEP-2106, SEP-2468, SEP-2352 draft
  scenarios, plus pre-existing auth scenarios that now fail the new SEP-837
  application_type check and the SEP-990 enterprise-managed-auth extension.
- server: SEP-2575 stateless, SEP-2322 input-required-result, SEP-2549
  caching, SEP-2243 header validation, plus three WARNING-only entries
  (SEP-2164 / SEP-2322 SHOULD-level checks).

Each entry must be removed as the corresponding feature lands (the runner
fails on stale entries), so the baseline burns down per milestone.

Add a workflow step that runs the test:conformance:server:draft script after
the existing active-suite step. The client job already covers the draft
client scenarios via --suite all; add comments documenting that and the
exact-version conformance pin.

Both jobs intentionally stay continue-on-error: true for now; the comment in
the workflow recommends removing it once the expected-failures baseline has
proven stable so that unexpected failures and stale entries actually block.

The expected-failures baseline is the failure-control mechanism: failures in
the baseline pass, unexpected failures and stale baseline entries fail. With
that in place, continue-on-error made regressions invisible.
@felixweinberger force-pushed the fweinberger/m0-conformance-draft-baseline branch from 37150a7 to e9e6a0f on June 1, 2026 14:35
@felixweinberger changed the base branch from fweinberger/m0-manifest-annotations to main on June 1, 2026 14:36
@felixweinberger
Contributor Author

Un-stacked: this PR has no dependency on #2226 (zero file overlap) — now based directly on main, mergeable independently.

@felixweinberger enabled auto-merge (squash) on June 1, 2026 14:42
Comment thread test/conformance/README.md
@felixweinberger merged commit 71dcc70 into main on Jun 1, 2026
22 checks passed
@felixweinberger deleted the fweinberger/m0-conformance-draft-baseline branch on June 1, 2026 14:47
