
Lazy migration on host boot — seed app.boxel.realm-servers from legacy realms #5319

Open
lukemelia wants to merge 6 commits into cs-11655-add-realm-servers-event-type from cs-11659-lazy-migration-on-host-boot-derive-trusted-servers-from

@lukemelia lukemelia commented Jun 23, 2026

Contributor

What

Self-healing migration: an account with only the legacy app.boxel.realms list gets an app.boxel.realm-servers entry without a privileged batch job. On host boot, when app.boxel.realm-servers is absent (boot falls back to the legacy app.boxel.realms list), derive the realm-server(s) backing the account's existing realms and persist them to the new key. Subsequent fresh boots take the authoritative trusted-servers assembly path.

How

  • MatrixService.start() — in the legacy-fallback branch, when realms exist, derive and write app.boxel.realm-servers via RealmServerService.getRealmServersForRealms(realms) + the existing setRealmServersInAccountData helper. Wrapped best-effort so a derivation failure never breaks boot.
  • RealmServerService.getRealmServersForRealms(realms) — derives each realm's server from its JWT realmServerURL claim (read from the per-realm session token), falling back to this host's own realm server for realms with no token. It never uses the bare realm-URL origin: a realm's origin can differ from its realm server (e.g. the shared base realm at cardstack.com), and persisting such a foreign origin would make the next boot's assertOwnRealmServer reject the list and log the user out. Results are normalized via normalizeRealmServerURL, so realms backed by one server collapse to a single entry.
  • The legacy app.boxel.realms key is retained (not deleted) for rollback safety.
  • Sticky legacy path: a session that booted from the legacy realm list stays on that path for the lifetime of the MatrixService instance (bootedFromLegacyRealmsList). The migration's write to app.boxel.realm-servers only takes effect on the next fresh session — switching the same instance to the trusted path would re-derive the list from _realm-auth for no benefit and could drop realms the trusted servers don't advertise.
  • The app.boxel.realm-servers AccountData listener short-circuits while bootedFromLegacyRealmsList is set, so neither the migration's own write nor the echo from startClient()'s initial sync re-runs trusted-servers assembly mid-boot.
  • Idempotent: gated on app.boxel.realm-servers being genuinely empty, so a re-boot of the same legacy session (where the key is now present) doesn't re-write it; the migration fires at most once per account, and is a no-op for accounts with no realms.
  • Session-token localStorage parsing is centralized in a shared readSessionTokens() helper.
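
The derivation rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `realm-server.ts`: `deriveRealmServers`, `claimedServerFor`, and `ownServer` are hypothetical names, `claimedServerFor` stands in for reading the `realmServerURL` claim out of a realm's session token, and the trailing-slash rule in `normalize` is only an assumption about what `normalizeRealmServerURL` does.

```typescript
// Hypothetical stand-in for normalizeRealmServerURL: assume it only
// guarantees a trailing slash.
function normalize(url: string): string {
  return url.endsWith('/') ? url : `${url}/`;
}

// Derive the distinct realm servers backing a list of realms.
// claimedServerFor(realm) returns the JWT realmServerURL claim, or
// undefined when the realm has no session token.
function deriveRealmServers(
  realms: string[],
  claimedServerFor: (realm: string) => string | undefined,
  ownServer: string,
): string[] {
  const servers = new Set<string>();
  for (const realm of realms) {
    // Fall back to the host's own server, never to the realm URL's origin.
    const server = claimedServerFor(realm) ?? ownServer;
    servers.add(normalize(server));
  }
  return Array.from(servers);
}
```

Because every result passes through the same normalization before it enters the set, two realms whose claims differ only by a trailing slash still collapse to one entry, and a token-less realm hosted on another origin contributes only the own server.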

Acceptance criteria

  • An account with only app.boxel.realms gets app.boxel.realm-servers written on next boot, derived from each realm's JWT realmServerURL claim with an own-server fallback (never the bare realm-URL origin).
  • An account that already has app.boxel.realm-servers is untouched.
  • app.boxel.realms is retained (not deleted).
  • A legacy-booted session stays on the legacy path for its lifetime; the migration write doesn't flip it mid-boot.
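
One way to read these criteria is as a pure decision over the two account-data keys. The sketch below is illustrative only: `AccountData`, `planMigration`, and the `derive` callback are hypothetical names, and the actual logic lives inside `MatrixService.start()`.

```typescript
interface AccountData {
  realms: string[]; // contents of app.boxel.realms
  realmServers: string[]; // contents of app.boxel.realm-servers
}

// Returns the list to write to app.boxel.realm-servers, or undefined
// when nothing should be written. app.boxel.realms is never modified.
function planMigration(
  data: AccountData,
  derive: (realms: string[]) => string[],
): string[] | undefined {
  if (data.realmServers.length > 0) {
    return undefined; // already migrated: leave the key untouched
  }
  if (data.realms.length === 0) {
    return undefined; // no realms: nothing to derive
  }
  try {
    return derive(data.realms);
  } catch {
    return undefined; // best effort: a derivation failure must not break boot
  }
}
```

Keeping the decision free of side effects makes the idempotence easy to see: once the returned list has been written, the first branch applies on every later boot.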

Tests

  • Boot migration fires for a legacy account and writes the derived realm-server (own server, not the base-realm origin); legacy app.boxel.realms retained; available realms still assemble.
  • The migration self-write echo — including the app.boxel.realm-servers event re-emitted by startClient()'s initial sync — does not flip the boot to the trusted path.
  • A re-boot of the same migrated session stays on the legacy path (realm-servers not re-derived or duplicated).
  • Already-migrated account: app.boxel.realm-servers left unchanged (migration does not run).

🤖 Generated with Claude Code

When `app.boxel.realm-servers` is absent and boot falls back to the legacy
`app.boxel.realms` list, derive the distinct realm-server URLs backing the
user's existing realms and persist them to `app.boxel.realm-servers` so
future boots take the authoritative trusted-servers assembly path. A
realm's server is the origin of its URL, cross-checked against the JWT
`realmServerURL` claim when a session token is present. The legacy
`app.boxel.realms` key is retained for rollback safety. The migration
fires at most once per account and is a no-op for already-migrated
accounts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jun 23, 2026

Contributor

Preview deployments

Host Test Results

    1 files  ±0      1 suites  ±0   1h 57m 6s ⏱️ - 3m 7s
3 188 tests +1  3 173 ✅ +1  15 💤 ±0  0 ❌ ±0 
3 207 runs  +1  3 192 ✅ +1  15 💤 ±0  0 ❌ ±0 

Results for commit 3588c5b. ± Comparison against earlier commit f8c0161.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   11m 17s ⏱️ +7s
1 727 tests ±0  1 727 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 820 runs  ±0  1 820 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 3588c5b. ± Comparison against earlier commit f8c0161.

lukemelia and others added 2 commits June 23, 2026 12:56
…sembly

The lazy migration writes app.boxel.realm-servers during boot, and that
write echoes back through the AccountData listener mid-boot — re-running
trusted-servers assembly (extra _realm-auth fetches, a racing
setAvailableRealmIdentifiers) and breaking unrelated host tests. Set a
guard flag around the self-write so the listener ignores it; this session
is already assembled from the equivalent legacy realm list, and the new
key takes effect on the next boot.

Also reset realm-servers account data to empty in the raw-helper test's
beforeEach, since the migration now seeds it during the autostart boot.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After the lazy migration writes app.boxel.realm-servers, a later start() on
the same MatrixService instance (e.g. a test that re-boots to pick up a
newly-added realm) read the freshly-written key and switched to the
trusted-servers path, re-deriving the realm list from _realm-auth and
dropping realms the stub doesn't advertise — breaking spec-preview and
card-copy tests.

Make the legacy path sticky for the instance's lifetime via
bootedFromLegacyRealmsList: once a boot assembles from app.boxel.realms,
subsequent boots stay legacy and the migration write is skipped (gated on
trustedServers being genuinely empty). The migration still persists the new
key, so the next fresh session takes the trusted path. Reset on resetState()
so a logout/login re-evaluates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
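
The sticky behavior described in this commit message can be modeled with a small state holder. This is a sketch under assumptions rather than the `MatrixService` code: `BootPathTracker` and `choosePath` are hypothetical names, and only the `bootedFromLegacyRealmsList` flag and `resetState()` come from the text above.

```typescript
type BootPath = 'legacy' | 'trusted';

class BootPathTracker {
  private bootedFromLegacyRealmsList = false;

  // Decide which assembly path a start() call takes, given whether
  // app.boxel.realm-servers currently holds any entries.
  choosePath(realmServersPresent: boolean): BootPath {
    if (this.bootedFromLegacyRealmsList || !realmServersPresent) {
      this.bootedFromLegacyRealmsList = true;
      return 'legacy';
    }
    return 'trusted';
  }

  // Mirrors the described resetState(): a logout/login re-evaluates.
  resetState(): void {
    this.bootedFromLegacyRealmsList = false;
  }
}
```

A second `choosePath(true)` on the same tracker still returns `'legacy'` even though the migration has populated the key by then; only a call after `resetState()`, or a fresh tracker, returns `'trusted'`.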

@lukemelia lukemelia left a comment

Contributor Author

[Codex] Code review: I found one boot-regression risk in the lazy migration.

let token = sessionTokens[normalizedRealmURL] ?? sessionTokens[realmURL];
let claims = token ? realmClaimsFromRawToken(token) : undefined;
let realmServerURL =
  claims?.realmServerURL ?? new URL(normalizedRealmURL).origin;

Contributor Author

[Codex] This fallback can persist a server URL that the next boot will reject. When a legacy realm has no local session token, we write new URL(realm).origin; if that realm is baseRealm.url (already used in the legacy fallback fixture) or a custom/content-origin realm, the migrated app.boxel.realm-servers list can include https://cardstack.com/ or another non-own origin. On the next fresh boot fetchUserRealmsFromTrustedServers() calls assertOwnRealmServer() before any fallback, rejects the list, and MatrixService.start() logs the user out. I think the migration needs to either filter/skip non-own derived origins or reuse the existing own-server fallback behavior from getRealmServersForRealms() when there is no JWT claim.

Contributor Author

[Claude Code 🤖] Good catch — confirmed and fixed in 586e738.

You're right: deriving a realm's server from new URL(realm).origin could persist a foreign origin (https://cardstack.com/ from the base realm, or any custom-domain realm) into app.boxel.realm-servers. On the next fresh boot the trusted-servers path runs assertOwnRealmServer() first, rejects the multi/non-own list, and start()'s catch calls logout(). It was masked in tests because in the test env the base-realm origin and the own-server origin both collapse to localhost:4201.

Fix: dropped the origin-based deriveRealmServerURLsForRealms and reused getRealmServersForRealms() for the migration, exactly as you suggested — it derives from each realm's JWT realmServerURL claim and falls back to this host's own realm server, skipping token-less realms rather than guessing their origin. So the migration now only ever persists the user's own realm server.
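
The failure mode discussed in this thread can be reproduced in miniature. The sketch below makes several assumptions: `assertOwnServerOnly` is a hypothetical approximation of what `assertOwnRealmServer` is described as doing (rejecting a list that contains anything other than the own server), `originOf` reproduces the dropped origin-based derivation, and the URLs other than `https://cardstack.com/` are made up.

```typescript
// Hypothetical approximation of the assertOwnRealmServer check: reject
// any list containing a server other than this host's own realm server.
function assertOwnServerOnly(servers: string[], ownServer: string): void {
  for (const server of servers) {
    if (server !== ownServer) {
      throw new Error(`untrusted realm server: ${server}`);
    }
  }
}

// Origin-based derivation, shown only to illustrate the failure mode.
function originOf(realmURL: string): string {
  return `${new URL(realmURL).origin}/`;
}

// True when the next boot would accept the persisted list.
function bootSurvives(servers: string[], ownServer: string): boolean {
  try {
    assertOwnServerOnly(servers, ownServer);
    return true;
  } catch {
    return false;
  }
}

const ownServer = 'https://app.example/';
const legacyRealms = [
  'https://app.example/user/a/',
  'https://cardstack.com/base/', // shared base realm on a foreign origin
];
const originDerived = legacyRealms.map(originOf);
```

Here `originDerived` contains `https://cardstack.com/`, so `bootSurvives(originDerived, ownServer)` is `false`, while a list holding only `ownServer` passes. In a test environment where both origins are the same host, the two lists are indistinguishable, which matches the explanation above of why the problem went unnoticed.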

…rigin

Addresses a boot-regression risk: deriving a realm's server from the bare
realm-URL origin can persist a foreign origin into app.boxel.realm-servers
(e.g. the shared base realm at cardstack.com, or a custom-domain realm),
because a realm's content origin can differ from its realm server. On the
next fresh boot the trusted-servers path runs assertOwnRealmServer before
any fallback, rejects the non-own/multi-origin list, and start()'s catch
logs the user out.

Reuse getRealmServersForRealms, which derives the server from each realm's
JWT realmServerURL claim and falls back to this host's own realm server,
never the realm-URL origin (token-less realms are skipped, not guessed).
Drop the origin-based deriveRealmServerURLsForRealms and its unit tests
(they couldn't catch this — in the test env every relevant origin collapses
to localhost:4201, which is why it slipped through).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@lukemelia lukemelia requested review from a team and backspace June 23, 2026 19:49
},
);

// CS-11659: lazy migration on host boot. A user who predates

Contributor

Should we make evergreen-comments a default? (Can we?)

Contributor Author

Probably by editing AGENTS.md

@lukemelia lukemelia changed the title from "CS-11659: lazy migration on host boot — seed app.boxel.realm-servers from legacy realms" to "Lazy migration on host boot — seed app.boxel.realm-servers from legacy realms" Jun 23, 2026
Drop ticket-ID prefixes and temporal phrasing ("predates",
"transition window", "not-yet-migrated") in the lazy-migration
block and the two related test modules. Same technical content,
stated as current contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Contributor

Pull request overview

Implements a self-healing, boot-time migration for legacy accounts by seeding app.boxel.realm-servers from an account’s existing app.boxel.realms list, so subsequent boots can use the trusted-realm-servers assembly path. The PR also adds integration coverage around the boot behavior and refactors session-token parsing into a shared helper.

Changes:

  • Add a boot-time “lazy migration” in MatrixService.start() to persist derived app.boxel.realm-servers when the new key is absent/empty.
  • Refactor RealmServerService localStorage parsing into readSessionTokens() and reuse it for realm-token lookups.
  • Expand/adjust host integration tests and mock-matrix utilities to cover migration + boot behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| packages/host/app/services/matrix-service.ts | Adds lazy migration, plus guard flags intended to keep a legacy-boot session stable. |
| packages/host/app/services/realm-server.ts | Centralizes session-token parsing into readSessionTokens() and reuses it. |
| packages/host/tests/integration/matrix-service-boot-assembly-test.ts | Adds integration tests covering lazy migration and "already migrated" behavior. |
| packages/host/tests/integration/matrix-service-realm-servers-test.ts | Adjusts account-data helper tests to start from a known-empty realm-servers list. |
| packages/host/tests/helpers/mock-matrix/_utils.ts | Adds accessors used by the new migration/retention assertions. |


Comment on lines +440 to +442
if (this.migratingRealmServersAccountData) {
  break;
}

Contributor Author

[Claude Code 🤖] Good catch — fixed in 3588c5b.

Confirmed: the migratingRealmServersAccountData flag only spanned the synchronous setRealmServersInAccountData() write. startClient()'s initial sync then re-emits every account-data key (the adjacent comment about the app.boxel.realms re-emission already notes this), and that second app.boxel.realm-servers echo arrived with the flag back to false — so the listener ran trusted-servers assembly mid-boot and overwrote the legacy-assembled realm list, defeating the sticky-legacy guarantee.

Rather than the narrower bootedFromLegacyRealmsList && !postLoginCompleted, I guarded the whole case on bootedFromLegacyRealmsList. It's set before the migration write and persists for the instance lifetime, so it covers both echoes (the synchronous self-write and the startClient sync echo) and matches start()'s instance-lifetime stickiness — a legacy-booted session never switches to the trusted path until the next fresh session, in the listener as well as in start(). The narrow migratingRealmServersAccountData flag is now redundant and removed.

This was invisible in tests because the mock client only re-emitted the legacy app.boxel.realms event on startClient. I made the mock re-emit app.boxel.realm-servers too (mirroring the real initial sync) and added a regression test asserting a legacy-migration boot stays non-authoritative through the echo.
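
The difference between the two guards can be shown with a toy model of the two echoes. It is a simplified sketch, not the listener in `matrix-service.ts`: `countAssemblies` and its local variables are hypothetical, and the event sequence is reduced to two direct calls.

```typescript
// Counts how many app.boxel.realm-servers echoes would trigger
// trusted-servers assembly under a given guard.
function countAssemblies(guard: 'narrow' | 'sticky'): number {
  let migrating = false; // narrow flag: true only during the write
  const bootedFromLegacy = true; // sticky flag: set before the write
  let assemblies = 0;

  const onRealmServersEvent = (): void => {
    const ignored = guard === 'narrow' ? migrating : bootedFromLegacy;
    if (!ignored) {
      assemblies++;
    }
  };

  // Echo 1: emitted synchronously by the migration's own write.
  migrating = true;
  onRealmServersEvent();
  migrating = false;

  // Echo 2: re-emitted later by the initial sync.
  onRealmServersEvent();

  return assemblies;
}
```

With the narrow guard the second echo slips through (`countAssemblies('narrow')` is 1); with the sticky guard neither echo runs assembly (`countAssemblies('sticky')` is 0). A mock that never produces the second echo cannot tell the two guards apart.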

Comment on lines 331 to +335
  getRealmServersForRealms(realms: string[]) {
    let testRealmOrigin = isTesting()
      ? new URL(testRealmURL).origin
      : undefined;
-   let sessionTokens: Record<string, string> = {};
-   let sessionStr =
-     window.localStorage.getItem(SessionLocalStorageKey) ?? '{}';
-
-   try {
-     sessionTokens = JSON.parse(sessionStr) as Record<string, string>;
-   } catch {
-     sessionTokens = {};
-   }
+   let sessionTokens = this.readSessionTokens();
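
A standalone version of the parsing logic shown in the removed lines might look like the sketch below. It assumes the helper does no more than those lines did. The `TokenStorage` interface and the explicit `storage` and `key` parameters are additions for illustration, so the function can run without `window.localStorage` or the real `SessionLocalStorageKey` value.

```typescript
// Minimal subset of the Storage interface, so the helper can be
// exercised without a browser.
interface TokenStorage {
  getItem(key: string): string | null;
}

// Parse the per-realm session-token map, treating a missing key or
// unparseable JSON as "no tokens".
function readSessionTokens(
  storage: TokenStorage,
  key: string,
): Record<string, string> {
  const raw = storage.getItem(key) ?? '{}';
  try {
    return JSON.parse(raw) as Record<string, string>;
  } catch {
    return {};
  }
}
```

Passing the storage in keeps the happy path and the corrupt-data path equally easy to test, which is one possible motivation for centralizing the parsing in a single helper.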

Contributor Author

[Claude Code 🤖] Right — the description was stale after 586e738 dropped the origin-based approach. Reconciled the PR description: there is no deriveRealmServerURLsForRealms; the migration uses getRealmServersForRealms(), which derives each realm's server from its JWT realmServerURL claim (with this host's own realm server as the fallback for token-less realms) and never consults the bare realm-URL origin. The "How", acceptance-criteria, and "Tests" sections now match the shipped code (and cover the sticky-legacy guard added in 3588c5b).

The lazy migration writes `app.boxel.realm-servers` during start() and that
write echoes back through the AccountData listener — both synchronously and
again when startClient()'s initial sync re-emits account data. The narrow
`migratingRealmServersAccountData` flag only covered the synchronous self-
write, so the startClient echo ran trusted-servers assembly mid-boot and
overwrote the legacy-assembled realm list, defeating the sticky-legacy-path
guarantee.

Guard the listener on `bootedFromLegacyRealmsList` instead: it is set before
the migration write and persists for the instance's lifetime, so it covers
both echoes and matches start()'s instance-lifetime stickiness. The narrow
flag is now redundant and removed.

The mock matrix client only re-emitted the legacy realms event on startClient,
which hid the bug; re-emit `app.boxel.realm-servers` too, mirroring the real
initial sync, and add a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>