Lazy migration on host boot — seed app.boxel.realm-servers from legacy realms#5319
When `app.boxel.realm-servers` is absent and boot falls back to the legacy `app.boxel.realms` list, derive the distinct realm-server URLs backing the user's existing realms and persist them to `app.boxel.realm-servers` so future boots take the authoritative trusted-servers assembly path. A realm's server is the origin of its URL, cross-checked against the JWT `realmServerURL` claim when a session token is present. The legacy `app.boxel.realms` key is retained for rollback safety. The migration fires at most once per account and is a no-op for already-migrated accounts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Host Test Results: 1 files ±0, 1 suites ±0, 1h 57m 6s ⏱️ -3m 7s. Results for commit 3588c5b. ± Comparison against earlier commit f8c0161.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 11m 17s ⏱️ +7s. Results for commit 3588c5b. ± Comparison against earlier commit f8c0161.
…sembly The lazy migration writes app.boxel.realm-servers during boot, and that write echoes back through the AccountData listener mid-boot — re-running trusted-servers assembly (extra _realm-auth fetches, a racing setAvailableRealmIdentifiers) and breaking unrelated host tests. Set a guard flag around the self-write so the listener ignores it; this session is already assembled from the equivalent legacy realm list, and the new key takes effect on the next boot. Also reset realm-servers account data to empty in the raw-helper test's beforeEach, since the migration now seeds it during the autostart boot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After the lazy migration writes app.boxel.realm-servers, a later start() on the same MatrixService instance (e.g. a test that re-boots to pick up a newly-added realm) read the freshly-written key and switched to the trusted-servers path, re-deriving the realm list from _realm-auth and dropping realms the stub doesn't advertise — breaking spec-preview and card-copy tests. Make the legacy path sticky for the instance's lifetime via bootedFromLegacyRealmsList: once a boot assembles from app.boxel.realms, subsequent boots stay legacy and the migration write is skipped (gated on trustedServers being genuinely empty). The migration still persists the new key, so the next fresh session takes the trusted path. Reset on resetState() so a logout/login re-evaluates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
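The sticky-legacy behavior described in this commit can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the actual `MatrixService` code: `BootSketch`, `AccountDataSketch`, `migrationWrites`, and the `'legacy' | 'trusted'` return value are hypothetical; only `bootedFromLegacyRealmsList`, `start()`, and `resetState()` are names taken from the commit message.

```typescript
// Hypothetical, simplified model of the account data that boot reads.
interface AccountDataSketch {
  realms: string[]; // legacy app.boxel.realms
  realmServers: string[]; // app.boxel.realm-servers
}

type BootPath = 'legacy' | 'trusted';

class BootSketch {
  // Once a boot assembles from the legacy list, this instance stays legacy.
  private bootedFromLegacyRealmsList = false;
  private accountData: AccountDataSketch;
  private ownRealmServer: string;
  migrationWrites = 0;

  constructor(accountData: AccountDataSketch, ownRealmServer: string) {
    this.accountData = accountData;
    this.ownRealmServer = ownRealmServer;
  }

  start(): BootPath {
    let trustedServersEmpty = this.accountData.realmServers.length === 0;
    if (!this.bootedFromLegacyRealmsList && !trustedServersEmpty) {
      return 'trusted';
    }
    this.bootedFromLegacyRealmsList = true;
    // Write only when the new key is genuinely empty and there are realms
    // to migrate; a re-boot of the same instance skips the write.
    if (trustedServersEmpty && this.accountData.realms.length > 0) {
      this.accountData.realmServers = [this.ownRealmServer];
      this.migrationWrites++;
    }
    return 'legacy';
  }

  // A logout/login re-evaluates which path to take.
  resetState(): void {
    this.bootedFromLegacyRealmsList = false;
  }
}
```

In this sketch a second `start()` on the same instance stays on the legacy path even though the key is now present, and only a `resetState()` (or a fresh instance) lets the trusted path take over.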
lukemelia
left a comment
[Codex] Code review: I found one boot-regression risk in the lazy migration.
    let token = sessionTokens[normalizedRealmURL] ?? sessionTokens[realmURL];
    let claims = token ? realmClaimsFromRawToken(token) : undefined;
    let realmServerURL =
      claims?.realmServerURL ?? new URL(normalizedRealmURL).origin;
[Codex] This fallback can persist a server URL that the next boot will reject. When a legacy realm has no local session token, we write new URL(realm).origin; if that realm is baseRealm.url (already used in the legacy fallback fixture) or a custom/content-origin realm, the migrated app.boxel.realm-servers list can include https://cardstack.com/ or another non-own origin. On the next fresh boot fetchUserRealmsFromTrustedServers() calls assertOwnRealmServer() before any fallback, rejects the list, and MatrixService.start() logs the user out. I think the migration needs to either filter/skip non-own derived origins or reuse the existing own-server fallback behavior from getRealmServersForRealms() when there is no JWT claim.
[Claude Code 🤖] Good catch — confirmed and fixed in 586e738.
You're right: deriving a realm's server from new URL(realm).origin could persist a foreign origin (https://cardstack.com/ from the base realm, or any custom-domain realm) into app.boxel.realm-servers. On the next fresh boot the trusted-servers path runs assertOwnRealmServer() first, rejects the multi/non-own list, and start()'s catch calls logout(). It was masked in tests because in the test env the base-realm origin and the own-server origin both collapse to localhost:4201.
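The failure mode can be reproduced in isolation. The sketch below is illustrative only: `OWN_REALM_SERVER`, `serverFromOrigin`, and `assertOwnRealmServerSketch` are hypothetical stand-ins (the real `assertOwnRealmServer()` may check more than this), and the realm URLs are example values chosen so that the base realm lives on a different origin than the user's own realm server.

```typescript
// Hypothetical stand-in for this host's own realm server URL.
const OWN_REALM_SERVER = 'https://realms.example.test/';

// The origin-based derivation that was dropped: realm URL -> its origin.
function serverFromOrigin(realmURL: string): string {
  return new URL(realmURL).origin + '/';
}

// Hypothetical stand-in for the own-server assertion: reject any list that
// contains a server other than the host's own realm server.
function assertOwnRealmServerSketch(servers: string[]): void {
  let foreign = servers.filter((server) => server !== OWN_REALM_SERVER);
  if (foreign.length > 0) {
    throw new Error(`non-own realm servers: ${foreign.join(', ')}`);
  }
}

// Example legacy realm list: one user realm plus the shared base realm.
const legacyRealms = [
  'https://realms.example.test/alice/notes/',
  'https://cardstack.com/base/',
];

// Origin-based derivation yields a foreign entry for the base realm, so a
// list persisted this way would fail the assertion on the next boot.
const derivedByOrigin = legacyRealms.map(serverFromOrigin);
```

Passing `derivedByOrigin` to `assertOwnRealmServerSketch` throws in this sketch, while a list containing only `OWN_REALM_SERVER` passes, which is the property the fix relies on.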
Fix: dropped the origin-based deriveRealmServerURLsForRealms and reused getRealmServersForRealms() for the migration, exactly as you suggested — it derives from each realm's JWT realmServerURL claim and falls back to this host's own realm server, skipping token-less realms rather than guessing their origin. So the migration now only ever persists the user's own realm server.
…rigin Addresses a boot-regression risk: deriving a realm's server from the bare realm-URL origin can persist a foreign origin into app.boxel.realm-servers (e.g. the shared base realm at cardstack.com, or a custom-domain realm), because a realm's content origin can differ from its realm server. On the next fresh boot the trusted-servers path runs assertOwnRealmServer before any fallback, rejects the non-own/multi-origin list, and start()'s catch logs the user out. Reuse getRealmServersForRealms, which derives the server from each realm's JWT realmServerURL claim and falls back to this host's own realm server, never the realm-URL origin (token-less realms are skipped, not guessed). Drop the origin-based deriveRealmServerURLsForRealms and its unit tests (they couldn't catch this — in the test env every relevant origin collapses to localhost:4201, which is why it slipped through). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
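A claim-based derivation along the lines this commit describes might look like the following. It is a sketch under assumptions, not the shipped `getRealmServersForRealms()`: the function takes already-decoded claims instead of reading tokens, `RealmClaimsSketch` and `normalizeServerURLSketch` are hypothetical, and it reads "falls back to this host's own realm server" as mapping each token-less realm to the own server.

```typescript
// Hypothetical shape of the decoded per-realm JWT claims.
interface RealmClaimsSketch {
  realmServerURL?: string;
}

// Assumed normalization: ensure a trailing slash. The real
// normalizeRealmServerURL may do more than this.
function normalizeServerURLSketch(url: string): string {
  return url.endsWith('/') ? url : `${url}/`;
}

function realmServersForRealmsSketch(
  realms: string[],
  claimsByRealm: Record<string, RealmClaimsSketch | undefined>,
  ownRealmServer: string,
): string[] {
  let servers = new Set<string>();
  for (let realm of realms) {
    // Use the claim when one exists; otherwise fall back to the own server.
    // The realm URL's origin is never consulted.
    let server = claimsByRealm[realm]?.realmServerURL ?? ownRealmServer;
    servers.add(normalizeServerURLSketch(server));
  }
  return Array.from(servers);
}
```

Because results pass through one normalization step before entering the set, realms backed by the same server collapse to a single entry, and a token-less realm on a foreign origin contributes only the own server.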
      },
    );

    // CS-11659: lazy migration on host boot. A user who predates
Should we make evergreen-comments a default? (Can we?)
Probably by editing AGENTS.md
Drop ticket-ID prefixes and temporal phrasing ("predates",
"transition window", "not-yet-migrated") in the lazy-migration
block and the two related test modules. Same technical content,
stated as current contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Implements a self-healing, boot-time migration for legacy accounts by seeding app.boxel.realm-servers from an account’s existing app.boxel.realms list, so subsequent boots can use the trusted-realm-servers assembly path. The PR also adds integration coverage around the boot behavior and refactors session-token parsing into a shared helper.
Changes:
- Add a boot-time “lazy migration” in `MatrixService.start()` to persist derived `app.boxel.realm-servers` when the new key is absent/empty.
- Refactor `RealmServerService` localStorage parsing into `readSessionTokens()` and reuse it for realm-token lookups.
- Expand/adjust host integration tests and mock-matrix utilities to cover migration + boot behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/host/app/services/matrix-service.ts | Adds lazy migration, plus guard flags intended to keep a legacy-boot session stable. |
| packages/host/app/services/realm-server.ts | Centralizes session-token parsing into readSessionTokens() and reuses it. |
| packages/host/tests/integration/matrix-service-boot-assembly-test.ts | Adds integration tests covering lazy migration and “already migrated” behavior. |
| packages/host/tests/integration/matrix-service-realm-servers-test.ts | Adjusts account-data helper tests to start from a known-empty realm-servers list. |
| packages/host/tests/helpers/mock-matrix/_utils.ts | Adds accessors used by the new migration/retention assertions. |
    if (this.migratingRealmServersAccountData) {
      break;
    }
[Claude Code 🤖] Good catch — fixed in 3588c5b.
Confirmed: the migratingRealmServersAccountData flag only spanned the synchronous setRealmServersInAccountData() write. startClient()'s initial sync then re-emits every account-data key (the adjacent comment about the app.boxel.realms re-emission already notes this), and that second app.boxel.realm-servers echo arrived with the flag back to false — so the listener ran trusted-servers assembly mid-boot and overwrote the legacy-assembled realm list, defeating the sticky-legacy guarantee.
Rather than the narrower bootedFromLegacyRealmsList && !postLoginCompleted, I guarded the whole case on bootedFromLegacyRealmsList. It's set before the migration write and persists for the instance lifetime, so it covers both echoes (the synchronous self-write and the startClient sync echo) and matches start()'s instance-lifetime stickiness — a legacy-booted session never switches to the trusted path until the next fresh session, in the listener as well as in start(). The narrow migratingRealmServersAccountData flag is now redundant and removed.
This was invisible in tests because the mock client only re-emitted the legacy app.boxel.realms event on startClient. I made the mock re-emit app.boxel.realm-servers too (mirroring the real initial sync) and added a regression test asserting a legacy-migration boot stays non-authoritative through the echo.
      getRealmServersForRealms(realms: string[]) {
        let testRealmOrigin = isTesting()
          ? new URL(testRealmURL).origin
          : undefined;
    -   let sessionTokens: Record<string, string> = {};
    -   let sessionStr =
    -     window.localStorage.getItem(SessionLocalStorageKey) ?? '{}';
    -   try {
    -     sessionTokens = JSON.parse(sessionStr) as Record<string, string>;
    -   } catch {
    -     sessionTokens = {};
    -   }
    +   let sessionTokens = this.readSessionTokens();
[Claude Code 🤖] Right — the description was stale after 586e738 dropped the origin-based approach. Reconciled the PR description: there is no deriveRealmServerURLsForRealms; the migration uses getRealmServersForRealms(), which derives each realm's server from its JWT realmServerURL claim (with this host's own realm server as the fallback for token-less realms) and never consults the bare realm-URL origin. The "How", acceptance-criteria, and "Tests" sections now match the shipped code (and cover the sticky-legacy guard added in 3588c5b).
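For reference, the session-token parsing that the diff above folds into `readSessionTokens()` can be sketched as a standalone function. This is an assumption-laden illustration: the real helper is a method on `RealmServerService` whose exact signature is not shown in this thread, `TokenStorageSketch` and `SESSION_KEY_SKETCH` are hypothetical, and the storage is passed in so the sketch runs outside a browser.

```typescript
// Minimal storage interface; in the host this role is played by
// window.localStorage.
interface TokenStorageSketch {
  getItem(key: string): string | null;
}

// Hypothetical key name; the real code uses SessionLocalStorageKey.
const SESSION_KEY_SKETCH = 'example-session-key';

function readSessionTokensSketch(
  storage: TokenStorageSketch,
): Record<string, string> {
  let raw = storage.getItem(SESSION_KEY_SKETCH) ?? '{}';
  try {
    return JSON.parse(raw) as Record<string, string>;
  } catch {
    // Malformed JSON is treated as "no tokens" rather than an error.
    return {};
  }
}
```

Keeping the parse-with-fallback logic in one place means every realm-token lookup handles a missing or corrupt entry the same way.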
The lazy migration writes `app.boxel.realm-servers` during start(), and that write echoes back through the AccountData listener twice: synchronously, and again when startClient()'s initial sync re-emits account data. The narrow `migratingRealmServersAccountData` flag only covered the synchronous self-write, so the startClient echo ran trusted-servers assembly mid-boot and overwrote the legacy-assembled realm list, defeating the sticky-legacy-path guarantee. Guard the listener on `bootedFromLegacyRealmsList` instead: it is set before the migration write and persists for the instance's lifetime, so it covers both echoes and matches start()'s instance-lifetime stickiness. The narrow flag is now redundant and removed. The mock matrix client only re-emitted the legacy realms event on startClient, which hid the bug; re-emit `app.boxel.realm-servers` too, mirroring the real initial sync, and add a regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
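The difference between the two guards can be shown with a small simulation of the two echoes. This is a sketch, not the real listener: `assemblyRunsDuringLegacyBoot`, `GuardSketch`, and `onRealmServersEvent` are hypothetical, and the boot is reduced to the order of events the commit message describes.

```typescript
type GuardSketch = 'narrow-flag' | 'sticky-flag';

// Returns how many times trusted-servers assembly would run during one
// legacy boot that performs the migration write.
function assemblyRunsDuringLegacyBoot(guard: GuardSketch): number {
  let assemblyRuns = 0;
  let migrating = false; // narrow flag: true only around the self-write
  let bootedFromLegacyRealmsList = false; // sticky flag

  // Hypothetical stand-in for the app.boxel.realm-servers listener case.
  let onRealmServersEvent = () => {
    let ignore =
      guard === 'narrow-flag' ? migrating : bootedFromLegacyRealmsList;
    if (ignore) {
      return;
    }
    assemblyRuns++;
  };

  // The sticky flag is set before the migration write.
  bootedFromLegacyRealmsList = true;

  // Echo 1: the synchronous self-write.
  migrating = true;
  onRealmServersEvent();
  migrating = false;

  // Echo 2: the initial sync re-emits every account-data key.
  onRealmServersEvent();

  return assemblyRuns;
}
```

With the narrow flag the second echo slips through and assembly runs once mid-boot; with the sticky flag neither echo triggers it.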
What
Self-healing migration: an account with only the legacy `app.boxel.realms` list gets an `app.boxel.realm-servers` entry without a privileged batch job. On host boot, when `app.boxel.realm-servers` is absent (boot falls back to the legacy `app.boxel.realms` list), derive the realm-server(s) backing the account's existing realms and persist them to the new key. Subsequent fresh boots take the authoritative trusted-servers assembly path.
How
- `MatrixService.start()`: in the legacy-fallback branch, when realms exist, derive and write `app.boxel.realm-servers` via `RealmServerService.getRealmServersForRealms(realms)` + the existing `setRealmServersInAccountData` helper. Wrapped best-effort so a derivation failure never breaks boot.
- `RealmServerService.getRealmServersForRealms(realms)`: derives each realm's server from its JWT `realmServerURL` claim (read from the per-realm session token), falling back to this host's own realm server for realms with no token. It never uses the bare realm-URL origin: a realm's origin can differ from its realm server (e.g. the shared base realm at `cardstack.com`), and persisting such a foreign origin would make the next boot's `assertOwnRealmServer` reject the list and log the user out. Results are normalized via `normalizeRealmServerURL`, so realms backed by one server collapse to a single entry.
- `app.boxel.realms` key is retained (not deleted) for rollback safety.
- […] `MatrixService` instance (`bootedFromLegacyRealmsList`). The migration's write to `app.boxel.realm-servers` only takes effect on the next fresh session; switching the same instance to the trusted path would re-derive the list from `_realm-auth` for no benefit and could drop realms the trusted servers don't advertise.
- `app.boxel.realm-servers` AccountData listener short-circuits while `bootedFromLegacyRealmsList` is set, so neither the migration's own write nor the echo from `startClient()`'s initial sync re-runs trusted-servers assembly mid-boot.
- […] `app.boxel.realm-servers` being genuinely empty, so a re-boot of the same legacy session (where the key is now present) doesn't re-write it; the migration fires at most once per account, and is a no-op for accounts with no realms.
- […] `readSessionTokens()` helper.
Acceptance criteria
- […] `app.boxel.realms` gets `app.boxel.realm-servers` written on next boot, derived from each realm's JWT `realmServerURL` claim with an own-server fallback (never the bare realm-URL origin).
- […] `app.boxel.realm-servers` is untouched.
- `app.boxel.realms` is retained (not deleted).
Tests
- […] `app.boxel.realms` retained; available realms still assemble.
- […] `app.boxel.realm-servers` event re-emitted by `startClient()`'s initial sync: does not flip the boot to the trusted path.
- `app.boxel.realm-servers` left unchanged (migration does not run).
🤖 Generated with Claude Code