
fix(ee): eliminate boot-time race that latched deployment as community #43

Merged: ABB65 merged 1 commit into main from fix/ee-bridge-init-race on May 12, 2026

ABB65 (Member) commented on May 12, 2026

Summary

Staging deploy logs (NUXT_DEPLOYMENT_PROFILE=managed, ee/ bundled, native deps healthy) showed:

  [deployment] NUXT_DEPLOYMENT_PROFILE="managed" requires the Enterprise Edition (ee/) but the enterprise bridge did not load; falling back to 'community'.
  [deployment] (same line, again)
  [ee] Enterprise bridge loaded successfully.

PR #39's diagnostics confirmed that the bridge itself loads fine; the bug was a microtask vs. macro-task ordering mistake in the boot sequence. server/plugins/00.billing-flag.ts ran synchronously, called resolveDeployment() before the bridge was awaited, then queued a queueMicrotask "second pass" that the plugin's own comment claimed would run after the bridge loaded. But ES dynamic import resolution is a macro-task, so the microtask queue drained first: both passes ran before the bridge existed, each logged the warning, and _cached was locked to 'community' for the rest of the process lifetime.
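To make the ordering concrete, here is a hypothetical TypeScript reconstruction of the pre-fix boot sequence. It is a sketch, not the real plugin: a setTimeout stands in for the ee/ dynamic import, and resolveDeployment(), _cached, and the cache-reset step are simplified stand-ins for the helpers named above.

```typescript
// Hypothetical reconstruction of the pre-fix boot order, not the real plugin code.
// Assumption: the setTimeout stands in for the ee/ dynamic import, whose
// completion lands on a later event-loop turn than the microtask queue.
const bootLog: string[] = [];
let bridge: object | null = null;
let _cached: string | null = null;

function resolveDeployment(): string {
  if (_cached) return _cached;
  if (!bridge) bootLog.push("[deployment] ... bridge did not load; falling back to 'community'.");
  _cached = bridge ? "managed" : "community";
  return _cached;
}

// 00.billing-flag.ts (old, sync): the first pass runs immediately...
resolveDeployment();
// ...and a "second pass" is queued as a microtask, expecting the bridge to be ready.
queueMicrotask(() => {
  _cached = null; // stand-in for the old cache-reset step
  resolveDeployment();
});

// 01.init-ee.ts: the bridge import settles on a later macro-task.
const bridgeReady = new Promise<void>((resolve) => {
  setTimeout(() => {
    bridge = {};
    bootLog.push("[ee] Enterprise bridge loaded successfully.");
    resolve();
  }, 0);
});

bridgeReady.then(() => {
  console.log(bootLog.join("\n")); // two [deployment] warnings, then the [ee] line
  console.log(`cached: ${_cached}`); // cached: community
});
```

Running it prints the two [deployment] warnings before the [ee] line, matching the staging logs, and leaves the cache latched to 'community'.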

Why this mattered

The impact was wider than a stray warning. With _cached poisoned:

  • getEdition() returns 'agpl' for the rest of the process
  • hasFeature() denies every requires_ee feature server-side
  • Billing middleware (03.billing.ts) runs the fixed-plan (community) branch, so subscription billing checks never run
  • runEnterpriseRoute's plan gate rejects AI keys, webhooks, and conversation API requests with 403
  • Client UI was correct only because the operator had set NUXT_PUBLIC_DEPLOYMENT_* env vars; without those it would have been broken too
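A simplified sketch of why one early read poisons everything downstream. The shapes of Deployment, getEdition(), and hasFeature() here are assumptions chosen for illustration; only the "first answer is cached for the process lifetime" behavior mirrors the description above.

```typescript
// Hypothetical, simplified shapes; the real deployment.ts and helpers differ.
type Deployment = { edition: "ee" | "community"; profile: "managed" | "community" };

let bridgeLoaded = false; // flips to true once ee/ finishes loading
let _cached: Deployment | null = null;

// Pre-fix behavior: the first answer is cached for the process lifetime.
function resolveDeployment(): Deployment {
  if (_cached) return _cached;
  _cached = bridgeLoaded
    ? { edition: "ee", profile: "managed" }
    : { edition: "community", profile: "community" };
  return _cached;
}

function getEdition(): "ee" | "agpl" {
  return resolveDeployment().edition === "ee" ? "ee" : "agpl";
}

function hasFeature(feature: { requires_ee: boolean }): boolean {
  return !feature.requires_ee || getEdition() === "ee";
}

resolveDeployment(); // an early caller runs before the bridge settles
bridgeLoaded = true; // the bridge loads moments later...

console.log(getEdition()); // agpl: the poisoned cache wins
console.log(hasFeature({ requires_ee: true })); // false
```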

Fix — defense in depth across three files

  1. server/utils/enterprise.ts — new isEnterpriseBridgeSettled() exposes whether the dynamic import has resolved (either to a bridge or definitively to null for CE). Pure read of existing global state.

  2. server/utils/deployment.ts: resolveDeployment() gates BOTH the _cached assignment and the misconfiguration warning on isEnterpriseBridgeSettled(). Pre-settle calls return a transient community shape without latching; post-settle calls get the correct ee/managed shape and cache it. The warning fires only when ee/ is genuinely absent at settle time.

  3. server/plugins/00.billing-flag.ts: a single async pass, await initEnterpriseBridge() then applyDeploymentSnapshot(). The queueMicrotask two-pass trick is gone. 01.init-ee.ts keeps awaiting the bridge; the memoized promise makes the double await a no-op and removes the "filename-ordering is load-bearing" coupling.
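The enterprise.ts side of the fix (item 1, plus the memoized double await from item 3) can be sketched as follows. The bridge shape, importEe(), and the module-level variables are hypothetical; the real file reads its own existing global state.

```typescript
// Sketch only: function names mirror the PR, but the bridge shape, importEe(),
// and the module-level state are assumptions.
type EnterpriseBridge = { version: string };

let bridgePromise: Promise<EnterpriseBridge | null> | null = null;
let bridgeSettled = false;
let importCalls = 0;

// Stand-in for the ee/ dynamic import; resolves on a later event-loop turn.
function importEe(): Promise<EnterpriseBridge> {
  importCalls++;
  return new Promise((resolve) => setTimeout(() => resolve({ version: "ee" }), 0));
}

function initEnterpriseBridge(): Promise<EnterpriseBridge | null> {
  // Memoized: every caller shares one promise, so awaiting it twice is a no-op.
  if (!bridgePromise) {
    bridgePromise = importEe()
      .catch(() => null) // CE build: ee/ absent, settle definitively to null
      .then((bridge) => {
        bridgeSettled = true;
        return bridge;
      });
  }
  return bridgePromise;
}

// Pure read: true once the import has resolved, to a bridge or to null.
function isEnterpriseBridgeSettled(): boolean {
  return bridgeSettled;
}

const first = initEnterpriseBridge(); // e.g. from 00.billing-flag.ts
const second = initEnterpriseBridge(); // e.g. from 01.init-ee.ts
console.log(first === second, isEnterpriseBridgeSettled()); // true false
second.then(() => console.log(importCalls, isEnterpriseBridgeSettled())); // 1 true
```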

Why this approach (vs alternatives)

Considered (and rejected):

  • Async plugin only: works today but encodes correctness in filename ordering — fragile to future plugin additions
  • Plugin rename: same fragility, just different filenames
  • Async resolveDeployment(): cascades async through hasFeature/getPlanLimit/getEdition and every server route that calls them — huge blast radius for a problem that's really just "cache locked too early"

The chosen hybrid (cache settlement guard + single-pass async plugin) fixes the race at the cache layer (correct for any caller, not just the plugin path) AND writes a deterministic SSR snapshot post-settle. No public API change.
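A sketch of the settle guard inside resolveDeployment(), showing why the fix holds for any caller rather than only the plugin path. The Deployment shape, the bridgeState object, and the warning text are assumptions for illustration.

```typescript
// Sketch of the settle guard in resolveDeployment(); not the real file.
// Assumptions: bridgeState stands in for the global state read by
// isEnterpriseBridgeSettled(), and the Deployment shape is simplified.
type Deployment = { edition: "ee" | "community"; profile: string };

const bridgeState: { settled: boolean; bridge: object | null } = { settled: false, bridge: null };
const isEnterpriseBridgeSettled = (): boolean => bridgeState.settled;

let _cached: Deployment | null = null;
const warnings: string[] = [];

function resolveDeployment(profile = "managed"): Deployment {
  if (_cached) return _cached;

  // Pre-settle: answer with a transient community shape, but do NOT cache it
  // and do NOT warn, so a later call can still see the real edition.
  if (!isEnterpriseBridgeSettled()) {
    return { edition: "community", profile: "community" };
  }

  if (bridgeState.bridge) {
    _cached = { edition: "ee", profile };
  } else {
    // Settled to null: ee/ is genuinely absent, so the warning is real.
    if (profile !== "community") {
      warnings.push(`profile "${profile}" requires ee/; falling back to 'community'`);
    }
    _cached = { edition: "community", profile: "community" };
  }
  return _cached;
}

// Any early caller (plugin, route, util) gets a transient answer...
console.log(resolveDeployment().edition); // community
// ...the bridge settles...
bridgeState.settled = true;
bridgeState.bridge = {};
// ...and the next call latches the correct shape.
console.log(resolveDeployment().edition, warnings.length); // ee 0
```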

Test plan

  • pnpm test tests/unit/deployment.test.ts — 14/14 pass (12 existing + 2 new regression cases)
  • pnpm test full suite — 568/568 pass (was 566, +2 new)
  • pnpm lint — 0 errors (7 pre-existing warnings, none in touched files)
  • pnpm typecheck — clean
  • Staging deploy after merge: confirm both "[deployment] ... did not load" warnings are gone from the boot logs and only "[ee] Enterprise bridge loaded successfully." remains
  • Staging post-merge smoke: open workspace settings → AI Keys / Webhooks tabs render without 403 (these were failing under _cached='community')
  • Verify billing middleware now resolves the correct workspace plan (not the fixed community fallback)

Out-of-scope (worth follow-up issues)

  • Two 00.-prefix plugins (00.validate-config.ts, 00.billing-flag.ts) — ordering between them is filesystem-dependent
  • useDeployment().isCommunity failsafe collapses edition === '' (snapshot not yet hydrated) into "community" — worth distinguishing "not yet known" from "definitively community" so UI can render a brief skeleton instead of mis-rendering
  • Per-call _billingConfigured cache is global, not edition-aware
  • runEnterpriseRoute plan-gate silently skips when event.context.billing is absent — documented but worth a defensive default-deny path
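Two of these follow-ups can be sketched briefly. Everything below is hypothetical (names, shapes, and the choice of which raw values count as community); it only illustrates a tri-state edition and a fail-closed plan gate.

```typescript
// Hypothetical follow-up sketch; none of these names or shapes come from the codebase.
type EditionState = "unknown" | "community" | "ee";

// '' means "snapshot not yet hydrated", not "community".
// Assumption: any other non-"ee" value is treated as definitively community.
function editionState(raw: string): EditionState {
  if (raw === "") return "unknown";
  return raw === "ee" ? "ee" : "community";
}

// True only when the edition is definitively community; while "unknown",
// the UI could render a brief skeleton instead.
function isCommunity(raw: string): boolean {
  return editionState(raw) === "community";
}

type BillingContext = { plan: string; allowed: ReadonlyArray<string> };

// Default-deny plan gate: a missing billing context fails closed.
function planAllows(billing: BillingContext | undefined, feature: string): boolean {
  if (!billing) return false;
  return billing.allowed.includes(feature);
}
```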

Staging logs (NUXT_DEPLOYMENT_PROFILE=managed, ee/ bundled, native deps
healthy) showed two identical warnings followed by a successful bridge
load:

  [deployment] NUXT_DEPLOYMENT_PROFILE="managed" requires the Enterprise
    Edition (ee/) but the enterprise bridge did not load; falling back
    to 'community'.
  [deployment] (same line, again)
  [ee] Enterprise bridge loaded successfully.

Root cause: a microtask vs macro-task ordering mistake in the boot
sequence. `server/plugins/00.billing-flag.ts` was a sync plugin that
called `resolveDeployment()` before the bridge was awaited, then
scheduled a second pass via `queueMicrotask`. The plugin's comment
claimed the microtask would run "after the ee init plugin has loaded
the bridge", but ES dynamic import resolution is a macro-task, so the
microtask queue drained first and both passes latched `_cached` to
'community' before the bridge finished resolving.

Blast radius was wider than a stray warning: with `_cached` poisoned,
`getEdition()` returned 'agpl' for the rest of the process, so
`hasFeature()` denied every `requires_ee` feature server-side, the
billing middleware (`03.billing.ts`) ran the `fixed`-plan branch
(community), and `runEnterpriseRoute` rejected AI keys / webhooks /
conversation API with 403. Client UI was only correct because operators
had set `NUXT_PUBLIC_DEPLOYMENT_*` env vars; with those unset, the
client snapshot would have been broken too.

Fix is defense-in-depth across three files:

1. `server/utils/enterprise.ts` — new `isEnterpriseBridgeSettled()`
   that exposes whether the dynamic import has resolved (either to a
   bridge or definitively to null for CE). Pure read of existing global
   state; no semantic change to load/get/init helpers.

2. `server/utils/deployment.ts` — `resolveDeployment()` now gates BOTH
   the `_cached` assignment AND the misconfiguration warning on
   `isEnterpriseBridgeSettled()`. Pre-settle calls return a transient
   community shape without latching, so a later post-settle call gets
   the correct ee/managed result and caches it. The warning only fires
   when ee/ is genuinely absent at settle time, not as a spurious
   side-effect of a pre-settle reading.

3. `server/plugins/00.billing-flag.ts` — single async pass:
   `await initEnterpriseBridge()` then `applyDeploymentSnapshot()`. The
   `queueMicrotask` two-pass trick and `__resetDeploymentCache` import
   are gone. `01.init-ee.ts` keeps awaiting the bridge too — the
   memoized promise makes the double-await a no-op and removes the
   "filename-ordering is load-bearing" coupling.

Tests: two new regression cases in `tests/unit/deployment.test.ts`
under a `cache settlement (race-condition regression)` block. The first
asserts that a pre-settle `resolveDeployment()` call does not latch
`_cached` and that a post-settle call returns the correct ee/managed
shape. The second asserts the misconfig warning is silent before
settle and fires once after settle when ee/ is genuinely absent. Full
suite: 568/568 pass (was 566, +2 new).

ABB65 merged commit 2a26406 into main on May 12, 2026 (1 check passed).
ABB65 deleted the fix/ee-bridge-init-race branch on May 12, 2026 at 13:08.