fix(env): live process.env reads + late-env detection for env-key providers#27719
Open
jerome-benoit wants to merge 91 commits into
Open
fix(env): live process.env reads + late-env detection for env-key providers#27719jerome-benoit wants to merge 91 commits into
jerome-benoit wants to merge 91 commits into
Conversation
Contributor
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the environment service to read and write process.env directly so environment variables set after startup can be observed by runtime consumers.
Changes:
- Replaces the cached
Envstate with directprocess.envaccess. - Removes now-obsolete direct env writes/comments in config/provider code.
- Adds more test preload env cleanup entries for provider-related variables.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
packages/opencode/src/env/index.ts |
Changes Env service methods to proxy through process.env. |
packages/opencode/src/config/config.ts |
Removes redundant direct console token env assignment. |
packages/opencode/src/provider/provider.ts |
Removes obsolete TODO comments around direct env usage. |
packages/opencode/test/preload.ts |
Expands initial test env cleanup list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return Service.of({ get, all, set, remove }) | ||
| Service.of({ | ||
| get: Effect.fn("Env.get")((key: string) => Effect.sync(() => process.env[key])), | ||
| all: Effect.fn("Env.all")(() => Effect.sync(() => ({ ...process.env }) as State)), |
Comment on lines
+80
to
+89
| delete process.env["AICORE_SERVICE_KEY"] | ||
| delete process.env["AICORE_DEPLOYMENT_ID"] | ||
| delete process.env["AICORE_RESOURCE_GROUP"] | ||
| delete process.env["GOOGLE_APPLICATION_CREDENTIALS"] | ||
| delete process.env["CLOUDFLARE_ACCOUNT_ID"] | ||
| delete process.env["CLOUDFLARE_GATEWAY_ID"] | ||
| delete process.env["CLOUDFLARE_API_TOKEN"] | ||
| delete process.env["SINGLE_ENV_KEY"] | ||
| delete process.env["MULTI_ENV_KEY_1"] | ||
| delete process.env["FALLBACK_KEY"] |
Cover all env vars declared by providers in models-api.json plus the test-fixture pairs referenced in provider.test.ts, so host env (or in-process leaks via Env.set) cannot perturb provider detection.
…call Provider state cached via InstanceState read env.all() once at init, so providers whose env keys were set after the first list() call remained invisible (issue anomalyco#12698 surface beyond the Env layer). Overlay live env detection on top of the cached providers in list() and getProvider(), respecting disabled_providers, enabled_providers, blacklist, whitelist and model status filters via a precomputed envEligible set. Cached state retains expensive init work (catalog/plugins/SDK loaders); only env-driven detection re-runs.
Closes the half-fix where overlayEnv() was wired into list/getProvider only. getModel, getLanguage, closest, getSmallModel, defaultModel and resolveSDK all read s.providers directly, so a provider detected from a late env mutation appeared in list() but crashed when actually used. State refactor: - rename providers -> cachedProviders (invariant: never holds env-only entries) - replace database+envEligible with cleanedDatabase (env-eligible providers with models pre-filtered for status/blacklist/whitelist/chat-aliases AND non-autoload custom-loader options pre-folded) - drop the init-time env-stamp loop, subsumed by the call-time overlay Pure helpers (unit-testable): - resolveEnvOverlay(cached, candidate, apiKey): 7-branch env precedence - currentProviders(s, envs): single seam every reader funnels through Layer body: - view(s) helper composes env.all() + currentProviders - all 7 readers + resolveSDK go through view; resolveSDK accepts Info explicitly - custom loop always registers loaders so a late-env provider can find them; folds non-autoload options into database; preserves source=env for env-conditional autoloaders (gitlab, cloudflare-ai-gateway) Tests: - test/provider/overlay.test.ts: 13 pure-helper unit tests - test/provider/provider.test.ts: 9 integration tests covering getModel late, getLanguage late, closest late, getSmallModel late, defaultModel late, alpha model leak, blacklist late, multi-env Bedrock, auth precedence - test/preload.ts: clear GITHUB_TOKEN, GITLAB_TOKEN, OPENCODE_API_KEY, GEMINI_API_KEY, HF_TOKEN, DIGITALOCEAN_ACCESS_TOKEN to prevent CI leakage
…ers and cleanedDatabase, close variant parity gap Round-2 cross-validation by 3 oracles surfaced 4 actionable findings. This addresses all of them: A1 — Variant parity gap (Oracle A): the cleanedDatabase build only applied the 5 keep filters (status/blacklist/whitelist/chat-aliases) but skipped the prepareModel mutations (api.id default, ProviderTransform.variants generation, configProvider.models[*].variants merge) that the cachedProviders cleanup loop applies. A late-env-detected provider therefore had empty variants while the same provider with env-at-boot had full variants. Fixed by applying prepareModel in both loops. A3 — Filter duplication (Oracle A + C): the 5 keep predicates were duplicated between the cachedProviders cleanup loop and the cleanedDatabase build. Extracted keepModel(model, modelID, providerID, configProvider) and prepareModel(model, modelID, configVariants) as local helpers in the InstanceState.make closure. Both loops now share the pipeline. A2 — Test tightening (Oracle A + C): the late-detected multi-env Bedrock test had an 'if (bedrock)' guard that silently passed if regression made bedrock undefined. Replaced with explicit expect(bedrock).toBeDefined(). The defaultModel late-env test only asserted toBeDefined() — strengthened to expect(d.providerID).toBe(anthropic) and added enabled_providers config to make the assertion robust against bundled provider iteration order. A4 — AWS env hygiene (Oracle B): the multi-env Bedrock test only removed 4 AWS keys; added AWS_PROFILE, AWS_ROLE_ARN, AWS_CONTAINER_CREDENTIALS_*, AWS_WEB_IDENTITY_TOKEN_FILE for full provider-detection-path isolation. New regression test: 'late-detected env provider has variants populated (parity with env-at-boot)' guards A1 closure. bun typecheck: clean bun test test/provider/: 101 + 13 = 114/114 pass
…out + test isolation
Round-3 cross-validation by 4 oracles + 4 design oracles surfaced 8 findings
plus 3 audit gaps. This commit addresses all of them.
B1 — Multi-credential late-env gap (Oracle 2 + Oracle 3): cleanedDatabase
surfaces providers whose env appears post-init, but multi-cred loaders
(bedrock, sap-ai-core, vertex, vertex-anthropic, azure, cloudflare-workers-
ai, cloudflare-ai-gateway, azure-cognitive-services) close over credential
providers / deployment IDs / account IDs / derived URLs at loader time.
Late-promoting them yields an unauthenticated SDK that fails silently at
first request.
Solution: explicit `dynamicEnv?: boolean` capability flag on CustomLoader.
Loaders mark `dynamicEnv: false` on broken-init-state early-returns; State
holds `loaderRefusedDynamicEnv: Set<ProviderID>` populated at init;
cleanedDatabase build excludes these. New `warnLateEnvRefused` helper
emits a one-time per-process warn when refused env appears, surfacing
the limitation without false-positive listing. The PR test that asserted
the broken bedrock state is inverted to assert exclusion.
Audit (post-oracles): cloudflare-workers-ai, cloudflare-ai-gateway, and
azure-cognitive-services had the same pattern as the original 5 loaders
but were not covered. All three now carry the flag; azure-cognitive-
services refactored to early-return (was producing options.baseURL=
undefined). The dynamicEnv JSDoc states the rule generically so future
loaders know what to do.
B2 — Test isolation regression (Oracle 3): bun runs all .test.ts in one
shared process. With Env.set now writing through to process.env, 30+
test sites in test/provider/provider.test.ts call set() without remove()
and leak across files. Documented victim: test/session/structured-output-
integration.test.ts:12 reads process.env.ANTHROPIC_API_KEY at module-load
to choose skip vs live.
Solution: capture ENV_BASELINE in test/preload.ts after preload settles,
afterEach() restores the snapshot. ~15 lines, covers every test file
automatically, future-proof.
M1 — Inaccurate State invariants (Oracle 2 + Oracle 4): cachedProviders
comment claimed "never holds env-only entries", violated by the autoload-
with-env-key path. cleanedDatabase claimed "Immutable post-init" without
compile-time hint. Rephrased both; added Readonly<> on cleanedDatabase.
M2.C — Inconsistent process.env writes (Oracle 3): bedrock and sap-ai-core
loaders wrote process.env.X = auth.key directly, breaking the trace-and-
audit guarantee of the Env service. Added dep.set on CustomDep; lifted
the effectful write out of iife() into proper yield* dep.set(...).
M3 — Missing Env.Service contract docs (Oracle 2 + Oracle 4): added
top-of-file JSDoc covering live reads, global writes, child-process
inheritance, and the test-isolation contract.
M5 — Empty-string apiKey edge case (Oracle 2): info.env.find(Boolean)
intentionally treats "" as absent. Added inline comment so a future
"correctness fix" doesn't regress to find(v => v !== undefined).
M6 — Redundant `as State` cast (Oracle 1): {...process.env} infers
correctly as Record<string, string | undefined>. Cast removed.
bun typecheck: clean
bun test test/provider: 365/366 (1 environmental 5000ms flake, different
test each run, present on baseline before changes)
bun test test/plugin: 91/91
Round-4 cross-validation by 5 review oracles + 2 design oracles surfaced 5
in-scope findings on top of the late-env contract. This commit addresses
all of them.
B1 — getLanguage TypeError on env removal (Oracle correctness): when the
env key is removed between getModel() and getLanguage(), currentProviders
yields no entry for the providerID and resolveSDK / spread on
provider.options crashes inside refineRejection (which only refines
NoSuchModelError). Surfaces as an unrelated defect.
Solution: early-return ModelNotFoundError when the overlay has no provider,
mirroring getModel's existing pattern. fuzzysort suggestions are populated
identically so the error contract is consistent across the two readers.
B2 — s.models cache stale on env-key rotation (Oracle correctness): s.sdk
correctly rebuilds via Hash.fast({providerID, npm, options}) when apiKey
changes, but the cached LanguageModelV3 wrapper at s.models was keyed only
by ${providerID}/${modelID}, so it kept referencing the SDK from before
rotation. Multi-env providers (Databricks, Privatemode, Google generative
AI) where provider.key is undefined had an even worse collision: two
distinct env states produced identical cache keys.
Solution: cache key now mirrors s.sdk's Hash.fast keying:
Hash.fast(JSON.stringify({providerID, modelID, key, options})). Env key
rotation, multi-env credential changes, and env-templated baseURL
mutations now invalidate the LanguageModel together with the SDK by
construction.
M1 — Direct process.env write in config.ts well-known auth (Explore §4.2):
`process.env[value.key] = value.token` bypassed the typed Env seam.
Replaced with `yield* env.set(...)` matching the existing pattern at the
OPENCODE_CONSOLE_TOKEN site. No behavioral change post-PR (Env.set now
writes through), pure consistency.
M2 — Mixed env reads in bedrock loader (Explore §4.3):
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI / _FULL_URI were read directly
from process.env while surrounding code uses the dep.env() snapshot.
Switched to env["..."] for greppability and to keep all bedrock-loader
env reads going through one source.
M3 — Multi-instance hazard documentation (Oracle architecture, Oracle
scope/quality): Env.set writes are process-global. In multi-workspace
desktop hosts (one sidecar serving several workspace contexts) two
workspaces authenticating to the same provider with different credentials
clobber each other. Added explicit warning to the Env module docstring,
plus TODO(multi-instance) markers at each existing credential-write site
(config.ts well-known + console-token, provider.ts bedrock-bearer +
sap-ai-core-service-key) so the future Auth-routing migration has a clean
search target.
Tests:
- getLanguage rejects when env removed between getModel and getLanguage
(covers B1)
- getLanguage rebuilds language model after env key rotation (covers B2)
Verification:
- bun typecheck: clean
- lsp_diagnostics: clean on all 4 changed files
- bun test test/provider/: 368 pass / 0 fail (incl. 2 new + all PR
late-env tests + overlay.test.ts pure unit tests)
- bun test test/config/: 163 pass / 0 fail
Deferred to separate tickets (out of scope for this PR per the design
proposal cross-validated by both oracles):
- Multi-env partial-env leak (Databricks/Privatemode/Google generative AI):
pre-existing in the deleted load-env loop with identical semantics; no
safe automatic rule without catalog metadata distinguishing alias vs
complementary multi-env. Fix candidate: default dynamicEnv:false for
multi-env providers without custom loaders.
- dynamicEnv:false extension to autoload-success providers (warn on
secondary env mutation post-init).
- mcp-websearch.ts EXA_URL module-level snapshot.
- Multi-instance credential isolation rework (Auth-routing migration).
- resolveEnvOverlay: preserve cached.key when candidate is multi-env so a previously-resolved env key is not dropped to undefined when the env layer is later re-evaluated. - hashIdentity: replace bare `Hash.fast(JSON.stringify(...))` with a helper that maps function values to a named-tag sentinel. JSON.stringify silently drops functions, so two distinct closures stored in options (e.g. AWS credentialProvider instances) collided on the same hash. Apply at the resolveSDK and getLanguage call sites. - Mark resolveEnvOverlay, currentProviders, and the bare `layer` export as @internal in their JSDoc so consumers don't depend on them; production code MUST go through Provider.defaultLayer / Provider.Service methods. - Document the deep-clone invariant of toPublicInfo+mergeDeep so the prepareModel call inside both providers and cleanedDatabase loops is provably safe (each loop mutates its own clone). - test/preload.ts: add AZURE_COGNITIVE_SERVICES_RESOURCE_NAME and GITLAB_INSTANCE_URL to the deletion list (referenced by the azure cognitive services and gitlab loaders). - test/preload.ts: capture ENV_BASELINE after Log.init() and initProjectors() so any import-time side effects are part of the baseline. Avoids afterEach mistakenly deleting keys set by src/ module evaluation. - Extend overlay.test.ts with the missing multi-env preserve-key case.
…acing, rerunOn TODO Address cross-validation findings from 4-oracle review: - env: drop Effect.fn spans on get/all (hot path; every Provider read goes through env.all()). Keep tracing on set/remove where the side effect is observable. - provider: tighten hashIdentity comment to admit that anonymous arrows still collide on __fn:anon, and explain why that permissiveness is intentional (per-call fetch wrapper in resolveSDK must hit the SDK cache instead of busting it). - provider: simplify `cached?.key ?? undefined` to `cached?.key` in resolveEnvOverlay (redundant with optional chaining). - provider: add TODO(rerunOn) marker in dynamicEnv JSDoc as the future declarative env-dependency list that would close the sap-ai-core / amazon-bedrock late-env case without restart. No behavioral change. typecheck + 465 tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #12698
Type of change
What does this PR do?
Env.all()previously snapshottedprocess.envat init time viaInstanceState, so env vars set after init were never detected byEnv.get/Env.all, andEnv.setonly mutated the internal copy (neverprocess.env).Core fix (env/index.ts)
Replaces the snapshot with
Layer.succeed+Effect.syncthat proxies directly toprocess.env:get(key)→Effect.sync(() => process.env[key])— live readall()→Effect.sync(() => ({ ...process.env }))— fresh shallow copy each callset(key, value)→ writes toprocess.envdirectlyremove(key)→ deletes fromprocess.envdirectlyEnv.Interfaceis unchanged; no consumer modifications required.Provider-side changes (provider.ts)
The env/index.ts fix alone is not sufficient because
provider.list()was returning a frozen snapshot built at init time. Late env vars now flow through:cleanedDatabase: env-eligible providers with models pre-filtered, computed once at init.currentProviders(state, envs)/resolveEnvOverlay: pure functions that produce the live providers map by overlaying the live env on top of the init-timecachedProviders. All public read paths (list,getProvider,getModel,getLanguage,closest,getSmallModel,defaultModel) go through this overlay.getLanguagecache key ishashIdentity({providerID, modelID, key, options})so a rotated env key invalidates the cachedLanguageModel.hashIdentitymaps function-valued options to a name-tagged sentinel so two distinct closures (e.g. AWScredentialProviderinstances) don't silently collide.dynamicEnv: falseopt-out (multi-env-var providers)Providers whose options derive from multiple env vars (
amazon-bedrock,sap-ai-core,azure,azure-cognitive-services,google-vertex,cloudflare-ai-gateway,cloudflare-workers-ai) cannot be reconstituted from env alone — their loaders close over auth metadata, dynamic SDK imports, region/deployment IDs, etc. These returndynamicEnv: falseand stay excluded fromcleanedDatabase. When their env later appears, a one-timewarnis emitted and the provider stays hidden until OpenCode restart.This means the original reproducer of #12698 (setting
AICORE_SERVICE_KEYafter init) still requires a restart. The user now gets a clear warning rather than silence. Single-env-key providers (anthropic, openai, openrouter, etc.) work fully without restart, which is the primary class of plugins triggering #12698 in practice.Test isolation
Because
Env.setnow mutates the globalprocess.env,test/preload.tsadds anafterEachsnapshot/restore so per-test isolation is automatic regardless of contributor discipline. TheENV_BASELINEis captured afterLog.init()andinitProjectors()so any import-time side effects are part of the baseline. The deletion list is extended for the env vars referenced by this PR's loaders (AICORE_, AWS_CONTAINER_, AWS_WEB_IDENTITY_, AWS_ROLE_ARN, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_VERTEX_, CLOUDFLARE_*, AZURE_COGNITIVE_SERVICES_RESOURCE_NAME, GITLAB_INSTANCE_URL).Known limitations
Env.setwrites process-wide. In multi-instance hosts (e.g. desktop sidecar serving several workspaces) two workspaces authenticating to the same provider with different credentials clobber each other — last write wins. TheTODO(multi-instance)comments insrc/provider/provider.tsandsrc/config/config.tsmark the call sites that persist auth-derived values throughEnv.set. Fixing this requires routing per-workspace credentials directly into SDK constructors instead of viaprocess.env; that is a separate, larger refactor and out of scope for this PR.dynamicEnv: falsesection above. A future enhancement could allow loaders to declarererunOn: string[]so the loader is re-evaluated when one of its env inputs changes; this would coversap-ai-core,amazon-bedrock, etc. Out of scope for this PR.Internal API surface
resolveEnvOverlay,currentProviders, and the barelayerexport fromprovider.tsare tagged@internal— they exist only so the test suite can build deterministic overlay scenarios and stubRuntimeFlags. Production code MUST useProvider.defaultLayerand theProvider.Servicemethods. They are subject to change without notice.How did you verify your code works?
bun typecheckpassesbun test packages/opencode/test/provider/provider.test.ts— 103 passbun test packages/opencode/test/provider/overlay.test.ts— 14 pass (covers everyresolveEnvOverlaycase, including the multi-env-candidate cached-key preserve case)bun test packages/opencode/test/provider/transform.test.ts packages/opencode/test/provider/model-status.test.ts packages/opencode/test/provider/digitalocean.test.ts— 230 passbun test packages/opencode/test/effect/runtime-flags.test.ts packages/opencode/test/config/config.test.ts— 118 passenv.all()/env.get()consumer inprovider.ts,config.ts,tool/registry.ts— all are read-only and unaffected by snapshot → live-proxyANTHROPIC_API_KEY, set it after init, calledprovider.list()—anthropicnow appears inconnected(failed before fix)Screenshots / recordings
N/A — no UI change.
Checklist