fix(env): live process.env reads + late-env detection for env-key providers by jerome-benoit · Pull Request #27719 · anomalyco/opencode

jerome-benoit · 2026-05-15T10:45:43Z

Issue for this PR

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

Env.all() previously snapshotted process.env at init time via InstanceState, so env vars set after init were never detected by Env.get/Env.all, and Env.set only mutated the internal copy (never process.env).

Core fix (env/index.ts)

Replaces the snapshot with Layer.succeed + Effect.sync that proxies directly to process.env:

get(key) → Effect.sync(() => process.env[key]) — live read
all() → Effect.sync(() => ({ ...process.env })) — fresh shallow copy each call
set(key, value) → writes to process.env directly
remove(key) → deletes from process.env directly

Env.Interface is unchanged; no consumer modifications required.

Provider-side changes (provider.ts)

The env/index.ts fix alone is not sufficient because provider.list() was returning a frozen snapshot built at init time. Late env vars now flow through:

cleanedDatabase: env-eligible providers with models pre-filtered, computed once at init.
currentProviders(state, envs) / resolveEnvOverlay: pure functions that produce the live providers map by overlaying the live env on top of the init-time cachedProviders. All public read paths (list, getProvider, getModel, getLanguage, closest, getSmallModel, defaultModel) go through this overlay.
getLanguage cache key is hashIdentity({providerID, modelID, key, options}) so a rotated env key invalidates the cached LanguageModel. hashIdentity maps function-valued options to a name-tagged sentinel so two distinct closures (e.g. AWS credentialProvider instances) don't silently collide.

`dynamicEnv: false` opt-out (multi-env-var providers)

Providers whose options derive from multiple env vars (amazon-bedrock, sap-ai-core, azure, azure-cognitive-services, google-vertex, cloudflare-ai-gateway, cloudflare-workers-ai) cannot be reconstituted from env alone — their loaders close over auth metadata, dynamic SDK imports, region/deployment IDs, etc. These return dynamicEnv: false and stay excluded from cleanedDatabase. When their env later appears, a one-time warn is emitted and the provider stays hidden until OpenCode restart.

This means the original reproducer of #12698 (setting AICORE_SERVICE_KEY after init) still requires a restart. The user now gets a clear warning rather than silence. Single-env-key providers (anthropic, openai, openrouter, etc.) work fully without restart, which is the primary class of plugins triggering #12698 in practice.

Test isolation

Because Env.set now mutates the global process.env, test/preload.ts adds an afterEach snapshot/restore so per-test isolation is automatic regardless of contributor discipline. The ENV_BASELINE is captured after Log.init() and initProjectors() so any import-time side effects are part of the baseline. The deletion list is extended for the env vars referenced by this PR's loaders (AICORE_, AWS_CONTAINER_, AWS_WEB_IDENTITY_, AWS_ROLE_ARN, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_VERTEX_, CLOUDFLARE_*, AZURE_COGNITIVE_SERVICES_RESOURCE_NAME, GITLAB_INSTANCE_URL).

Known limitations

Multi-instance credential clobbering: Env.set writes process-wide. In multi-instance hosts (e.g. desktop sidecar serving several workspaces) two workspaces authenticating to the same provider with different credentials clobber each other — last write wins. The TODO(multi-instance) comments in src/provider/provider.ts and src/config/config.ts mark the call sites that persist auth-derived values through Env.set. Fixing this requires routing per-workspace credentials directly into SDK constructors instead of via process.env; that is a separate, larger refactor and out of scope for this PR.
Multi-env-var providers still need restart: see dynamicEnv: false section above. A future enhancement could allow loaders to declare rerunOn: string[] so the loader is re-evaluated when one of its env inputs changes; this would cover sap-ai-core, amazon-bedrock, etc. Out of scope for this PR.

Internal API surface

resolveEnvOverlay, currentProviders, and the bare layer export from provider.ts are tagged @internal — they exist only so the test suite can build deterministic overlay scenarios and stub RuntimeFlags. Production code MUST use Provider.defaultLayer and the Provider.Service methods. They are subject to change without notice.

How did you verify your code works?

bun typecheck passes
bun test packages/opencode/test/provider/provider.test.ts — 103 pass
bun test packages/opencode/test/provider/overlay.test.ts — 14 pass (covers every resolveEnvOverlay case, including the multi-env-candidate cached-key preserve case)
bun test packages/opencode/test/provider/transform.test.ts packages/opencode/test/provider/model-status.test.ts packages/opencode/test/provider/digitalocean.test.ts — 230 pass
bun test packages/opencode/test/effect/runtime-flags.test.ts packages/opencode/test/config/config.test.ts — 118 pass
Reviewed every env.all() / env.get() consumer in provider.ts, config.ts, tool/registry.ts — all are read-only and unaffected by snapshot → live-proxy
Reproduced the original issue: started without ANTHROPIC_API_KEY, set it after init, called provider.list() — anthropic now appears in connected (failed before fix)

Screenshots / recordings

N/A — no UI change.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

github-actions · 2026-05-15T10:47:23Z

Thanks for updating your PR! It now meets our contributing guidelines. 👍

Copilot

Pull request overview

This PR updates the environment service to read and write process.env directly so environment variables set after startup can be observed by runtime consumers.

Changes:

Replaces the cached Env state with direct process.env access.
Removes now-obsolete direct env writes/comments in config/provider code.
Adds more test preload env cleanup entries for provider-related variables.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`packages/opencode/src/env/index.ts`	Changes Env service methods to proxy through `process.env`.
`packages/opencode/src/config/config.ts`	Removes redundant direct console token env assignment.
`packages/opencode/src/provider/provider.ts`	Removes obsolete TODO comments around direct env usage.
`packages/opencode/test/preload.ts`	Expands initial test env cleanup list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

-    return Service.of({ get, all, set, remove })
+  Service.of({
+    get: Effect.fn("Env.get")((key: string) => Effect.sync(() => process.env[key])),
+    all: Effect.fn("Env.all")(() => Effect.sync(() => ({ ...process.env }) as State)),


+delete process.env["AICORE_SERVICE_KEY"]
+delete process.env["AICORE_DEPLOYMENT_ID"]
+delete process.env["AICORE_RESOURCE_GROUP"]
+delete process.env["GOOGLE_APPLICATION_CREDENTIALS"]
+delete process.env["CLOUDFLARE_ACCOUNT_ID"]
+delete process.env["CLOUDFLARE_GATEWAY_ID"]
+delete process.env["CLOUDFLARE_API_TOKEN"]
+delete process.env["SINGLE_ENV_KEY"]
+delete process.env["MULTI_ENV_KEY_1"]
+delete process.env["FALLBACK_KEY"]


Cover all env vars declared by providers in models-api.json plus the test-fixture pairs referenced in provider.test.ts, so host env (or in-process leaks via Env.set) cannot perturb provider detection.

…call Provider state cached via InstanceState read env.all() once at init, so providers whose env keys were set after the first list() call remained invisible (issue anomalyco#12698 surface beyond the Env layer). Overlay live env detection on top of the cached providers in list() and getProvider(), respecting disabled_providers, enabled_providers, blacklist, whitelist and model status filters via a precomputed envEligible set. Cached state retains expensive init work (catalog/plugins/SDK loaders); only env-driven detection re-runs.

Closes the half-fix where overlayEnv() was wired into list/getProvider only. getModel, getLanguage, closest, getSmallModel, defaultModel and resolveSDK all read s.providers directly, so a provider detected from a late env mutation appeared in list() but crashed when actually used. State refactor: - rename providers -> cachedProviders (invariant: never holds env-only entries) - replace database+envEligible with cleanedDatabase (env-eligible providers with models pre-filtered for status/blacklist/whitelist/chat-aliases AND non-autoload custom-loader options pre-folded) - drop the init-time env-stamp loop, subsumed by the call-time overlay Pure helpers (unit-testable): - resolveEnvOverlay(cached, candidate, apiKey): 7-branch env precedence - currentProviders(s, envs): single seam every reader funnels through Layer body: - view(s) helper composes env.all() + currentProviders - all 7 readers + resolveSDK go through view; resolveSDK accepts Info explicitly - custom loop always registers loaders so a late-env provider can find them; folds non-autoload options into database; preserves source=env for env-conditional autoloaders (gitlab, cloudflare-ai-gateway) Tests: - test/provider/overlay.test.ts: 13 pure-helper unit tests - test/provider/provider.test.ts: 9 integration tests covering getModel late, getLanguage late, closest late, getSmallModel late, defaultModel late, alpha model leak, blacklist late, multi-env Bedrock, auth precedence - test/preload.ts: clear GITHUB_TOKEN, GITLAB_TOKEN, OPENCODE_API_KEY, GEMINI_API_KEY, HF_TOKEN, DIGITALOCEAN_ACCESS_TOKEN to prevent CI leakage

…ers and cleanedDatabase, close variant parity gap Round-2 cross-validation by 3 oracles surfaced 4 actionable findings. This addresses all of them: A1 — Variant parity gap (Oracle A): the cleanedDatabase build only applied the 5 keep filters (status/blacklist/whitelist/chat-aliases) but skipped the prepareModel mutations (api.id default, ProviderTransform.variants generation, configProvider.models[*].variants merge) that the cachedProviders cleanup loop applies. A late-env-detected provider therefore had empty variants while the same provider with env-at-boot had full variants. Fixed by applying prepareModel in both loops. A3 — Filter duplication (Oracle A + C): the 5 keep predicates were duplicated between the cachedProviders cleanup loop and the cleanedDatabase build. Extracted keepModel(model, modelID, providerID, configProvider) and prepareModel(model, modelID, configVariants) as local helpers in the InstanceState.make closure. Both loops now share the pipeline. A2 — Test tightening (Oracle A + C): the late-detected multi-env Bedrock test had an 'if (bedrock)' guard that silently passed if regression made bedrock undefined. Replaced with explicit expect(bedrock).toBeDefined(). The defaultModel late-env test only asserted toBeDefined() — strengthened to expect(d.providerID).toBe(anthropic) and added enabled_providers config to make the assertion robust against bundled provider iteration order. A4 — AWS env hygiene (Oracle B): the multi-env Bedrock test only removed 4 AWS keys; added AWS_PROFILE, AWS_ROLE_ARN, AWS_CONTAINER_CREDENTIALS_*, AWS_WEB_IDENTITY_TOKEN_FILE for full provider-detection-path isolation. New regression test: 'late-detected env provider has variants populated (parity with env-at-boot)' guards A1 closure. bun typecheck: clean bun test test/provider/: 101 + 13 = 114/114 pass

…out + test isolation Round-3 cross-validation by 4 oracles + 4 design oracles surfaced 8 findings plus 3 audit gaps. This commit addresses all of them. B1 — Multi-credential late-env gap (Oracle 2 + Oracle 3): cleanedDatabase surfaces providers whose env appears post-init, but multi-cred loaders (bedrock, sap-ai-core, vertex, vertex-anthropic, azure, cloudflare-workers- ai, cloudflare-ai-gateway, azure-cognitive-services) close over credential providers / deployment IDs / account IDs / derived URLs at loader time. Late-promoting them yields an unauthenticated SDK that fails silently at first request. Solution: explicit `dynamicEnv?: boolean` capability flag on CustomLoader. Loaders mark `dynamicEnv: false` on broken-init-state early-returns; State holds `loaderRefusedDynamicEnv: Set<ProviderID>` populated at init; cleanedDatabase build excludes these. New `warnLateEnvRefused` helper emits a one-time per-process warn when refused env appears, surfacing the limitation without false-positive listing. The PR test that asserted the broken bedrock state is inverted to assert exclusion. Audit (post-oracles): cloudflare-workers-ai, cloudflare-ai-gateway, and azure-cognitive-services had the same pattern as the original 5 loaders but were not covered. All three now carry the flag; azure-cognitive- services refactored to early-return (was producing options.baseURL= undefined). The dynamicEnv JSDoc states the rule generically so future loaders know what to do. B2 — Test isolation regression (Oracle 3): bun runs all .test.ts in one shared process. With Env.set now writing through to process.env, 30+ test sites in test/provider/provider.test.ts call set() without remove() and leak across files. Documented victim: test/session/structured-output- integration.test.ts:12 reads process.env.ANTHROPIC_API_KEY at module-load to choose skip vs live. Solution: capture ENV_BASELINE in test/preload.ts after preload settles, afterEach() restores the snapshot. ~15 lines, covers every test file automatically, future-proof. M1 — Inaccurate State invariants (Oracle 2 + Oracle 4): cachedProviders comment claimed "never holds env-only entries", violated by the autoload- with-env-key path. cleanedDatabase claimed "Immutable post-init" without compile-time hint. Rephrased both; added Readonly<> on cleanedDatabase. M2.C — Inconsistent process.env writes (Oracle 3): bedrock and sap-ai-core loaders wrote process.env.X = auth.key directly, breaking the trace-and- audit guarantee of the Env service. Added dep.set on CustomDep; lifted the effectful write out of iife() into proper yield* dep.set(...). M3 — Missing Env.Service contract docs (Oracle 2 + Oracle 4): added top-of-file JSDoc covering live reads, global writes, child-process inheritance, and the test-isolation contract. M5 — Empty-string apiKey edge case (Oracle 2): info.env.find(Boolean) intentionally treats "" as absent. Added inline comment so a future "correctness fix" doesn't regress to find(v => v !== undefined). M6 — Redundant `as State` cast (Oracle 1): {...process.env} infers correctly as Record<string, string | undefined>. Cast removed. bun typecheck: clean bun test test/provider: 365/366 (1 environmental 5000ms flake, different test each run, present on baseline before changes) bun test test/plugin: 91/91

Round-4 cross-validation by 5 review oracles + 2 design oracles surfaced 5 in-scope findings on top of the late-env contract. This commit addresses all of them. B1 — getLanguage TypeError on env removal (Oracle correctness): when the env key is removed between getModel() and getLanguage(), currentProviders yields no entry for the providerID and resolveSDK / spread on provider.options crashes inside refineRejection (which only refines NoSuchModelError). Surfaces as an unrelated defect. Solution: early-return ModelNotFoundError when the overlay has no provider, mirroring getModel's existing pattern. fuzzysort suggestions are populated identically so the error contract is consistent across the two readers. B2 — s.models cache stale on env-key rotation (Oracle correctness): s.sdk correctly rebuilds via Hash.fast({providerID, npm, options}) when apiKey changes, but the cached LanguageModelV3 wrapper at s.models was keyed only by ${providerID}/${modelID}, so it kept referencing the SDK from before rotation. Multi-env providers (Databricks, Privatemode, Google generative AI) where provider.key is undefined had an even worse collision: two distinct env states produced identical cache keys. Solution: cache key now mirrors s.sdk's Hash.fast keying: Hash.fast(JSON.stringify({providerID, modelID, key, options})). Env key rotation, multi-env credential changes, and env-templated baseURL mutations now invalidate the LanguageModel together with the SDK by construction. M1 — Direct process.env write in config.ts well-known auth (Explore §4.2): `process.env[value.key] = value.token` bypassed the typed Env seam. Replaced with `yield* env.set(...)` matching the existing pattern at the OPENCODE_CONSOLE_TOKEN site. No behavioral change post-PR (Env.set now writes through), pure consistency. M2 — Mixed env reads in bedrock loader (Explore §4.3): AWS_CONTAINER_CREDENTIALS_RELATIVE_URI / _FULL_URI were read directly from process.env while surrounding code uses the dep.env() snapshot. Switched to env["..."] for greppability and to keep all bedrock-loader env reads going through one source. M3 — Multi-instance hazard documentation (Oracle architecture, Oracle scope/quality): Env.set writes are process-global. In multi-workspace desktop hosts (one sidecar serving several workspace contexts) two workspaces authenticating to the same provider with different credentials clobber each other. Added explicit warning to the Env module docstring, plus TODO(multi-instance) markers at each existing credential-write site (config.ts well-known + console-token, provider.ts bedrock-bearer + sap-ai-core-service-key) so the future Auth-routing migration has a clean search target. Tests: - getLanguage rejects when env removed between getModel and getLanguage (covers B1) - getLanguage rebuilds language model after env key rotation (covers B2) Verification: - bun typecheck: clean - lsp_diagnostics: clean on all 4 changed files - bun test test/provider/: 368 pass / 0 fail (incl. 2 new + all PR late-env tests + overlay.test.ts pure unit tests) - bun test test/config/: 163 pass / 0 fail Deferred to separate tickets (out of scope for this PR per the design proposal cross-validated by both oracles): - Multi-env partial-env leak (Databricks/Privatemode/Google generative AI): pre-existing in the deleted load-env loop with identical semantics; no safe automatic rule without catalog metadata distinguishing alias vs complementary multi-env. Fix candidate: default dynamicEnv:false for multi-env providers without custom loaders. - dynamicEnv:false extension to autoload-success providers (warn on secondary env mutation post-init). - mcp-websearch.ts EXA_URL module-level snapshot. - Multi-instance credential isolation rework (Auth-routing migration).

- resolveEnvOverlay: preserve cached.key when candidate is multi-env so a previously-resolved env key is not dropped to undefined when the env layer is later re-evaluated. - hashIdentity: replace bare `Hash.fast(JSON.stringify(...))` with a helper that maps function values to a named-tag sentinel. JSON.stringify silently drops functions, so two distinct closures stored in options (e.g. AWS credentialProvider instances) collided on the same hash. Apply at the resolveSDK and getLanguage call sites. - Mark resolveEnvOverlay, currentProviders, and the bare `layer` export as @internal in their JSDoc so consumers don't depend on them; production code MUST go through Provider.defaultLayer / Provider.Service methods. - Document the deep-clone invariant of toPublicInfo+mergeDeep so the prepareModel call inside both providers and cleanedDatabase loops is provably safe (each loop mutates its own clone). - test/preload.ts: add AZURE_COGNITIVE_SERVICES_RESOURCE_NAME and GITLAB_INSTANCE_URL to the deletion list (referenced by the azure cognitive services and gitlab loaders). - test/preload.ts: capture ENV_BASELINE after Log.init() and initProjectors() so any import-time side effects are part of the baseline. Avoids afterEach mistakenly deleting keys set by src/ module evaluation. - Extend overlay.test.ts with the missing multi-env preserve-key case.

…acing, rerunOn TODO Address cross-validation findings from 4-oracle review: - env: drop Effect.fn spans on get/all (hot path; every Provider read goes through env.all()). Keep tracing on set/remove where the side effect is observable. - provider: tighten hashIdentity comment to admit that anonymous arrows still collide on __fn:anon, and explain why that permissiveness is intentional (per-call fetch wrapper in resolveSDK must hit the SDK cache instead of busting it). - provider: simplify `cached?.key ?? undefined` to `cached?.key` in resolveEnvOverlay (redundant with optional chaining). - provider: add TODO(rerunOn) marker in dynamicEnv JSDoc as the future declarative env-dependency list that would close the sap-ai-core / amazon-bedrock late-env case without restart. No behavioral change. typecheck + 465 tests pass.

jerome-benoit added 30 commits April 27, 2026 16:41

fix(env): proxy directly to process.env instead of snapshotting

bf21d10

Merge branch 'dev' into fix-env-caching-12698

87ab94a

Merge branch 'dev' into fix-env-caching-12698

6b29710

Merge branch 'dev' into fix-env-caching-12698

6d9a331

Merge branch 'dev' into fix-env-caching-12698

f94cdf5

Merge branch 'dev' into fix-env-caching-12698

6c7c591

Merge branch 'dev' into fix-env-caching-12698

3deec72

Merge branch 'dev' into fix-env-caching-12698

69a6dbd

Merge branch 'dev' into fix-env-caching-12698

dbb73f4

Merge branch 'dev' into fix-env-caching-12698

d506d6a

Merge branch 'dev' into fix-env-caching-12698

7186b14

Merge branch 'dev' into fix-env-caching-12698

e6c72c9

Merge branch 'dev' into fix-env-caching-12698

0237fa9

Merge branch 'dev' into fix-env-caching-12698

7702475

Merge branch 'dev' into fix-env-caching-12698

84b46ff

Merge branch 'dev' into fix-env-caching-12698

45b50db

Merge branch 'dev' into fix-env-caching-12698

0ae35e0

Merge branch 'dev' into fix-env-caching-12698

873795f

Merge branch 'dev' into fix-env-caching-12698

414f6fb

Merge branch 'dev' into fix-env-caching-12698

f2efe23

Merge branch 'dev' into fix-env-caching-12698

301e91c

Merge branch 'dev' into fix-env-caching-12698

a8220d0

Merge branch 'dev' into fix-env-caching-12698

9e11919

Merge branch 'dev' into fix-env-caching-12698

6916a66

Merge branch 'dev' into fix-env-caching-12698

ecf064a

Merge branch 'dev' into fix-env-caching-12698

5364150

Merge branch 'dev' into fix-env-caching-12698

71700a9

Merge branch 'dev' into fix-env-caching-12698

5be5f9e

Merge branch 'dev' into fix-env-caching-12698

627fae5

Merge branch 'dev' into fix-env-caching-12698

8ba4d4a

jerome-benoit added 13 commits May 13, 2026 18:43

Merge branch 'dev' into fix-env-caching-12698

2b8901c

Merge branch 'dev' into fix-env-caching-12698

eba83f7

Merge branch 'dev' into fix-env-caching-12698

cab01a8

Merge branch 'dev' into fix-env-caching-12698

7df73af

Merge branch 'dev' into fix-env-caching-12698

e328eee

Merge branch 'dev' into fix-env-caching-12698

27fd04e

Merge branch 'dev' into fix-env-caching-12698

cd84dbf

Merge branch 'dev' into fix-env-caching-12698

20c5453

Merge branch 'dev' into fix-env-caching-12698

8989c59

Merge branch 'dev' into fix-env-caching-12698

9ab890d

Merge branch 'dev' into fix-env-caching-12698

5d41d24

Merge branch 'dev' into fix-env-caching-12698

cec558e

Merge branch 'dev' into fix-env-caching-12698

6315ffc

Copilot AI review requested due to automatic review settings May 15, 2026 10:45

github-actions Bot added needs:compliance This means the issue will auto-close after 2 hours. contributor labels May 15, 2026

Copilot started reviewing on behalf of jerome-benoit May 15, 2026 10:46 View session

github-actions Bot removed the needs:compliance This means the issue will auto-close after 2 hours. label May 15, 2026

Copilot AI reviewed May 15, 2026

View reviewed changes

jerome-benoit added 8 commits May 15, 2026 12:51

Merge branch 'dev' into fix-env-caching-12698

db64ec1

test(preload): clear remaining provider-detection env vars

9b25e2b

Cover all env vars declared by providers in models-api.json plus the test-fixture pairs referenced in provider.test.ts, so host env (or in-process leaks via Env.set) cannot perturb provider detection.

jerome-benoit changed the title ~~fix(env): proxy directly to process.env instead of snapshotting~~ fix(env): live process.env reads + late-env detection for env-key providers May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(env): live process.env reads + late-env detection for env-key providers#27719

fix(env): live process.env reads + late-env detection for env-key providers#27719
jerome-benoit wants to merge 91 commits into
anomalyco:devfrom
jerome-benoit:fix-env-caching-12698

jerome-benoit commented May 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jerome-benoit commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue for this PR

Type of change

What does this PR do?

Core fix (env/index.ts)

Provider-side changes (provider.ts)

dynamicEnv: false opt-out (multi-env-var providers)

Test isolation

Known limitations

Internal API surface

How did you verify your code works?

Screenshots / recordings

Checklist

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jerome-benoit commented May 15, 2026 •

edited

Loading

`dynamicEnv: false` opt-out (multi-env-var providers)