Skip to content

Stale global ~/.forge/secrets.enc poisons the secrets provider chain #52

@initializ-mk

Description

@initializ-mk

Summary

OverlaySecretsToEnv and Runner.buildSecretProvider unconditionally append the global ~/.forge/secrets.enc provider to the chain whenever encrypted-file is configured — even when that file uses a different passphrase than the local one (or doesn't exist with the expected passphrase). When the global provider's ensureLoaded() fails, the resulting decrypt error is not ErrSecretNotFound, so ChainProvider.Get and ChainProvider.List propagate it immediately and poison the rest of the chain.

Concrete consequences:

  • ChainProvider.Get(key) for a key not in the local file falls through to the global file, fails to decrypt, and returns a real error — even though the local file is fine.
  • ChainProvider.List() (used by secretOverlayKeys after PR fix: overlay skill-declared secrets dynamically (closes #48) #51) bails out on the first errored provider, dropping every key the local file does enumerate. Builtin keys survive via the builtins-only fallback in secretOverlayKeys, but custom skill-declared keys are silently hidden until the user resolves the stale global file.

This is independent of #48 — it predates that fix and is just newly visible because of List() usage.

Root cause

forge-cli/runtime/runner.go:2440-2450 (inside Runner.buildSecretProvider):

case \"encrypted-file\":
    localPath := filepath.Join(r.cfg.WorkDir, \".forge\", \"secrets.enc\")
    providers = append(providers, secrets.NewEncryptedFileProvider(localPath, passCb))

    // Global fallback secrets file
    home, err := os.UserHomeDir()
    if err == nil {
        globalPath := filepath.Join(home, \".forge\", \"secrets.enc\")
        providers = append(providers, secrets.NewEncryptedFileProvider(globalPath, passCb))
    }

The same pattern repeats in OverlaySecretsToEnv at forge-cli/runtime/runner.go:2480-2490. Neither call site checks whether the global file exists before adding it to the chain, and the chain itself propagates decrypt errors strictly:

// forge-core/secrets/chain_provider.go:19-29 (Get)
if err == nil { return val, nil }
if !IsNotFound(err) { return \"\", err }  // decrypt failure propagates

// forge-core/secrets/chain_provider.go:48-64 (List)
keys, err := p.List()
if err != nil { return nil, err }          // first provider's failure poisons the whole list

Reproduction

  1. Set up an agent project with secrets.providers: [encrypted-file] and store a key locally:
    forge secrets set MY_KEY=local-value
    
  2. Create a global file with a different passphrase:
    FORGE_PASSPHRASE=other-pass forge secrets set --global SOME_KEY=foo
    
    (or just have any pre-existing ~/.forge/secrets.enc written under a different passphrase)
  3. Run forge run with the project's passphrase. Symptoms:
    • Runner.overlaySecrets logs only the builtin keys it could Get() from the local file; any non-builtin key in the local file is missed.
    • Skill scripts / cli_execute invocations don't see the custom env var even though forge secrets set accepted it.

Why the chain is strict by design

ChainProvider.Get returning real errors immediately is intentional — it surfaces a misconfigured provider rather than silently swallowing it. The brittleness comes from the call sites adding a provider that almost always errors, not from the chain semantics themselves.

Proposed fixes (pick one or combine)

  1. Gate the global path on os.Stat in both Runner.buildSecretProvider and OverlaySecretsToEnv. If the global file is absent, don't append it. (Doesn't help users whose global file exists but uses a different passphrase — but it covers the most common "never used global secrets" case at zero cost.)
  2. Try-decrypt the global file once at chain-build time. If ensureLoaded() fails, omit it from the chain and log a warning. Costs one Argon2id derivation at startup; gives correct behavior in both scenarios.
  3. Make ChainProvider.List and ChainProvider.Get lenient to per-provider failures: on error from a provider, log/skip and continue. This is a forge-core change and shifts semantics, so it's the most invasive option, but it's also the most robust — a misconfigured one-of-N provider would no longer block the others.
  4. Per-provider passphrase callbacks. Today the same passphraseFromEnv() callback is used for local and global. A future option would be to support distinct passphrases (e.g. FORGE_PASSPHRASE_GLOBAL) so the two stores can legitimately coexist with different keys.

(1)+(2) together get most of the value at low risk. (3) is the cleanest long-term fix but requires care around the existing strictness guarantee. (4) is a separate UX feature.

Affected files

  • forge-cli/runtime/runner.go:2440-2450Runner.buildSecretProvider, appends global unconditionally
  • forge-cli/runtime/runner.go:2480-2490OverlaySecretsToEnv, same pattern
  • forge-core/secrets/chain_provider.go:19-30ChainProvider.Get strict propagation
  • forge-core/secrets/chain_provider.go:48-65ChainProvider.List strict propagation
  • forge-cli/runtime/runner_secrets_test.go — the regression test for Skill-declared env vars stored as encrypted secrets are not passed to downstream binaries #48 already has to t.Setenv(\"HOME\", ...) to avoid this exact pitfall

Acceptance criteria

  • A user with a pre-existing global ~/.forge/secrets.enc (created with any passphrase) can run an agent in a project whose local secrets.enc uses a different passphrase, and all keys in the local file — builtin and custom — overlay correctly into the env.
  • OverlaySecretsToEnv and the runner's secret overlay log a clear warning when a provider in the chain fails to load, rather than silently dropping custom keys.
  • A failing global file does not prevent the local file's keys from being enumerated by ChainProvider.List.
  • Existing tests pass; the t.Setenv(\"HOME\", ...) isolation in TestOverlaySecretsToEnv_CustomSkillKey can be removed (the test no longer needs to defend against developer-local state).

Context

Surfaced during PR #51 (fix for #48). The fix landed despite this latent issue because secretOverlayKeys falls back to the builtin keys when List() errors — but that fallback only saves the LLM/channel keys, not the custom skill-declared keys that #48 was about. Closing this issue would let custom skill env vars work even when the user has a pre-existing global secrets file with a different passphrase.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions