Skip to content

[wrangler] fix: prevent remote binding sessions expiring during long dev sessions#13016

Draft
penalosa wants to merge 2 commits intomainfrom
penalosa/remote-bindings-timeout
Draft

[wrangler] fix: prevent remote binding sessions expiring during long dev sessions#13016
penalosa wants to merge 2 commits intomainfrom
penalosa/remote-bindings-timeout

Conversation

@penalosa
Copy link
Contributor

@penalosa penalosa commented Mar 23, 2026

Fixes #12901

Fixes an issue where remote bindings (e.g. Workers AI) would stop working after ~1 hour in wrangler dev, requiring a manual restart.

Root causes fixed:

  1. Preview tokens expired silently — tokens have a 1-hour TTL set by the Workers control plane. The ProxyWorker would detect expiry reactively (only after a request failed), meaning one request would always fail before a refresh was triggered. This PR adds a proactive refresh at 50 minutes so no request ever sees an expired token.

  2. error code: 1031 not recognised as expiry — bindings like Workers AI return text/plain error code: 1031 when their session times out, but the existing check only matched the HTML Invalid Workers Preview configuration response. The check is now content-type-agnostic.

  3. Failed refresh left ProxyWorker permanently paused — if #refreshPreviewToken threw after emitReloadStartEvent had already paused the ProxyWorker, no reloadComplete was ever emitted to unpause it. The catch block now re-emits the last known proxyData to recover.

  4. Auth resolved too earlyunwrapHook(dev.auth) was called at bundle-complete time and the resolved value stored. This means OAuth tokens captured at startup could go stale. Auth is now resolved lazily each time a session starts.

  5. Handler reserved keys proxied to remote stub — workerd probes service binding entrypoints for lifecycle handler methods (tail, trace, scheduled, etc.) by accessing those properties. Without a guard these fell through to the capnweb stub, returning truthy RpcPromise objects and making workerd think the binding handled those events. Added a HANDLER_RESERVED_KEYS guard matching the existing pattern in external-service.ts.


  • Tests
    • Tests included/updated
    • Automated tests not possible - manual testing has been completed as follows:
    • Additional testing not necessary because:
  • Public documentation
    • Cloudflare docs PR(s):
    • Documentation not necessary because: internal dev tooling behaviour fix

Open with Devin

@changeset-bot
Copy link

changeset-bot bot commented Mar 23, 2026

🦋 Changeset detected

Latest commit: 35d0aae

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions
Copy link
Contributor

github-actions bot commented Mar 23, 2026

✅ All changesets look good

@pkg-pr-new
Copy link

pkg-pr-new bot commented Mar 23, 2026

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@13016

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@13016

miniflare

npm i https://pkg.pr.new/miniflare@13016

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@13016

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@13016

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@13016

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@13016

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@13016

wrangler

npm i https://pkg.pr.new/wrangler@13016

commit: cb76ef9

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 23, 2026

I've completed my thorough review of the PR. Let me summarize my findings:

The PR is well-structured and addresses the stated issues clearly. The code changes are logical, the test coverage for the new proactive refresh behavior is solid, and the error recovery pattern (re-emitting reloadComplete on refresh failure) is sound.

After careful review, I don't find any actionable logic bugs, security issues, backward compatibility violations, or incorrect API behavior. The key areas I verified:

  1. Proactive refresh timer - properly cleared on bundleStart, teardown, and before rescheduling. No timer leak.
  2. Error recovery path - re-emitting #latestProxyData on failed refresh to unpause ProxyWorker is correct.
  3. HANDLER_RESERVED_KEYS - identical to the existing set in external-service.ts.
  4. checkForPreviewTokenError moved outside isHtmlResponse guard - correct, since error code: 1031 is text/plain. The function clones the response before reading, so the original body is preserved.
  5. Auth lazy resolution - passing the hook through instead of eagerly unwrapping is the right pattern for long-lived sessions.
  6. deepStrictEqual on auth - the preExistingRemoteProxySessionData doesn't include auth in the return type, so this comparison always sees undefined for the old value. But this is a pre-existing behavior, not introduced by this PR.
  7. Retry semantics - rescheduling proactive refresh in the catch block at the same 50-minute interval is reasonable.
  8. Test quality - tests properly use fake timers, cover the happy path, cancellation on bundleStart, cancellation on teardown, and the reactive refresh path.

LGTM

github run

@penalosa penalosa force-pushed the penalosa/remote-bindings-timeout branch 2 times, most recently from 6ab5cab to 35d0aae Compare March 23, 2026 17:07
@penalosa penalosa force-pushed the penalosa/remote-bindings-timeout branch from 35d0aae to cb76ef9 Compare March 23, 2026 17:32
@penalosa penalosa marked this pull request as ready for review March 23, 2026 17:32
@penalosa penalosa requested a review from a team as a code owner March 23, 2026 17:33
@workers-devprod
Copy link
Contributor

Codeowners approval required for this PR:

  • @cloudflare/wrangler
Show detailed file reviewers
  • packages/miniflare/src/workers/shared/remote-proxy-client.worker.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/tests/api/startDevWorker/RemoteRuntimeController.test.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/tests/dev/remote-bindings.test.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/api/remoteBindings/index.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/api/startDevWorker/LocalRuntimeController.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/api/startDevWorker/RemoteRuntimeController.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/api/startDevWorker/utils.ts: [@cloudflare/wrangler]
  • packages/wrangler/templates/startDevWorker/ProxyWorker.ts: [@cloudflare/wrangler]

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines 427 to 429
this.#session = await this.#getPreviewSession(
this.#latestConfig,
auth,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 ProxyWorker left permanently paused when #getPreviewSession silently returns undefined

When #refreshPreviewToken runs (from the proactive 50-minute timer or onPreviewTokenExpired), it first emits reloadStart which pauses the ProxyWorker (packages/wrangler/src/api/startDevWorker/ProxyController.ts:439 sends a "pause" message). Then #getPreviewSession is called at line 427. If it encounters a non-abort API error, #previewSession (RemoteRuntimeController.ts:95-109) catches the error internally, emits an error event, and returns undefined without re-throwing. This causes this.#session to become undefined, so #previewToken (RemoteRuntimeController.ts:119) returns undefined, and #updatePreviewToken (RemoteRuntimeController.ts:327) exits early without emitting reloadComplete. The catch block (lines 441-467) — which contains the recovery logic to re-emit stale reloadComplete and reschedule — is never entered. As a result: (1) the ProxyWorker stays permanently paused and all requests hang, (2) the misleading "✔ Preview token refreshed successfully" is logged, and (3) no future #scheduleProactiveRefresh() is called so recovery never happens automatically.

(Refers to lines 427-440)

Prompt for agents
In packages/wrangler/src/api/startDevWorker/RemoteRuntimeController.ts, in the #refreshPreviewToken method (around lines 427-440), after the call to this.#getPreviewSession(), add a check for whether this.#session is undefined. If it is, perform the same recovery as the catch block: re-emit the last known reloadComplete (to unpause the ProxyWorker) and call this.#scheduleProactiveRefresh() to schedule a future retry, then return early. Do NOT log the success message. For example, after line 431 add:

if (!this.#session) {
  if (this.#latestProxyData && this.#latestConfig && this.#latestBundle) {
    this.emitReloadCompleteEvent({
      type: "reloadComplete",
      config: this.#latestConfig,
      bundle: this.#latestBundle,
      proxyData: this.#latestProxyData,
    });
  }
  this.#scheduleProactiveRefresh();
  return;
}

Alternatively, consider having #previewSession re-throw the error after emitting the error event, so that the catch block in #refreshPreviewToken handles it naturally.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@penalosa penalosa marked this pull request as draft March 23, 2026 18:13
Copy link
Contributor

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. I wonder if we could achieve a similar result by emitting the previewTokenExpired event after 50 mins and let the rest of the machinery do the refresh?

Comment on lines +44 to +46
if (typeof prop === "string" && HANDLER_RESERVED_KEYS.has(prop)) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?


// Store for token refresh
this.#latestConfig = config;
this.#latestConfig = config as RemoteStartDevWorkerOptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this cast be avoided with an assertion?

Suggested change
this.#latestConfig = config as RemoteStartDevWorkerOptions;
assert(config.dev.auth);
this.#latestConfig = config;

});
}

this.#scheduleProactiveRefresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the refresh failed, don't we want to hard error and bail out? If not, then re-scheduling a check using the normal timings (50mins?) seems too long. If it failed we should probably retry sooner?

Comment on lines +454 to +455
// Re-emit the last known reloadComplete so the ProxyWorker is unpaused.
// Without this, a failed refresh leaves the ProxyWorker permanently paused
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider unpausing the proxy worker when the error event is emitted rather than triggering a reload complete event? It kind of feels disingenuous to trigger a "complete" event when there was an error.

export function assertNever(_value: never) {}

/**
* Preview tokens expire after 1 hour (hardcoded in the Workers control plane).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Preview tokens expire after 1 hour (hardcoded in the Workers control plane).
* When to proactively refresh the preview token.
*
* Preview tokens expire after 1 hour (hardcoded in the Workers control plane), so we retry after 50 mins.

Comment on lines +724 to +727
const resolvedAuth1 =
typeof authHook1 === "function"
? await (authHook1 as unknown as () => Promise<unknown>)()
: authHook1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not use unwrapHook() here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Untriaged

Development

Successfully merging this pull request may close these issues.

Wrangler should detect when remote bindings have timed out

3 participants