[wrangler] fix: prevent remote binding sessions expiring during long dev sessions#13016
[wrangler] fix: prevent remote binding sessions expiring during long dev sessions#13016
Conversation
🦋 Changeset detectedLatest commit: 35d0aae The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
✅ All changesets look good |
create-cloudflare
@cloudflare/kv-asset-handler
miniflare
@cloudflare/pages-shared
@cloudflare/unenv-preset
@cloudflare/vite-plugin
@cloudflare/vitest-pool-workers
@cloudflare/workers-editor-shared
wrangler
commit: |
|
I've completed my thorough review of the PR. Let me summarize my findings: The PR is well-structured and addresses the stated issues clearly. The code changes are logical, the test coverage for the new proactive refresh behavior is solid, and the error recovery pattern (re-emitting After careful review, I don't find any actionable logic bugs, security issues, backward compatibility violations, or incorrect API behavior. The key areas I verified:
LGTM |
6ab5cab to
35d0aae
Compare
35d0aae to
cb76ef9
Compare
|
Codeowners approval required for this PR:
Show detailed file reviewers
|
| this.#session = await this.#getPreviewSession( | ||
| this.#latestConfig, | ||
| auth, |
There was a problem hiding this comment.
🔴 ProxyWorker left permanently paused when #getPreviewSession silently returns undefined
When #refreshPreviewToken runs (from the proactive 50-minute timer or onPreviewTokenExpired), it first emits reloadStart which pauses the ProxyWorker (packages/wrangler/src/api/startDevWorker/ProxyController.ts:439 sends a "pause" message). Then #getPreviewSession is called at line 427. If it encounters a non-abort API error, #previewSession (RemoteRuntimeController.ts:95-109) catches the error internally, emits an error event, and returns undefined without re-throwing. This causes this.#session to become undefined, so #previewToken (RemoteRuntimeController.ts:119) returns undefined, and #updatePreviewToken (RemoteRuntimeController.ts:327) exits early without emitting reloadComplete. The catch block (lines 441-467) — which contains the recovery logic to re-emit stale reloadComplete and reschedule — is never entered. As a result: (1) the ProxyWorker stays permanently paused and all requests hang, (2) the misleading "✔ Preview token refreshed successfully" is logged, and (3) no future #scheduleProactiveRefresh() is called so recovery never happens automatically.
(Refers to lines 427-440)
Prompt for agents
In packages/wrangler/src/api/startDevWorker/RemoteRuntimeController.ts, in the #refreshPreviewToken method (around lines 427-440), after the call to this.#getPreviewSession(), add a check for whether this.#session is undefined. If it is, perform the same recovery as the catch block: re-emit the last known reloadComplete (to unpause the ProxyWorker) and call this.#scheduleProactiveRefresh() to schedule a future retry, then return early. Do NOT log the success message. For example, after line 431 add:
if (!this.#session) {
if (this.#latestProxyData && this.#latestConfig && this.#latestBundle) {
this.emitReloadCompleteEvent({
type: "reloadComplete",
config: this.#latestConfig,
bundle: this.#latestBundle,
proxyData: this.#latestProxyData,
});
}
this.#scheduleProactiveRefresh();
return;
}
Alternatively, consider having #previewSession re-throw the error after emitting the error event, so that the catch block in #refreshPreviewToken handles it naturally.
Was this helpful? React with 👍 or 👎 to provide feedback.
petebacondarwin
left a comment
There was a problem hiding this comment.
Looking good so far. I wonder if we could achieve a similar result by emitting the previewTokenExpired event after 50 mins and let the rest of the machinery do the refresh?
| if (typeof prop === "string" && HANDLER_RESERVED_KEYS.has(prop)) { | ||
| return; | ||
| } |
|
|
||
| // Store for token refresh | ||
| this.#latestConfig = config; | ||
| this.#latestConfig = config as RemoteStartDevWorkerOptions; |
There was a problem hiding this comment.
Could this cast be avoided with an assertion?
| this.#latestConfig = config as RemoteStartDevWorkerOptions; | |
| assert(config.dev.auth); | |
| this.#latestConfig = config; |
| }); | ||
| } | ||
|
|
||
| this.#scheduleProactiveRefresh(); |
There was a problem hiding this comment.
If the refresh failed, don't we want to hard error and bail out? If not, then re-scheduling a check using the normal timings (50mins?) seems too long. If it failed we should probably retry sooner?
| // Re-emit the last known reloadComplete so the ProxyWorker is unpaused. | ||
| // Without this, a failed refresh leaves the ProxyWorker permanently paused |
There was a problem hiding this comment.
Should we consider unpausing the proxy worker when the error event is emitted rather than triggering a reload complete event? It kind of feels disingenuous to trigger a "complete" event when there was an error.
| export function assertNever(_value: never) {} | ||
|
|
||
| /** | ||
| * Preview tokens expire after 1 hour (hardcoded in the Workers control plane). |
There was a problem hiding this comment.
| * Preview tokens expire after 1 hour (hardcoded in the Workers control plane). | |
| * When to proactively refresh the preview token. | |
| * | |
| * Preview tokens expire after 1 hour (hardcoded in the Workers control plane), so we retry after 50 mins. |
| const resolvedAuth1 = | ||
| typeof authHook1 === "function" | ||
| ? await (authHook1 as unknown as () => Promise<unknown>)() | ||
| : authHook1; |
There was a problem hiding this comment.
Can you not use unwrapHook() here?
Fixes #12901
Fixes an issue where remote bindings (e.g. Workers AI) would stop working after ~1 hour in
wrangler dev, requiring a manual restart.Root causes fixed:
Preview tokens expired silently — tokens have a 1-hour TTL set by the Workers control plane. The ProxyWorker would detect expiry reactively (only after a request failed), meaning one request would always fail before a refresh was triggered. This PR adds a proactive refresh at 50 minutes so no request ever sees an expired token.
error code: 1031not recognised as expiry — bindings like Workers AI returntext/plainerror code: 1031when their session times out, but the existing check only matched the HTMLInvalid Workers Preview configurationresponse. The check is now content-type-agnostic.Failed refresh left ProxyWorker permanently paused — if
#refreshPreviewTokenthrew afteremitReloadStartEventhad already paused the ProxyWorker, noreloadCompletewas ever emitted to unpause it. The catch block now re-emits the last knownproxyDatato recover.Auth resolved too early —
unwrapHook(dev.auth)was called at bundle-complete time and the resolved value stored. This means OAuth tokens captured at startup could go stale. Auth is now resolved lazily each time a session starts.Handler reserved keys proxied to remote stub — workerd probes service binding entrypoints for lifecycle handler methods (
tail,trace,scheduled, etc.) by accessing those properties. Without a guard these fell through to the capnweb stub, returning truthyRpcPromiseobjects and making workerd think the binding handled those events. Added aHANDLER_RESERVED_KEYSguard matching the existing pattern inexternal-service.ts.