fix(core): retry Chromium download with idle timeout (PER-9192)#2271
Open
pranavz28 wants to merge 1 commit into
Open
fix(core): retry Chromium download with idle timeout (PER-9192)#2271pranavz28 wants to merge 1 commit into
pranavz28 wants to merge 1 commit into
Conversation
The Chromium downloader performed a single https.get piped to a file with no socket timeout, no retry, and no resume. A transient network stall in CI (data stops arriving but the socket is never reset) left the download frozen indefinitely — the classic "stuck at 93%" symptom — until the CI job itself timed out, failing the build with zero snapshots. Extract the per-attempt download into fetchToFile() with an idle socket timeout (request.setTimeout -> destroy) so a stalled connection fails fast, and wrap it in a retry loop that removes the partial archive between attempts. Both are env-tunable via PERCY_CHROMIUM_DOWNLOAD_TIMEOUT (default 30s) and PERCY_CHROMIUM_DOWNLOAD_RETRIES (default 3). Adds setTimeout/destroy to the shared MockRequest test helper and tests covering retry-then-succeed and retry-exhaustion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes PER-9192
Problem
Customers' CI builds intermittently fail with the Chromium download frozen near the end (e.g.
Downloading Chromium 1300309 [=========== ] 167.3MB/178.1MB 93%). The build is created but uploads zero snapshots and is later reaped asmanually_failed_build. It works locally because Chromium is already cached there; each CI runner starts cold and re-downloads the full ~178 MB, exposing it to network flakiness.The Chromium artifact itself is fine on Google's bucket (HTTP 200, correct content-length, Range/resume supported) — the failure is a transient interruption of the connection that the CLI does not recover from.
Root cause
packages/core/src/install.js—download()performed a singlehttps.getpiped straight to a file:Fix
fetchToFile()with an idle socket timeout (request.setTimeout→request.destroy) so a stalled connection fails fast instead of hanging.PERCY_CHROMIUM_DOWNLOAD_TIMEOUT(default 30000 ms) andPERCY_CHROMIUM_DOWNLOAD_RETRIES(default 3).This converts an indefinite hang into bounded, self-healing attempts. Behaviour is unchanged on the happy path and for permanent failures (e.g. a 404 still ultimately rejects, after exhausting retries).
Testing
packages/client/test/helpers.js: taught the sharedMockRequesthelper to modelsetTimeout/destroy.packages/core/test/unit/install.test.js: addedretries the download when an attempt failsandgives up after exhausting download retries.download()path (handles failed downloads, the two new ones) pass.Note for reviewers
A momentary network glitch in CI currently kills the whole build. After this change it retries and recovers. The recommended customer-side mitigation (pre-installing a browser +
PERCY_BROWSER_EXECUTABLE) still applies for fully air-gapped runners; this just makes the default path resilient.🤖 Generated with Claude Code