feat(incident): Add incident report 3/8/2026 #110
Merged
18 commits
- 422f00c feat(incident): Add incident report for macOS installer version mismatch (avivkeller)
- 179e14f Update incident report date to 2026-03-08 (avivkeller)
- 099a14f Correct timeline dates for Node.js v22.22.1 incident (avivkeller)
- a3d948e Update incidents/2026-03-08.md (avivkeller)
- 488f523 Update 2026-03-08.md (avivkeller)
- ca0945f Update 2026-03-08.md (avivkeller)
- 9c266ef Update 2026-03-08.md (avivkeller)
- b86b566 Update 2026-03-08.md (avivkeller)
- e10ed51 Rename 2026-03-08.md to 2026-03-04.md (avivkeller)
- 370cb5c Update 2026-03-04.md (avivkeller)
- b51c4c2 Apply suggestions from code review (avivkeller)
- 76099bf Update incident report for March 2026 (avivkeller)
- 82b7e0f Update 2026-03-04.md (avivkeller)
- 95682bd Update 2026-03-04.md (avivkeller)
- 1a0db65 Update 2026-03-04.md (avivkeller)
- 7bcd57a Apply suggestions from code review (avivkeller)
- 008249d Update incidents/2026-03-04.md (avivkeller)
- 1c07d92 Update wording on time period (avivkeller)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| # 2026-03-04 Incident Report | ||
|
|
||
| - Incident Commander: @ryanaslett | ||
| - Severity Level: P1 | ||
|
|
||
| For several days following the release announcement, `nodejs.org` served a macOS installer package (`.pkg`) for Node.js v22.22.1 whose SHA256 checksum did not match the published one, because the `rclone` upload step failed during a Jenkins job re-run. Despite the different hash, the served file was generated and signed legitimately by Node.js' CI and was safe to run. | ||
|
|
||
| ## Timeline | ||
|
|
||
| - **2026-03-04 17:14 UTC**: First Jenkins build completed successfully, uploading `node-v22.22.1.pkg` (SHA256: `1fbe9cd7e9fdce6cf150bbe59cb97a426434f7fb217135d10124a62bfb697448`) to direct (backup origin server) and the R2 dist-staging bucket. | ||
|
|
||
| - **2026-03-04 21:00 UTC**: Second Jenkins build completed, uploading recreated `node-v22.22.1.pkg` (SHA256: `ac8cb570db59cb399be96978c194f6c4fc91ffcf11a197ebd5461083c0cf1dfd`) to direct, but failing to write to the R2 dist-staging bucket. | ||
|
|
||
| - **2026-03-05 14:30 UTC**: Start of impact. The promotion script ran, copying the release assets from the R2 dist-staging bucket to the R2 dist-prod bucket (which serves `nodejs.org`), including `node-v22.22.1.pkg` from the first Jenkins run. `SHASUMS256.txt` was generated from the assets on direct, including `node-v22.22.1.pkg` from the second Jenkins run. | ||
|
|
||
| - **2026-03-08 10:04 UTC**: Initial report of incident [nodejs/release-cloudflare-worker#878](https://github.com/nodejs/release-cloudflare-worker/issues/878) created. | ||
|
|
||
| - **2026-03-08 12:12 UTC**: Initial report of incident [nodejs/release-cloudflare-worker#878](https://github.com/nodejs/release-cloudflare-worker/issues/878) acknowledged. | ||
|
|
||
| - **2026-03-08 23:52 UTC**: Initial report forwarded to [OpenJS Slack](https://openjs-foundation.slack.com/archives/C09EXEEHFKP/p1773013976217429), investigation began. | ||
|
|
||
| - **2026-03-09 00:33 UTC**: Team confirmed both files were legitimately signed by Apple at different times (17:14 and 21:00 UTC). | ||
|
|
||
| - **2026-03-09 00:41 UTC**: Root cause identified - Jenkins job re-run uploaded to direct but failed to sync to R2, causing version mismatch. | ||
|
|
||
| - **2026-03-09 01:25 UTC**: Corrected macOS installer package (`.pkg`) promoted. Impact resolved shortly after. | ||
|
|
||
| ## Impact | ||
|
|
||
| Users downloading the macOS installer package from `https://nodejs.org/dist/v22.22.1/node-v22.22.1.pkg` received a file whose SHA256 checksum (`1fbe9cd7e9fdce6cf150bbe59cb97a426434f7fb217135d10124a62bfb697448`) did not match the checksum published in [`SHASUMS256.txt`](https://nodejs.org/dist/latest-v22.x/SHASUMS256.txt) (`ac8cb570db59cb399be96978c194f6c4fc91ffcf11a197ebd5461083c0cf1dfd`). | ||
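The check that exposed the mismatch can be sketched as follows. This is a minimal illustration, not the project's tooling: it assumes `SHASUMS256.txt` uses the common `<hex digest>  <file name>` layout, and the helper names (`parse_shasums`, `sha256_hex`, `matches_published`) are hypothetical.

```python
import hashlib


def parse_shasums(text: str) -> dict:
    """Map file name -> hex digest, assuming '<digest>  <name>' lines."""
    digests = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            digests[name.strip()] = digest.lower()
    return digests


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_published(name: str, data: bytes, shasums_text: str) -> bool:
    """True only if the file is listed and its digest equals the listed one."""
    expected = parse_shasums(shasums_text).get(name)
    return expected is not None and sha256_hex(data) == expected
```

During the incident, a check of this kind would have returned `False` for the `.pkg` served by the R2 dist-prod bucket and `True` for the copy on direct.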
|
|
||
| Both files were legitimately signed by the Node.js Foundation Apple Developer account, but represented different build artifacts from separate Jenkins runs. The file served from direct.nodejs.org was correct, but Cloudflare R2 (serving most users via the release worker) contained the outdated version. | ||
|
|
||
| ## Root Cause | ||
|
|
||
| A workflow issue in the Jenkins release process allowed files to become out of sync between direct.nodejs.org (www) and the R2 bucket. | ||
|
|
||
| The release process works as follows: | ||
| 1. Jenkins builds the macOS package and signs it | ||
| 2. The package is copied to direct via `scp` | ||
| 3. Jenkins SSHs into direct and uses `rclone` to copy the file to the R2 dist-staging bucket | ||
| 4. The releaser runs the promotion script, which SSHs into direct and copies files from the R2 dist-staging bucket to the R2 dist-prod bucket | ||
| 5. The script generates `SHASUMS256.txt` from the files on direct, not R2, and writes it to the R2 dist-prod bucket | ||
|
|
||
| During the v22.22.1 release: | ||
| 1. The first Jenkins job (17:14 UTC) completed successfully, uploading the initial signed package to both direct and R2 staging | ||
| 2. The job was re-run, producing a new signed package at 21:00 UTC | ||
| 3. The second run successfully copied the new package to direct | ||
| 4. The `rclone` step to R2 staging failed with `kex_exchange_identification: Connection closed by remote host` | ||
| 5. The Jenkins job marked the build as failed but did not roll back the direct upload | ||
| 6. The releaser ran the promotion script, which promoted the original package from R2 staging to prod but generated `SHASUMS256.txt` from the regenerated package on direct | ||
|
|
||
| This left direct with matching package and `SHASUMS256.txt` files, but the R2 prod bucket with the outdated package file, creating a checksum mismatch for most users. | ||
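The sequence above can be reproduced with a toy model. This is only a sketch under stated assumptions: the three stores are plain in-memory dictionaries, and `Stores`, `jenkins_upload`, `promote`, and `mismatched` are hypothetical names that stand in for the real `scp`, `rclone`, and promotion steps.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for direct, R2 dist-staging and R2 dist-prod."""
    direct: dict = field(default_factory=dict)
    staging: dict = field(default_factory=dict)
    prod: dict = field(default_factory=dict)


def jenkins_upload(stores: Stores, name: str, data: bytes, rclone_ok: bool = True) -> bool:
    """Copy to direct, then to staging; a failed sync is not rolled back."""
    stores.direct[name] = data
    if not rclone_ok:
        return False  # build marked failed, direct keeps the new file
    stores.staging[name] = data
    return True


def promote(stores: Stores) -> dict:
    """Copy staging to prod, but compute checksums from direct."""
    stores.prod.update(stores.staging)
    return {n: hashlib.sha256(d).hexdigest() for n, d in stores.direct.items()}


def mismatched(stores: Stores, shasums: dict) -> list:
    """Names whose prod contents do not hash to the published digest."""
    return sorted(
        n for n, digest in shasums.items()
        if hashlib.sha256(stores.prod.get(n, b"")).hexdigest() != digest
    )


if __name__ == "__main__":
    s = Stores()
    jenkins_upload(s, "node-v22.22.1.pkg", b"first build")
    jenkins_upload(s, "node-v22.22.1.pkg", b"second build", rclone_ok=False)
    print(mismatched(s, promote(s)))
```

The model makes the core problem visible: the package and its checksum come from two different stores, so any divergence between them reaches users.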
|
|
||
| ## Fix | ||
|
|
||
| The immediate fix was to manually sync the correct file from direct to the R2 dist-staging bucket using `rclone copyto`, and then to the R2 dist-prod bucket. | ||
|
|
||
| ## Follow-up Work | ||
|
|
||
| - Improve Jenkins workflow to prevent partial uploads when rclone fails | ||
| - Either roll back direct uploads if R2 sync fails, or upload to both destinations atomically | ||
| - Add verification step to compare checksums between direct and R2 before marking build as complete | ||
| - Add monitoring/alerting for checksum mismatches between distribution sources | ||
| - Investigate why the rclone SSH connection failed mid-release | ||
| - Consider adding checksum verification as part of the promotion workflow | ||
| - Generate checksums based on R2 dist-prod contents rather than direct | ||
| - Add better logging/auditing for release builds to track which artifacts were uploaded where and when | ||
| - Create, or make known, the documentation and sources of truth to consult for future incidents like this | ||
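One possible shape for the verification items in the list above is a gate that compares digests from both sources before promotion. The sketch below is an assumption, not a description of any existing script: how the digests are collected from direct and R2 is left out, and `find_out_of_sync` and `assert_safe_to_promote` are hypothetical names.

```python
from typing import Dict, Optional, Tuple


def find_out_of_sync(
    direct: Dict[str, str], r2: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return files whose digest differs, or that exist on only one side."""
    problems = {}
    for name in sorted(set(direct) | set(r2)):
        d, r = direct.get(name), r2.get(name)
        if d != r:
            problems[name] = (d, r)  # (digest on direct, digest on R2)
    return problems


def assert_safe_to_promote(direct: Dict[str, str], r2: Dict[str, str]) -> None:
    """Raise before promotion if the two sources disagree on any file."""
    problems = find_out_of_sync(direct, r2)
    if problems:
        names = ", ".join(problems)
        raise RuntimeError(f"direct and R2 are out of sync for: {names}")
```

Fed the two digests from this incident, the gate would raise for `node-v22.22.1.pkg` and stop the promotion before a mismatched `SHASUMS256.txt` was published.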