feat(incident): Add incident report 3/8/2026 #110
Merged
- 422f00c feat(incident): Add incident report for macOS installer version mismatch (avivkeller)
- 179e14f Update incident report date to 2026-03-08 (avivkeller)
- 099a14f Correct timeline dates for Node.js v22.22.1 incident (avivkeller)
- a3d948e Update incidents/2026-03-08.md (avivkeller)
- 488f523 Update 2026-03-08.md (avivkeller)
- ca0945f Update 2026-03-08.md (avivkeller)
- 9c266ef Update 2026-03-08.md (avivkeller)
- b86b566 Update 2026-03-08.md (avivkeller)
- e10ed51 Rename 2026-03-08.md to 2026-03-04.md (avivkeller)
- 370cb5c Update 2026-03-04.md (avivkeller)
- b51c4c2 Apply suggestions from code review (avivkeller)
- 76099bf Update incident report for March 2026 (avivkeller)
- 82b7e0f Update 2026-03-04.md (avivkeller)
- 95682bd Update 2026-03-04.md (avivkeller)
- 1a0db65 Update 2026-03-04.md (avivkeller)
- 7bcd57a Apply suggestions from code review (avivkeller)
- 008249d Update incidents/2026-03-04.md (avivkeller)
- 1c07d92 Update wording on time period (avivkeller)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| # 2026-03-08 Incident Report | ||
|
|
||
| - Incident Commander: @ryanaslett | ||
| - Severity Level: P1 | ||
|
|
||
| For a brief period, the macOS installer package (`.pkg`) for Node.js v22.22.1 was served as a duplicate file whose SHA256 checksum did not match the published one, because an rclone upload step failed during a Jenkins job re-run. Although its hash differed, the served file was generated and signed legitimately by Node.js' CI and was safe to run. | ||
|
|
||
| ## Timeline | ||
|
|
||
| - **2026-03-08 17:14 UTC**: Start of impact. First Jenkins build completed successfully, uploading `node-v22.22.1.pkg` (SHA256: `1fbe9cd7e9fdce6cf150bbe59cb97a426434f7fb217135d10124a62bfb697448`) to R2. | ||
|
|
||
| - **2026-03-08 21:00 UTC**: Second Jenkins build completed, uploading the corrected `node-v22.22.1.pkg` (SHA256: `ac8cb570db59cb399be96978c194f6c4fc91ffcf11a197ebd5461083c0cf1dfd`) to direct.nodejs.org, but the rclone step to R2 failed, leaving R2 (serving most users at `www.`) with the outdated file. | ||
|
|
||
|
|
||
| - **2026-03-08 10:04 UTC**: Initial report of incident [nodejs/release-cloudflare-worker#878](https://github.com/nodejs/release-cloudflare-worker/issues/878) created. | ||
|
|
||
| - **2026-03-08 12:12 UTC**: Initial report of incident [nodejs/release-cloudflare-worker#878](https://github.com/nodejs/release-cloudflare-worker/issues/878) acknowledged. | ||
|
|
||
| - **2026-03-08 11:52 UTC**: Initial report forwarded to [OpenJS Slack](https://openjs-foundation.slack.com/archives/C09EXEEHFKP/p1773013976217429), investigation began. | ||
|
|
||
| - **2026-03-09 00:33 UTC**: Team confirmed both files were legitimately signed by Apple at different times (17:14 and 21:00 UTC). | ||
|
|
||
| - **2026-03-09 00:41 UTC**: Root cause identified: the Jenkins job re-run uploaded to www but failed to sync to R2, causing a version mismatch. | ||
|
|
||
| - **2026-03-09 01:25 UTC**: Corrected macOS installer package (`.pkg`) promoted. | ||
|
|
||
| - **2026-03-09 01:29 UTC**: Cache purged. Impact resolved. | ||
|
|
||
| ## Impact | ||
|
|
||
| Users downloading the macOS installer package from `https://nodejs.org/dist/v22.22.1/node-v22.22.1.pkg` received a file whose SHA256 checksum (`1fbe9cd7e9fdce6cf150bbe59cb97a426434f7fb217135d10124a62bfb697448`) did not match the checksum published in [`SHASUMS256.txt`](https://nodejs.org/dist/latest-v22.x/SHASUMS256.txt) (`ac8cb570db59cb399be96978c194f6c4fc91ffcf11a197ebd5461083c0cf1dfd`). | ||
|
|
||
| Both files were legitimately signed by the Node.js Foundation Apple Developer account, but represented different build artifacts from separate Jenkins runs. The file served from direct.nodejs.org was correct, but Cloudflare R2 (serving most users via the release worker) contained the outdated version. | ||
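
A download can be checked against the published checksum along the following lines. This is a minimal sketch, not part of the Node.js tooling: the helper names `sha256_of_file`, `expected_checksum`, and `matches_published` are hypothetical, and it assumes `SHASUMS256.txt` uses the `sha256sum`-style layout of one `<hex digest>  <filename>` entry per line.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksum(shasums_text, filename):
    """Look up `filename` in SHASUMS256.txt-style text; None if it is absent."""
    for line in shasums_text.splitlines():
        parts = line.split()
        # Assumed layout: "<hex digest>  <filename>" on each line.
        if len(parts) == 2 and parts[1] == filename:
            return parts[0].lower()
    return None


def matches_published(path, shasums_text):
    """True only if the local file hashes to the published checksum."""
    expected = expected_checksum(shasums_text, Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

During the incident window, a check of this kind on the file served from R2 would have returned `False`, since the served digest and the published digest differed.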
|
|
||
| ## Root Cause | ||
|
|
||
| A workflow issue in the Jenkins release process allowed files to become out of sync between direct.nodejs.org (www) and the R2 bucket. | ||
|
|
||
| The release process works as follows: | ||
| 1. Jenkins builds the macOS package and signs it | ||
| 2. The package is copied to direct.nodejs.org via `scp` | ||
| 3. Jenkins SSHs into direct and uses `rclone` to copy the file from www to R2 dist-staging | ||
|
|
||
| During the v22.22.1 release: | ||
| 1. The first Jenkins job (17:14 UTC) completed successfully, uploading the initial signed package to both direct and R2 | ||
| 2. The job was re-run, producing a new signed package at 21:00 UTC | ||
| 3. The second run successfully copied the new package to direct | ||
| 4. The `rclone` step to R2 failed with `kex_exchange_identification: Connection closed by remote host` | ||
| 5. The Jenkins job marked the build as failed but did not roll back the direct upload | ||
|
|
||
| This left `direct.` with the correct file (matching SHASUMS256.txt) while R2 served the outdated file, creating a checksum mismatch for most users. | ||
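
One way to avoid this partial state is to treat the two uploads as a unit and undo the first when the second fails. The sketch below is illustrative only and is not the Jenkins job's actual logic: `publish_to_both`, `upload_direct`, `sync_to_r2`, `rollback_direct`, and `PublishError` are hypothetical names, and the callables stand in for the `scp` and `rclone` steps.

```python
class PublishError(Exception):
    """Raised when an artifact could not be published to both destinations."""


def publish_to_both(artifact, upload_direct, sync_to_r2, rollback_direct):
    """Upload `artifact` to direct, then sync it to R2; undo direct if R2 fails.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    upload_direct(artifact)
    try:
        sync_to_r2(artifact)
    except Exception as error:
        # R2 still holds the previous file, so restore direct to match it.
        rollback_direct(artifact)
        raise PublishError(
            f"R2 sync failed for {artifact}; direct upload rolled back"
        ) from error
```

With this shape, a connection failure in the second step leaves both destinations on the earlier artifact instead of leaving them out of sync. Whether rolling back or retrying is the better policy is a separate decision that this sketch does not settle.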
|
|
||
| ## Fix | ||
|
|
||
| The immediate fix was to manually sync the correct file from direct.nodejs.org to the R2 dist-staging bucket using `rclone copyto`. | ||
|
|
||
| ## Follow-up Work | ||
|
|
||
| - Improve Jenkins workflow to prevent partial uploads when rclone fails | ||
| - Either roll back www uploads if R2 sync fails, or upload to both destinations atomically | ||
| - Add verification step to compare checksums between www and R2 before marking build as complete | ||
| - Add monitoring/alerting for checksum mismatches between distribution sources | ||
| - Investigate why the rclone SSH connection failed mid-release | ||
| - Consider adding checksum verification as part of the promotion workflow | ||
| - Add better logging/auditing for release builds to track which artifacts were uploaded where and when | ||
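
The checksum comparison between www and R2 listed above could look roughly like the following. It is a sketch under assumptions: `find_mismatches` is a hypothetical helper, and each argument is assumed to be a mapping from file name to SHA256 hex digest that an earlier step has already collected from the respective source.

```python
def find_mismatches(www_checksums, r2_checksums):
    """Return {filename: (www_digest, r2_digest)} for files that differ.

    A file missing from one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(www_checksums) | set(r2_checksums)):
        www_digest = www_checksums.get(name)
        r2_digest = r2_checksums.get(name)
        if www_digest != r2_digest:
            mismatches[name] = (www_digest, r2_digest)
    return mismatches
```

Given the two digests recorded in this report for `node-v22.22.1.pkg`, the helper would report that file as mismatched, which a build or monitoring job could then treat as a failure.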