Fix ENOENT on shared build steps from stale build-log path#208
Open
mtelvers wants to merge 1 commit into
Open
Conversation
A finished build log was parked in the Readonly state holding its in-progress path, but the store renames the build directory into place on success. A late-joining log tailer (e.g. a sibling build sharing a deduplicated step) reopening that path after the rename hit ENOENT (..., "open", ".../in-progress/<id>/log"). Retain a read-only dup of the fd in the finished state and read from it instead of reopening by path; the fd follows the file through the rename. On Windows the fd must be closed before the directory can be renamed, so it keeps reopening by path there. Add a regression test asserting a finished log stays readable after its directory is renamed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We have been seeing intermittent build failures on busy workers:
When a build finishes,
Build_log.finishparks the log as`Readonly <path>so late-joining tailers can still read it, but<path>is the in-progress location, and the store renames the build directoryin-progress -> resulton success. A late-joining tailer is typically a sibling job sharing a build step, viadb_store'sSome existingbranch), that reopens that path after the rename getsENOENT.Tailers that attach while the log is still
`Openare fine as theydupthe live fd, which follows the inode through the rename. Only the reopen-by-path in the finished state is affected. This bug was introduced by me in 393a5ef as part of the Windows HCS work as that requiredfinishto close the fd before the rename (required on Windows) and reopen by path instead of relying on the still-open fd.The fix is to keep a read-only
dupof the fd and read tailers from it rather than reopening by path. The fd follows the file through the directory rename. On Windows the fd must be closed before the directory can be renamed, so it keeps reopening by path there.Furthermore, this adds a regression test (
build_log/ "Readable after rename") that finishes a log, renames its directory as the store does, then tails it and asserts the content is still readable. It reproduces the exactENOENTon the pre-fix code and passes with the fix.