ci: cache gnu32 nix store#256
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste |
915a7f8 to
7aa3271
Compare
|
@ryanofsky can you whitelist |
Thanks, added. This sorry for the delay. If there is a way we could give you github permissions to change these settings that would seem nice too |
|
There's no rush and I don't expect to have to make many such changes. Basic ccache and more tailored nix cache should do the trick. |
3fa8a2c to
34da50f
Compare
|
Looks like the cache got trimmed the first write. Trying again. This push to 34da50f is not expected to be substantially faster, the next push should be. |
|
Depsite the incomplete cache it ran in only 30 minutes. The new cache entry is 1.7 GB. Let's run it again... |
34da50f to
6dc9700
Compare
.github/workflows/ci.yml
Outdated
| if: matrix.config == 'gnu32' | ||
| uses: nix-community/cache-nix-action@v7 | ||
| with: | ||
| primary-key: nix-${{ runner.os }}-${{ matrix.config }}-${{ env.NIXPKGS_CHANNEL }}-${{ hashFiles('shell.nix', 'ci/patches/*.patch', 'ci/configs/gnu32.bash') }}-${{ github.sha }} |
There was a problem hiding this comment.
It might be better to drop -${{ github.sha }} so our cache won't get flushed as often.
|
Down to 3 minutes, nice! |
6dc9700 to
e734375
Compare
|
Trying a different caching approach, using the latest nix channel commit - which doesn't change that often for If it can't get an exact match it will try a broader match, so IIUC even if something in nix changes from under us, we might still get some benefit out of earlier cache entries. Once the build succeeds, it produces a fresh one that's then used in subsequent runs. |
7664d7c to
76d3e51
Compare
76d3e51 to
6562d22
Compare
|
As expected, the last gnu32 CI run found a match for the primary key and did not re-save it. https://github.com/bitcoin-core/libmultiprocess/actions/runs/23025035895/job/66870747483?pr=256 Runtime slightly under 3 minutes. So this should be good to go now. |
.github/workflows/ci.yml
Outdated
|
|
||
| # Use an explicit save step instead of the action post-step so we only | ||
| # archive the store after the build succeeded and the shell closure is | ||
| # rooted against the save-time garbage collection pass. |
There was a problem hiding this comment.
I have no idea what "shell closure is rooted against the save-time garbage collection pass" means, so it would be good for someone who knows Nix to sanity check this.
There was a problem hiding this comment.
I have no idea what "shell closure is rooted against the save-time garbage collection pass" means, so it would be good for someone who knows Nix to sanity check this.
It makes sense, but is a confusing comment. Would suggest splitting it up like "Use an explicit save step instead of the action post-step so we only save after a successful build. Create a GC root for the gnu32 shell closure here, so the cache-nix-action step below does not garbage-collect it. "
There was a problem hiding this comment.
Code review ACK 6562d22. Nice to speed up this very slow job.
Am happy to merge as-is but it would be nice not to hardcode this logic for the gnu32 job, also to not have such complicated bash commands in the yaml. More ideally, i think this would do something like:
- Add CI_CACHE_STORE=true in ci/configs/gnu32.bash
- Add a ci/scripts/config.sh script and build step that sources the config and outputs whatever variables the yml file needs to do caching
- Add command to ci/scripts/run.sh to create the .nix-gc-roots link needed to prevent garbage collection.
Also from https://github.com/nix-community/cache-nix-action/#a-typical-job it seems we don't need two separate cache-nix-action steps before and after the build. Just a single step before the build should be sufficient if the goal is to only save the cache on success.
.github/workflows/ci.yml
Outdated
|
|
||
| # Use an explicit save step instead of the action post-step so we only | ||
| # archive the store after the build succeeded and the shell closure is | ||
| # rooted against the save-time garbage collection pass. |
There was a problem hiding this comment.
I have no idea what "shell closure is rooted against the save-time garbage collection pass" means, so it would be good for someone who knows Nix to sanity check this.
It makes sense, but is a confusing comment. Would suggest splitting it up like "Use an explicit save step instead of the action post-step so we only save after a successful build. Create a GC root for the gnu32 shell closure here, so the cache-nix-action step below does not garbage-collect it. "
I you merge #253 first I'll rebase this to take advantage of the bash scripts introduced there. |
6562d22 to
0181f8b
Compare
|
TSan tripped over: Probably need to bump the If the gnu32 job works (so far so good), I'll push a comment improvement to see if it picks up the cache. |
0181f8b to
293b0e0
Compare
|
From one hour to two minutes, nice. Asan failure looks like another timeout again, see #263. Though the tail of the error looks more worrying: Or is that in the Python framework? https://github.com/bitcoin-core/libmultiprocess/actions/runs/23559019643/job/68593574084?pr=256 |
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK 293b0e0. This seems more understandable now with one cache step instead of two and more code moved to bash scripts. Left some comments and suggestions, but nothing blocking.
re: #256 (comment)
The capnp::CallContextHook "object has invalid vptr" error is pretty bad looking, but it looks like that happens after a test timeout so #263 should prevent it from happening. The actual bug causing that error might also be fixed by one of the fixes in #249, but that's speculative.
.github/workflows/ci.yml
Outdated
| # Cache the heaviest Nix job to stay within GitHub's cache budget while | ||
| # still avoiding repeated gnu32 cross-toolchain downloads and builds. |
There was a problem hiding this comment.
In commit "ci: cache gnu32 nix store" (293b0e0)
Probably should just drop this comment since there is no longer any code here referencing gnu32.
There was a problem hiding this comment.
I'll move it to the gnu32 script to explain why the other scripts don't.
ci/scripts/run.sh
Outdated
| nix-build shell.nix \ | ||
| -o .nix-gc-roots/shell \ | ||
| "${NIX_ARGS[@]+"${NIX_ARGS[@]}"}" | ||
| nix-store --query --requisites .nix-gc-roots/shell >/dev/null |
There was a problem hiding this comment.
In commit "ci: cache gnu32 nix store" (293b0e0)
Maybe add a comment about why this is here. It seems like it doesn't do anything by default but could print helpful errors if the closure is not complete? Not sure. Would be good to explain or remove.
There was a problem hiding this comment.
Added a comment, it's a guard against creating a bad cache.
Good results are ignored, and IIUC if something goes wrong it will go to stderr, but I didn't test that.
09f10e5 ci: bump timeout factor to 40 (Sjors Provoost) Pull request description: Matches the default `TEST_RUNNER_TIMEOUT_FACTOR` on Bitcoin Core CI. Should prevent false positives like #256 (comment) That test relies on a timeout: ```python disconnected_log_check.enter_context(node.assert_debug_log(expected_msgs=["IPC server: socket disconnected", "canceled while executing"], timeout=2)) ``` ACKs for top commit: ryanofsky: Code review ACK 09f10e5. Obviously good to match bitcoin core timeouts when running bitcoin core tests. Tree-SHA512: 48bbbc74813f7ec557a171fef43a381912dce5381c3e568a6dfe1e27fc1d7bdd66376fe9a48c59de9fdb2f1262fba446d9d47b5de7091632d51400f9aae19ae0
293b0e0 to
be86228
Compare
|
Rebased and addressed comments. The script change invalidated the cache primary key, but the fallback worked, so it's still fast. This fallback mechanism is also why the sanity check before storing cache is important: #256 (comment) |
|
CI failure: |
|
Another 143 exit, maybe they're running out of RAM with the longer timeouts? |
Yes these seem to happen reliably on master since #263 was merged. I suspect what is happening is that with the longer timeouts, the test waits forever for EDIT: This explanation is not right. After rebasing #249, we see SIGTERM/143 errors after the test has only run for 6 minutes (https://github.com/bitcoin-core/libmultiprocess/actions/runs/23587830246/job/68685257218?pr=249). So it seems like SIGTERM is not sent because the test is hanging for a very long time. More discussion about this is in #263 (comment) |
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK be86228. Thanks for the updates! Just suggested changes updating comments and tweaking CI_CACHE_NIX_STORE evaluation since last review.
I think I will go ahead and merge this since the Bitcoin Core CI failures should be unrelated, and are also happening in master
…bf18 ff0eed1bf18 refactor: Use loop variable in type-context.h ff1d8ba172a refactor: Move type-context.h getParams() call closer to use 1dbc59a4aa3 race fix: m_on_cancel called after request finishes 1643d05ba07 test: m_on_cancel called after request finishes f5509a31fcc race fix: getParams() called after request cancel 4a60c39f24a test: getParams() called after request cancel f11ec29ed20 race fix: worker thread destroyed before it is initialized a1d643348f4 test: worker thread destroyed before it is initialized e606fd84a8c Merge bitcoin-core/libmultiprocess#264: ci: reduce nproc multipliers 336023382c4 ci: reduce nproc multipliers b090beb9651 Merge bitcoin-core/libmultiprocess#256: ci: cache gnu32 nix store be8622816da ci: cache gnu32 nix store 975270b619c Merge bitcoin-core/libmultiprocess#263: ci: bump timeout factor to 40 09f10e5a598 ci: bump timeout factor to 40 db8f76ad290 Merge bitcoin-core/libmultiprocess#253: ci: run some Bitcoin Core CI jobs 55a9b557b19 ci: set Bitcoin Core CI test repetition fb0fc84d556 ci: add TSan job with instrumented libc++ 0f29c38725b ci: add Bitcoin Core IPC tests (ASan + macOS) 3f64320315d Merge bitcoin-core/libmultiprocess#262: ci: enable clang-tidy in macOS job, use nullptr cd9f8bdc9f0 Merge bitcoin-core/libmultiprocess#258: log: add socket connected info message and demote destroy logs to debug b5d6258a42f Merge bitcoin-core/libmultiprocess#255: fix: use unsigned char cast and sizeof in LogEscape escape sequence d94688e2c32 Merge bitcoin-core/libmultiprocess#251: Improved CustomBuildField for std::optional in IPC/libmultiprocess a9499fad755 mp: use nullptr with pthread_threadid_np f499e37850f ci: enable clang-tidy in macOS job 98f1352159d log: add socket connected info message and demote destroy logs to debug 554a481ea73 fix: use unsigned char cast and sizeof in LogEscape escape sequence 1977b9f3f65 Use std::forward in CustomBuildField for std::optional to allow move semantics, resolves FIXME 22bec918c97 Merge bitcoin-core/libmultiprocess#247: type-map: Work around LLVM 22 "out of bounds index" error 8a5e3ae6ed2 Merge bitcoin-core/libmultiprocess#242: proxy-types: add CustomHasField hook to map Cap'n Proto values to null C++ values e8d35246918 Merge bitcoin-core/libmultiprocess#246: doc: Bump version 8 > 9 97d877053b6 proxy-types: add CustomHasField hook for nullable decode paths 8c2f10252c9 refactor: add missing includes to mp/type-data.h b1638aceb40 doc: Bump version 8 > 9 f61af487217 type-map: Work around LLVM 22 "out of bounds index" error git-subtree-dir: src/ipc/libmultiprocess git-subtree-split: ff0eed1bf183627c89007e71d631f039bd61bfb5
…bf18 ff0eed1bf18 refactor: Use loop variable in type-context.h ff1d8ba172a refactor: Move type-context.h getParams() call closer to use 1dbc59a4aa3 race fix: m_on_cancel called after request finishes 1643d05ba07 test: m_on_cancel called after request finishes f5509a31fcc race fix: getParams() called after request cancel 4a60c39f24a test: getParams() called after request cancel f11ec29ed20 race fix: worker thread destroyed before it is initialized a1d643348f4 test: worker thread destroyed before it is initialized e606fd84a8c Merge bitcoin-core/libmultiprocess#264: ci: reduce nproc multipliers 336023382c4 ci: reduce nproc multipliers b090beb9651 Merge bitcoin-core/libmultiprocess#256: ci: cache gnu32 nix store be8622816da ci: cache gnu32 nix store 975270b619c Merge bitcoin-core/libmultiprocess#263: ci: bump timeout factor to 40 09f10e5a598 ci: bump timeout factor to 40 db8f76ad290 Merge bitcoin-core/libmultiprocess#253: ci: run some Bitcoin Core CI jobs 55a9b557b19 ci: set Bitcoin Core CI test repetition fb0fc84d556 ci: add TSan job with instrumented libc++ 0f29c38725b ci: add Bitcoin Core IPC tests (ASan + macOS) 3f64320315d Merge bitcoin-core/libmultiprocess#262: ci: enable clang-tidy in macOS job, use nullptr cd9f8bdc9f0 Merge bitcoin-core/libmultiprocess#258: log: add socket connected info message and demote destroy logs to debug b5d6258a42f Merge bitcoin-core/libmultiprocess#255: fix: use unsigned char cast and sizeof in LogEscape escape sequence d94688e2c32 Merge bitcoin-core/libmultiprocess#251: Improved CustomBuildField for std::optional in IPC/libmultiprocess a9499fad755 mp: use nullptr with pthread_threadid_np f499e37850f ci: enable clang-tidy in macOS job 98f1352159d log: add socket connected info message and demote destroy logs to debug 554a481ea73 fix: use unsigned char cast and sizeof in LogEscape escape sequence 1977b9f3f65 Use std::forward in CustomBuildField for std::optional to allow move semantics, resolves FIXME 22bec918c97 Merge bitcoin-core/libmultiprocess#247: type-map: Work around LLVM 22 "out of bounds index" error 8a5e3ae6ed2 Merge bitcoin-core/libmultiprocess#242: proxy-types: add CustomHasField hook to map Cap'n Proto values to null C++ values e8d35246918 Merge bitcoin-core/libmultiprocess#246: doc: Bump version 8 > 9 97d877053b6 proxy-types: add CustomHasField hook for nullable decode paths 8c2f10252c9 refactor: add missing includes to mp/type-data.h b1638aceb40 doc: Bump version 8 > 9 f61af487217 type-map: Work around LLVM 22 "out of bounds index" error git-subtree-dir: src/ipc/libmultiprocess git-subtree-split: ff0eed1bf183627c89007e71d631f039bd61bfb5
…da8f 70f632bda8f Merge bitcoin-core/libmultiprocess#265: ci: set LC_ALL in shell scripts 8e8e564259a Merge bitcoin-core/libmultiprocess#249: fixes for race conditions on disconnects 05d34cc2ec3 ci: set LC_ALL in shell scripts e606fd84a8c Merge bitcoin-core/libmultiprocess#264: ci: reduce nproc multipliers ff0eed1bf18 refactor: Use loop variable in type-context.h ff1d8ba172a refactor: Move type-context.h getParams() call closer to use 1dbc59a4aa3 race fix: m_on_cancel called after request finishes 1643d05ba07 test: m_on_cancel called after request finishes f5509a31fcc race fix: getParams() called after request cancel 4a60c39f24a test: getParams() called after request cancel f11ec29ed20 race fix: worker thread destroyed before it is initialized a1d643348f4 test: worker thread destroyed before it is initialized 336023382c4 ci: reduce nproc multipliers b090beb9651 Merge bitcoin-core/libmultiprocess#256: ci: cache gnu32 nix store be8622816da ci: cache gnu32 nix store 975270b619c Merge bitcoin-core/libmultiprocess#263: ci: bump timeout factor to 40 09f10e5a598 ci: bump timeout factor to 40 db8f76ad290 Merge bitcoin-core/libmultiprocess#253: ci: run some Bitcoin Core CI jobs 55a9b557b19 ci: set Bitcoin Core CI test repetition fb0fc84d556 ci: add TSan job with instrumented libc++ 0f29c38725b ci: add Bitcoin Core IPC tests (ASan + macOS) 3f64320315d Merge bitcoin-core/libmultiprocess#262: ci: enable clang-tidy in macOS job, use nullptr cd9f8bdc9f0 Merge bitcoin-core/libmultiprocess#258: log: add socket connected info message and demote destroy logs to debug b5d6258a42f Merge bitcoin-core/libmultiprocess#255: fix: use unsigned char cast and sizeof in LogEscape escape sequence d94688e2c32 Merge bitcoin-core/libmultiprocess#251: Improved CustomBuildField for std::optional in IPC/libmultiprocess a9499fad755 mp: use nullptr with pthread_threadid_np f499e37850f ci: enable clang-tidy in macOS job 98f1352159d log: add socket connected info message and demote destroy logs to debug 554a481ea73 fix: use unsigned char cast and sizeof in LogEscape escape sequence 1977b9f3f65 Use std::forward in CustomBuildField for std::optional to allow move semantics, resolves FIXME 22bec918c97 Merge bitcoin-core/libmultiprocess#247: type-map: Work around LLVM 22 "out of bounds index" error 8a5e3ae6ed2 Merge bitcoin-core/libmultiprocess#242: proxy-types: add CustomHasField hook to map Cap'n Proto values to null C++ values e8d35246918 Merge bitcoin-core/libmultiprocess#246: doc: Bump version 8 > 9 97d877053b6 proxy-types: add CustomHasField hook for nullable decode paths 8c2f10252c9 refactor: add missing includes to mp/type-data.h b1638aceb40 doc: Bump version 8 > 9 f61af487217 type-map: Work around LLVM 22 "out of bounds index" error git-subtree-dir: src/ipc/libmultiprocess git-subtree-split: 70f632bda8f80449b6240f98da768206a535a04e
…da8f 70f632bda8f Merge bitcoin-core/libmultiprocess#265: ci: set LC_ALL in shell scripts 8e8e564259a Merge bitcoin-core/libmultiprocess#249: fixes for race conditions on disconnects 05d34cc2ec3 ci: set LC_ALL in shell scripts e606fd84a8c Merge bitcoin-core/libmultiprocess#264: ci: reduce nproc multipliers ff0eed1bf18 refactor: Use loop variable in type-context.h ff1d8ba172a refactor: Move type-context.h getParams() call closer to use 1dbc59a4aa3 race fix: m_on_cancel called after request finishes 1643d05ba07 test: m_on_cancel called after request finishes f5509a31fcc race fix: getParams() called after request cancel 4a60c39f24a test: getParams() called after request cancel f11ec29ed20 race fix: worker thread destroyed before it is initialized a1d643348f4 test: worker thread destroyed before it is initialized 336023382c4 ci: reduce nproc multipliers b090beb9651 Merge bitcoin-core/libmultiprocess#256: ci: cache gnu32 nix store be8622816da ci: cache gnu32 nix store 975270b619c Merge bitcoin-core/libmultiprocess#263: ci: bump timeout factor to 40 09f10e5a598 ci: bump timeout factor to 40 db8f76ad290 Merge bitcoin-core/libmultiprocess#253: ci: run some Bitcoin Core CI jobs 55a9b557b19 ci: set Bitcoin Core CI test repetition fb0fc84d556 ci: add TSan job with instrumented libc++ 0f29c38725b ci: add Bitcoin Core IPC tests (ASan + macOS) 3f64320315d Merge bitcoin-core/libmultiprocess#262: ci: enable clang-tidy in macOS job, use nullptr cd9f8bdc9f0 Merge bitcoin-core/libmultiprocess#258: log: add socket connected info message and demote destroy logs to debug b5d6258a42f Merge bitcoin-core/libmultiprocess#255: fix: use unsigned char cast and sizeof in LogEscape escape sequence d94688e2c32 Merge bitcoin-core/libmultiprocess#251: Improved CustomBuildField for std::optional in IPC/libmultiprocess a9499fad755 mp: use nullptr with pthread_threadid_np f499e37850f ci: enable clang-tidy in macOS job 98f1352159d log: add socket connected info message and demote destroy logs to debug 554a481ea73 fix: use unsigned char cast and sizeof in LogEscape escape sequence 1977b9f3f65 Use std::forward in CustomBuildField for std::optional to allow move semantics, resolves FIXME 22bec918c97 Merge bitcoin-core/libmultiprocess#247: type-map: Work around LLVM 22 "out of bounds index" error 8a5e3ae6ed2 Merge bitcoin-core/libmultiprocess#242: proxy-types: add CustomHasField hook to map Cap'n Proto values to null C++ values e8d35246918 Merge bitcoin-core/libmultiprocess#246: doc: Bump version 8 > 9 97d877053b6 proxy-types: add CustomHasField hook for nullable decode paths 8c2f10252c9 refactor: add missing includes to mp/type-data.h b1638aceb40 doc: Bump version 8 > 9 f61af487217 type-map: Work around LLVM 22 "out of bounds index" error git-subtree-dir: src/ipc/libmultiprocess git-subtree-split: 70f632bda8f80449b6240f98da768206a535a04e
…n disconnects 2478a15 Squashed 'src/ipc/libmultiprocess/' changes from 1868a84451f..70f632bda8f (Ryan Ofsky) Pull request description: Includes: - bitcoin-core/libmultiprocess#246 - bitcoin-core/libmultiprocess#242 - bitcoin-core/libmultiprocess#247 - bitcoin-core/libmultiprocess#251 - bitcoin-core/libmultiprocess#255 - bitcoin-core/libmultiprocess#258 - bitcoin-core/libmultiprocess#262 - bitcoin-core/libmultiprocess#253 - bitcoin-core/libmultiprocess#263 - bitcoin-core/libmultiprocess#256 - bitcoin-core/libmultiprocess#264 - bitcoin-core/libmultiprocess#249 - bitcoin-core/libmultiprocess#265 The main change is bitcoin-core/libmultiprocess#249 which fixes 3 intermittent race conditions detected in bitcoin core CI and antithesis: #34711/#34756, #34777, and #34782. The changes can be verified by running `test/lint/git-subtree-check.sh src/ipc/libmultiprocess` as described in [developer notes](https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#subtrees) and [lint instructions](https://github.com/bitcoin/bitcoin/tree/master/test/lint#git-subtree-checksh) ACKs for top commit: Sjors: ACK 613a548 ismaelsadeeq: ACK 613a548 Tree-SHA512: d99eebc8b4f45b3c3099298167362cf5e7f3e9e622eef9f17af56388ee5207d77a04b915b2a5a894493e0395aeda70111216f2da0d2a6553f4f6396b3d31a744

gnu32 is by far the slowest ci job and it spends most of it's time building Nix stuff.
Caching the Nix store drops subsequent runs to just 3 minutes.
Not caching the other ones, because the cache is quite large and Github limits us to 10GB total.
Using https://github.com/nix-community/cache-nix-action
Closes #254