
fix(ci): make redis_setexz a safe no-op when redis is unavailable #24520

Draft
AztecBot wants to merge 1 commit into next from cb/fix-redis-setexz-sigpipe-v2

AztecBot (Collaborator) commented Jul 4, 2026

Problem

The nightly barretenberg debug build failed immediately, before it could even request an EC2 instance (failing run):

--- Run barretenberg-debug CI ---

gzip: stdout: Broken pipe
##[error]Process completed with exit code 1.

Root cause

ci3/bootstrap_ec2 writes an initial log entry to redis at the very start of a run:

echo "CI booting..." | redis_setexz "$CI_LOG_ID" 300

redis_setexz was:

function redis_setexz {
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

When redis is unavailable, redis_cli (in ci3/source_redis) is a no-op that returns without reading its stdin. gzip then writes into a pipe whose read end is already closed and dies with SIGPIPE / "Broken pipe". Because ci3 runs under set -euo pipefail, that broken-pipe failure propagates out of the pipeline and kills the whole script before any instance is launched.
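The mechanism can be illustrated in isolation. The sketch below is not the ci3 code: `noop_reader` is a hypothetical stand-in for the no-op `redis_cli`, and `yes` replaces `gzip` as the producer so that the outcome does not depend on timing (a producer that writes only a few bytes can succeed if it finishes before the reader exits).

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical stand-in for the no-op redis_cli: returns without reading stdin.
function noop_reader {
  return 0
}

# `yes` writes continuously, so it is certain to write into the pipe after
# the reader has exited and closed the read end.
repro_status=0
yes 2>/dev/null | noop_reader || repro_status=$?

# With pipefail, the pipeline reports the producer's failure even though
# the last command in the pipeline succeeded.
echo "pipeline status: $repro_status"
```

With the default SIGPIPE disposition the reported status is typically 141 (128 + signal 13). If SIGPIPE is ignored in the environment, the producer instead sees a write error and exits with its own non-zero status. Either way, under `set -e` the failure would abort the calling script.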

This is the direct cause of the observed failure: the run happened on the aztec-claude mirror, whose environment has no AWS creds and no BUILD_INSTANCE_SSH_KEY (all empty in the job env), so the redis tunnel is never opened and CI_REDIS_AVAILABLE=0. source_redis intentionally degrades gracefully in that case ("Log and test cache will be disabled"), but the very next redis_setexz call defeats that graceful degradation.

The same latent bug would take down the real nightly (or any CI run) any time the redis tunnel fails to establish, since bootstrap_ec2:24, cache_log's publish_log, and run_test_cmd all call redis_setexz unguarded, relying on it being a safe no-op. (denoise already works around it by guarding on CI_REDIS_AVAILABLE before calling it.)

Fix

Make redis_setexz itself a proper no-op when redis is unavailable, mirroring the existing redis_cli guard, and still drain stdin so the upstream pipeline producer doesn't get SIGPIPE:

function redis_setexz {
  if [ "$CI_REDIS_AVAILABLE" -ne 1 ]; then
    cat >/dev/null
    return 0
  fi
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

This fixes all unguarded callers at once and keeps the redis-available path unchanged.
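As a rough, self-contained check of both paths, the sketch below exercises a variant of the guarded function against a stub. The `redis_cli` stub, the `demo_key` key, the `${CI_REDIS_AVAILABLE:-0}` default, and the quoting of `"$1" "$2"` are assumptions made for this example and are not part of the PR. The input is deliberately larger than a typical pipe buffer so that a reader that failed to drain stdin would show up as a failure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stub standing in for redis_cli from ci3/source_redis:
# a no-op when redis is unavailable, otherwise it just consumes stdin.
function redis_cli {
  if [ "${CI_REDIS_AVAILABLE:-0}" -ne 1 ]; then
    return 0
  fi
  cat >/dev/null
}

# Variant of the guarded function. The :-0 default (so an unset variable
# does not trip `set -u`) and the quoted arguments are additions for this sketch.
function redis_setexz {
  if [ "${CI_REDIS_AVAILABLE:-0}" -ne 1 ]; then
    cat >/dev/null   # drain stdin so the upstream producer can finish writing
    return 0
  fi
  gzip | redis_cli -x SETEX "$1" "$2" &>/dev/null
}

# Redis unavailable: the call should be a harmless no-op.
CI_REDIS_AVAILABLE=0
unavailable_status=0
seq 1 100000 | redis_setexz demo_key 300 || unavailable_status=$?

# Redis "available": gzip feeds the stub, which reads everything it is given.
CI_REDIS_AVAILABLE=1
available_status=0
seq 1 100000 | redis_setexz demo_key 300 || available_status=$?

echo "unavailable: $unavailable_status, available: $available_status"
```

Draining with `cat >/dev/null` rather than returning immediately is what keeps the producer from writing into a closed pipe. The cost is reading input that is then discarded, which should be negligible for log-sized payloads.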

Verification

Reproduced the failure under set -euo pipefail with CI_REDIS_AVAILABLE=0:

  • Before: echo "CI booting..." | redis_setexz k 300 → pipeline dies with exit 141 (SIGPIPE) / "gzip: stdout: Broken pipe".
  • After: same call returns 0 and the script continues; the redis-available path (stub redis_cli consuming stdin) also returns 0.

Notes

The scheduled nightly is guarded to only run on AztecProtocol/aztec-packages (barretenberg-nightly-debug-build.yml line 14). The failing run came from the aztec-claude mirror's older copy of that workflow, which predates the guard; that is why it ran there at all. The mirror will self-correct on its next upstream sync. This PR fixes the underlying robustness bug that the mirror's credential-less environment exposed.


Created by claudebox · group: slackbot

AztecBot added the labels ci-draft (Run CI on draft PRs), ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), and claudebox (Owned by claudebox; it can push to this PR) on Jul 4, 2026