Use aws s3 cp instead of wget for benchmark index downloads#927
Draft
Conversation
Replace `wget` with `aws s3 cp` for downloading pre-generated indices in benchmarks. Since the runners are EC2 instances in AWS and the files are hosted on S3, using the AWS CLI for same-region transfers is significantly faster and more reliable, avoiding the timeout failures seen with `wget` over HTTPS.

Changes:
- `bm_files.sh`: Replace `wget` calls with `aws s3 cp` via a `download_s3` helper. Also deduplicate URLs with `sort -u` in the benchmarks-all path.
- `benchmark-runner.yml`: Add `awscli` to the apt install step, and add a Configure AWS credentials step before the download step.
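A minimal sketch of what the `download_s3` helper could look like. The helper name comes from the PR itself; the bucket name and the HTTPS-to-`s3://` URL mapping are assumptions inferred from the `dev.cto.redis.s3.amazonaws.com` hostname mentioned later in the description, and the example path is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# download_s3 URL DEST
# Rewrites a public HTTPS S3 URL into an s3:// URI so the transfer goes
# through the internal S3 endpoint instead of the public internet.
# Assumption: the bucket is named dev.cto.redis, matching the hostname
# used in the original wget URLs.
download_s3() {
  local url="$1" dest="$2"
  # Strip the HTTPS prefix to recover the object key.
  local key="${url#https://dev.cto.redis.s3.amazonaws.com/}"
  aws s3 cp "s3://dev.cto.redis/${key}" "${dest}"
}

# Hypothetical usage:
# download_s3 "https://dev.cto.redis.s3.amazonaws.com/indices/foo.tar" /tmp/foo.tar
```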
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR.
The apt-installed `awscli` v1 conflicts with the newer botocore pulled in by pip (via redisbench-admin), causing `KeyError: 'opsworkscm'` on every `aws s3 cp` call. Fix by installing AWS CLI v2 as a standalone binary (no Python dependency), and add `set -euo pipefail` to `bm_files.sh` so download failures are caught immediately instead of silently producing missing files.
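A sketch of the standalone install described here, following AWS's documented v2 installer; the x86_64 architecture is an assumption about the runners.

```bash
# Install AWS CLI v2 as a standalone binary (official installer, no Python
# dependency), replacing the apt awscli v1 package. x86_64 runner assumed.
curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip -q awscliv2.zip
sudo ./aws/install
aws --version   # expect aws-cli/2.x, confirming the v1/botocore conflict is gone
```

With `set -euo pipefail` at the top of `bm_files.sh`, any failing `aws s3 cp` then aborts the script immediately rather than leaving a partial set of files behind.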
Force-pushed from 1c7f8f8 to 0e57c58.
❌ Jit Scanner failed: our team has been notified and is working to resolve the issue. Please contact support if you have any questions.
Problem
The nightly benchmark workflow intermittently fails on the "Download pre-generated indices" step due to a 20-minute timeout. The most recent failure (run 23817284791) shows `wget` downloading multi-GB index files from S3 over HTTPS at ~8.7 MB/s; at that rate the 20-minute window caps the total transfer at roughly 10 GB, which is too slow to finish all files within the timeout.

Root Cause
The `bm_files.sh` script uses `wget` over public HTTPS to download indices from `dev.cto.redis.s3.amazonaws.com`. Since the benchmark runners are EC2 instances in the same AWS account, this is unnecessarily slow: traffic goes out through the public internet instead of using the high-bandwidth internal S3 path.

Fix

- `bm_files.sh`: Replace `wget` with `aws s3 cp`, which uses the internal S3 endpoint for same-region transfers (typically 10-25 Gbps). Also deduplicate URLs with `sort -u` in the benchmarks-all path to avoid redundant downloads (see the sketch after this list).
- `benchmark-runner.yml`: Add `awscli` to the apt install step and add a Configure AWS credentials step (using the existing secrets) before the download step.

This is a minimal, targeted change: the download logic, file paths, parallelism, and timeout remain the same.
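A minimal sketch of how the benchmarks-all path could dedupe and fetch. `collect_index_urls` is a hypothetical stand-in for however `bm_files.sh` actually enumerates index URLs; `download_s3` is the helper named in the PR.

```bash
# Hypothetical driver for the benchmarks-all path: collect_index_urls is an
# assumed placeholder that emits one index URL per line.
collect_index_urls |
  sort -u |                       # deduplicate so shared indices download only once
  while read -r url; do
    download_s3 "${url}" "$(basename "${url}")"
  done
```

The Configure AWS credentials step in the workflow presumably exports `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the existing secrets so these `aws s3 cp` calls can authenticate.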
Pull Request opened by Augment Code with guidance from the PR author