
Use aws s3 cp instead of wget for benchmark index downloads #927

Draft

ofiryanai wants to merge 2 commits into main from fix/benchmark-s3-download

Conversation

@ofiryanai
Collaborator

Problem

The nightly benchmark workflow intermittently fails on the "Download pre-generated indices" step due to a 20-minute timeout. The most recent failure (run 23817284791) shows wget downloading multi-GB index files from S3 over HTTPS at ~8.7 MB/s, which is too slow to finish all files within the timeout.
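A quick back-of-envelope check shows why that rate trips the timeout. The ~8.7 MB/s figure is from the failing run; the 10 GB file size below is an illustrative assumption, not a number from the logs:

```shell
# At ~8.7 MB/s, a single hypothetical 10 GB index nearly exhausts the
# 20-minute budget on its own, before any of the other files download.
size_mb=$((10 * 1024))   # assumed 10 GB file, in MB
rate=8.7                 # MB/s observed over public HTTPS in the failing run
awk -v s="$size_mb" -v r="$rate" \
  'BEGIN { printf "%.0f seconds (~%.0f min)\n", s/r, s/r/60 }'
# prints: 1177 seconds (~20 min)
```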

Root Cause

The bm_files.sh script uses wget over public HTTPS to download indices from dev.cto.redis.s3.amazonaws.com. Since the benchmark runners are EC2 instances in the same AWS account, this is unnecessarily slow: traffic leaves through the public internet instead of taking the high-bandwidth internal path between EC2 and S3.

Fix

  • bm_files.sh: Replace wget with aws s3 cp, which uses the internal S3 endpoint for same-region transfers (typically 10-25 Gbps). Also deduplicate URLs with sort -u in the benchmarks-all path to avoid redundant downloads.
  • benchmark-runner.yml: Add awscli to the apt install step and add a Configure AWS credentials step (using the existing secrets) before the download step.

This is a minimal, targeted change — the download logic, file paths, parallelism, and timeout remain the same.
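The download_s3 helper named in the commit message could look roughly like the sketch below. This is not the actual diff: the https_to_s3_uri conversion, its argument handling, and the assumption that each download maps a public HTTPS URL onto an s3:// URI are all guesses for illustration.

```shell
# Hypothetical sketch of the wget -> aws s3 cp replacement.
# Converts a public S3 HTTPS URL into an s3:// URI so the CLI can use
# the internal S3 endpoint for same-region transfers.
https_to_s3_uri() {
  local url="$1"
  local rest="${url#https://}"                 # strip scheme
  local bucket="${rest%%.s3.amazonaws.com*}"   # host part before .s3.amazonaws.com
  local key="${rest#*.s3.amazonaws.com/}"      # object key after the host
  printf 's3://%s/%s\n' "$bucket" "$key"
}

download_s3() {
  local url="$1" dest="$2"
  aws s3 cp "$(https_to_s3_uri "$url")" "$dest"
}
```

For example, `https_to_s3_uri "https://dev.cto.redis.s3.amazonaws.com/indices/big.rdb"` yields `s3://dev.cto.redis/indices/big.rdb`, which `aws s3 cp` then fetches over the internal path rather than the public internet.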


Pull Request opened by Augment Code with guidance from the PR author

Replace wget with aws s3 cp for downloading pre-generated indices in
benchmarks. Since the runners are EC2 instances in AWS and the files are
hosted on S3, using the AWS CLI for same-region transfers is significantly
faster and more reliable, avoiding the timeout failures seen with wget
over HTTPS.

Changes:
- bm_files.sh: Replace wget calls with aws s3 cp via a download_s3 helper.
  Also deduplicate URLs with sort -u in the benchmarks-all path.
- benchmark-runner.yml: Add awscli to apt install, and add a Configure
  AWS credentials step before the download step.
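The benchmarks-all deduplication comes down to piping the collected URL list through `sort -u` before downloading. An illustrative example (the URLs here are invented, not taken from the script):

```shell
# Collapse repeated index URLs so each file is downloaded exactly once.
urls=$(printf '%s\n' \
  "https://dev.cto.redis.s3.amazonaws.com/indices/a.rdb" \
  "https://dev.cto.redis.s3.amazonaws.com/indices/b.rdb" \
  "https://dev.cto.redis.s3.amazonaws.com/indices/a.rdb" \
  | sort -u)
echo "$urls"   # two unique URLs remain: a.rdb and b.rdb
```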
@jit-ci

jit-ci bot commented Apr 5, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

The apt-installed awscli v1 conflicts with the newer botocore pulled in
by pip (via redisbench-admin), causing KeyError: 'opsworkscm' on every
aws s3 cp call.

Fix by installing AWS CLI v2 as a standalone binary (no Python dependency),
and add set -euo pipefail to bm_files.sh so download failures are caught
immediately instead of silently producing missing files.
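The failure mode that `set -euo pipefail` guards against can be demonstrated in isolation (this demo is not from the PR diff). Without `pipefail`, a pipeline's exit status is that of its last command, so a failed download on the left of a pipe is silently ignored:

```shell
# Run the same failing pipeline under two shell-option sets.
run_with() {
  bash -c "$1; false | cat; echo 'script kept going'"
}

run_with 'set -eu'                                     # prints: script kept going
run_with 'set -euo pipefail' || echo 'failure caught'  # prints: failure caught
```

With `pipefail`, the pipeline's status reflects the first failure and `set -e` aborts the script immediately, so a missing index file fails the job at the download step rather than later in the benchmark.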
@ofiryanai force-pushed the fix/benchmark-s3-download branch from 1c7f8f8 to 0e57c58 on April 5, 2026 at 11:10
@jit-ci

jit-ci bot commented Apr 5, 2026

❌ Jit Scanner failed - Our team is investigating

Jit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions.


💡 Need to bypass this check? Comment @sera bypass to override.
