Skip to content

fix(storage): set Content-MD5 header for single-part uploads when md5 is configured#6485

Open
fmassot wants to merge 2 commits into
mainfrom
fmassot/fix-single-part-md5-content-md5-header
Open

fix(storage): set Content-MD5 header for single-part uploads when md5 is configured#6485
fmassot wants to merge 2 commits into
mainfrom
fmassot/fix-single-part-md5-content-md5-header

Conversation

@fmassot
Copy link
Copy Markdown
Collaborator

@fmassot fmassot commented Jun 2, 2026

Problem

When checksum_algorithm: md5 is configured, aws_checksum_algorithm() returns None (the AWS SDK silently no-ops ChecksumAlgorithm::Md5). MD5 validation must therefore go through the legacy Content-MD5 header instead.

The multipart path already handles this correctly: maybe_compute_part_md5 computes a digest per part and upload_part sets Content-MD5 on each request. However, the single-part path (put_single_part_single_try) never set the header, so objects below the multipart threshold were uploaded without the requested MD5 validation even though the config/docs say MD5 is sent via Content-MD5.

Note that Content-MD5 is optional in the S3 API. By not passing it, it means that we can miss some corruption in transit for single-part uploads. Which is not the default behavior we want.

Fix

Call maybe_compute_part_md5(payload.as_ref(), 0..len) before streaming the body and set .set_content_md5(content_md5) on the put_object request — exactly mirroring what the multipart path does per part.

When the algorithm is not MD5, maybe_compute_part_md5 short-circuits immediately with None (no I/O), so there is no overhead for non-MD5 configs.

Testing

  • cargo check -p quickwit-storage passes.
  • indexed and search data on a local instance with index data on s3.
  • added some tests. A bit verbose to be honest, not sure if we need them.

… is configured

When `checksum_algorithm: md5`, the AWS SDK silently no-ops
`ChecksumAlgorithm::Md5`, so MD5 validation must go through the legacy
`Content-MD5` header instead. The multipart path already did this per
part via `maybe_compute_part_md5` + `set_content_md5`, but the
single-part path never set the header, leaving objects below the
multipart threshold uploaded without the requested MD5 validation.

This change computes `Content-MD5` for the full payload range before
streaming the body in `put_single_part_single_try`, matching the
multipart behavior exactly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@fmassot fmassot requested a review from a team as a code owner June 2, 2026 13:07
… uploads

Add two unit tests using StaticReplayClient to verify that:
- `checksum_algorithm: md5` sets the Content-MD5 header with the
  correct base64-encoded digest on single-part put_object requests.
- `checksum_algorithm: crc32c` does not set Content-MD5.

These tests cover the regression introduced by the single-part upload
path omitting Content-MD5, fixed in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant