fix(storage): set Content-MD5 header for single-part uploads when md5 is configured#6485
Open
fmassot wants to merge 2 commits into
Open
fix(storage): set Content-MD5 header for single-part uploads when md5 is configured#6485fmassot wants to merge 2 commits into
fmassot wants to merge 2 commits into
Conversation
… is configured When `checksum_algorithm: md5`, the AWS SDK silently no-ops `ChecksumAlgorithm::Md5`, so MD5 validation must go through the legacy `Content-MD5` header instead. The multipart path already did this per part via `maybe_compute_part_md5` + `set_content_md5`, but the single-part path never set the header, leaving objects below the multipart threshold uploaded without the requested MD5 validation. This change computes `Content-MD5` for the full payload range before streaming the body in `put_single_part_single_try`, matching the multipart behavior exactly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… uploads Add two unit tests using StaticReplayClient to verify that: - `checksum_algorithm: md5` sets the Content-MD5 header with the correct base64-encoded digest on single-part put_object requests. - `checksum_algorithm: crc32c` does not set Content-MD5. These tests cover the regression introduced by the single-part upload path omitting Content-MD5, fixed in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When
checksum_algorithm: md5is configured,aws_checksum_algorithm()returnsNone(the AWS SDK silently no-opsChecksumAlgorithm::Md5). MD5 validation must therefore go through the legacyContent-MD5header instead.The multipart path already handles this correctly:
maybe_compute_part_md5computes a digest per part andupload_partsetsContent-MD5on each request. However, the single-part path (put_single_part_single_try) never set the header, so objects below the multipart threshold were uploaded without the requested MD5 validation even though the config/docs say MD5 is sent viaContent-MD5.Note that
Content-MD5is optional in the S3 API. By not passing it, it means that we can miss some corruption in transit for single-part uploads. Which is not the default behavior we want.Fix
Call
maybe_compute_part_md5(payload.as_ref(), 0..len)before streaming the body and set.set_content_md5(content_md5)on theput_objectrequest — exactly mirroring what the multipart path does per part.When the algorithm is not MD5,
maybe_compute_part_md5short-circuits immediately withNone(no I/O), so there is no overhead for non-MD5 configs.Testing
cargo check -p quickwit-storagepasses.