CLDSRV-898: CompleteMultipartUpload checksums #6166
Closed: leif-scality wants to merge 11 commits into `development/9.4` from `improvement/CLDSRV-898-complete-mpu-checksums`
Commits (all by leif-scality):

- `bbded70` CLDSRV-898: fix lint issues
- `027829f` CLDSRV-898: rename getObjectAttributesXMLTag to xmlTag
- `d23dd9d` CLDSRV-898: validate per-part checksums and x-amz-checksum-type
- `2ad777a` CLDSRV-898: calculate final FULL_OBJECT (crc combine) and COMPOSITE c…
- `27ab1be` CLDSRV-898: CompleteMPU calculate and validate final checksum with ch…
- `877af99` CLDSRV-898: CompleteMPU store FULL_OBJECT checksum in object metadata
- `a8b0c88` CLDSRV-898: CompleteMPU set checksum value and type in response XML body
- `732917c` CLDSRV-898: CompleteMPU checksum functional tests
- `441661c` CLDSRV-898: reject `Checksum<X>` field on a default MPU checksum
- `e189abb` CLDSRV-898: reject CompleteMPU if explicit checksum but checksum miss…
- `a9db936` CLDSRV-898: bump arsenal to 8.4.2
Diff (new file, `@@ -0,0 +1,185 @@`):

```javascript
'use strict';

// Combine two right-shift (reflected) CRCs (zlib's gf2_matrix_* trick) without
// using BigInt inside the hot loops. Each GF(2) operator matrix is stored as a
// Uint32Array of `2 * dim` words, where row n is packed as [lo32, hi32]. For
// 32-bit CRCs the high halves stay zero, so gf2MatrixTimes stops after at most
// 32 iterations; for the 64-bit CRC (crc64nvme) the pair-of-u32s
// representation keeps every XOR/shift on 32-bit ints.
//
// References:
//   zlib crc32_combine (canonical C implementation):
//     https://github.com/madler/zlib/blob/master/crc32.c
//   Mark Adler, "How does CRC32 work?" (derivation of the matrix trick):
//     https://stackoverflow.com/a/23126768
//   AWS S3 multipart upload full-object checksums:
//     https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
```
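To make the [lo32, hi32] packing concrete, here is a small sketch. `splitU64`, `joinU64` and `shiftRight1` are hypothetical helpers, not part of the PR. The shift reproduces the carry step inside gf2MatrixTimes, where bit 0 of the high word moves into bit 31 of the low word, and it is checked against a plain BigInt shift.

```javascript
'use strict';

// Hypothetical helpers (not PR code) for the [lo32, hi32] representation.
function splitU64(v) {
    return [Number(v & 0xffffffffn), Number((v >> 32n) & 0xffffffffn)];
}

function joinU64(lo, hi) {
    return (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
}

// One-bit right shift of a 64-bit value held as two u32 halves: the low bit
// of `hi` carries into bit 31 of `lo`, as in gf2MatrixTimes. The final >>> 0
// turns the possibly negative int32 result back into an unsigned value.
function shiftRight1(lo, hi) {
    return [((lo >>> 1) | ((hi & 1) << 31)) >>> 0, hi >>> 1];
}

const v = 0x8000000180000001n;
const [lo, hi] = splitU64(v);
const [sLo, sHi] = shiftRight1(lo, hi);
console.log(joinU64(sLo, sHi) === v >> 1n); // true
```

For a 32-bit CRC the high half is always 0, which is why the 32-bit path never touches the `hi` words in practice.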
```javascript
// Multiply the operator `mat` by the vector [vecLo, vecHi] over GF(2): XOR
// together the rows selected by the set bits of the vector.
function gf2MatrixTimes(mat, vecLo, vecHi) {
    let sumLo = 0;
    let sumHi = 0;
    let lo = vecLo;
    let hi = vecHi;
    let i = 0;
    while ((lo | hi) !== 0) {
        if (lo & 1) {
            sumLo ^= mat[2 * i];
            sumHi ^= mat[2 * i + 1];
        }
        lo = (lo >>> 1) | ((hi & 1) << 31);
        hi = hi >>> 1;
        i += 1;
    }
    return [sumLo >>> 0, sumHi >>> 0];
}

function gf2MatrixSquare(square, mat, dim) {
    for (let n = 0; n < dim; n += 1) {
        const r = gf2MatrixTimes(mat, mat[2 * n], mat[2 * n + 1]);
        // Writing into the caller-provided `square` buffer is intentional:
        // the callers (getOrInitChain, ensureChainLen) allocate a fresh
        // Uint32Array for each new operator and let this function fill it.
        /* eslint-disable no-param-reassign */
        square[2 * n] = r[0];
        square[2 * n + 1] = r[1];
        /* eslint-enable no-param-reassign */
    }
}
```
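gf2MatrixTimes XORs together the rows picked out by the set bits of the vector, and gf2MatrixSquare builds M² from M by pushing each row of M through M. The reduced sketch below is 32-bit only and uses plain number arrays instead of the PR's Uint32Array pairs; `times`, `square` and `stepZeroBit` are illustrative names. It checks that the one-zero-bit operator and its powers agree with running a reflected CRC-32 register one zero bit at a time.

```javascript
'use strict';

// 32-bit-only sketch (an assumption for illustration; the PR also handles
// 64-bit CRCs via [lo, hi] pairs). mat[n] is the image of state bit n.
const POLY = 0xedb88320; // bit-reversed CRC-32 polynomial

function times(mat, vec) {
    let sum = 0;
    for (let i = 0; vec !== 0; i += 1, vec >>>= 1) {
        if (vec & 1) sum ^= mat[i];
    }
    return sum >>> 0;
}

// Row n of M^2 is M applied to row n of M.
function square(mat) {
    return mat.map((row) => times(mat, row));
}

// One zero input bit through a reflected CRC register.
function stepZeroBit(crc) {
    return ((crc & 1) ? (crc >>> 1) ^ POLY : crc >>> 1) >>> 0;
}

// M^1: column 0 is the polynomial, column k > 0 is 1 << (k - 1).
const m1 = [POLY];
for (let k = 1; k < 32; k += 1) m1.push((1 << (k - 1)) >>> 0);
const m2 = square(m1);

const crc = 0x12345678;
console.log(times(m1, crc) === stepZeroBit(crc)); // true
console.log(times(m2, crc) === stepZeroBit(stepZeroBit(crc))); // true
```

Because the register update with a zero input bit is linear over GF(2), knowing where each single bit goes (the columns) is enough to apply it to any state.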
```javascript
// Per (polyReversed, dim), a lazily-grown chain of zero-byte operators.
// state.byteOps[j] is the GF(2) operator for feeding 2^j zero bytes through
// the CRC register (i.e. M^(8 * 2^j)). Building this chain is the dominant
// cost of crcCombine and depends only on the polynomial and width, so we
// cache it across calls.
const chainCache = new Map();

function getOrInitChain(polyReversed, dim) {
    // Key on both the polynomial and the width, as stated above.
    const key = `${dim}:${polyReversed.toString(16)}`;
    let state = chainCache.get(key);
    if (state !== undefined) {
        return state;
    }

    // M^1: one-zero-bit operator. Column 0 is the polynomial; column k>0 is
    // 1 << (k - 1), which is what right-shifting a state with bit k set produces.
    const m1 = new Uint32Array(2 * dim);
    m1[0] = Number(polyReversed & 0xffffffffn);
    m1[1] = Number((polyReversed >> 32n) & 0xffffffffn);
    for (let k = 1; k < dim; k += 1) {
        const bit = k - 1;
        if (bit < 32) {
            m1[2 * k] = (1 << bit) >>> 0;
        } else {
            m1[2 * k + 1] = (1 << (bit - 32)) >>> 0;
        }
    }

    const m2 = new Uint32Array(2 * dim);
    gf2MatrixSquare(m2, m1, dim);
    const m4 = new Uint32Array(2 * dim);
    gf2MatrixSquare(m4, m2, dim);
    const m8 = new Uint32Array(2 * dim); // operator for 1 zero byte
    gf2MatrixSquare(m8, m4, dim);

    state = { dim, byteOps: [m8] };
    chainCache.set(key, state);
    return state;
}

// Grow the chain by repeated squaring until byteOps[j] exists.
function ensureChainLen(state, j) {
    while (state.byteOps.length <= j) {
        const prev = state.byteOps[state.byteOps.length - 1];
        const next = new Uint32Array(prev.length);
        gf2MatrixSquare(next, prev, state.dim);
        state.byteOps.push(next);
    }
}
```
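The cached chain turns "shift by n zero bytes" into one operator application per set bit of n. The sketch below assumes CRC-32 only and uses hypothetical names (`times`, `square`, `shiftZeroBytes`, `shiftZeroBytesSlow`); it grows `byteOps` on demand the way ensureChainLen does and compares the result with feeding the same zero bits one at a time.

```javascript
'use strict';

// Hypothetical 32-bit sketch of the lazily grown chain: byteOps[j] applies
// 2^j zero bytes, i.e. M^(8 * 2^j). Not the PR's code.
const POLY = 0xedb88320; // bit-reversed CRC-32 polynomial

function times(mat, vec) {
    let sum = 0;
    for (let i = 0; vec !== 0; i += 1, vec >>>= 1) {
        if (vec & 1) sum ^= mat[i];
    }
    return sum >>> 0;
}
const square = (mat) => mat.map((row) => times(mat, row));

let op = [POLY];
for (let k = 1; k < 32; k += 1) op.push((1 << (k - 1)) >>> 0);
for (let s = 0; s < 3; s += 1) op = square(op); // M^8: one zero byte
const byteOps = [op];

function ensureChainLen(j) {
    while (byteOps.length <= j) {
        byteOps.push(square(byteOps[byteOps.length - 1]));
    }
}

// Feed `len` zero bytes: apply byteOps[j] for every set bit j of len.
function shiftZeroBytes(crc, len) {
    let c = crc;
    for (let n = len, j = 0; n > 0; n = Math.floor(n / 2), j += 1) {
        if (n % 2 === 1) {
            ensureChainLen(j);
            c = times(byteOps[j], c);
        }
    }
    return c;
}

// Reference: the same zero bits, one register step at a time.
function shiftZeroBytesSlow(crc, len) {
    let c = crc;
    for (let i = 0; i < 8 * len; i += 1) {
        c = ((c & 1) ? (c >>> 1) ^ POLY : c >>> 1) >>> 0;
    }
    return c;
}

console.log(shiftZeroBytes(0xcafebabe, 5) === shiftZeroBytesSlow(0xcafebabe, 5)); // true
console.log(byteOps.length); // 3 (operators for 1, 2 and 4 zero bytes)
```

Every entry is a power of the same operator M, so the applications commute and the order in which set bits are visited does not matter; the chain only grows to the bit length of the largest length requested so far.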
```javascript
/**
 * Combine two CRCs of adjacent byte chunks.
 *
 *   crcCombine(crc(a), crc(b), len(b), polyReversed, dim) === crc(a ‖ b)
 *
 * Works for any right-shift (reflected) CRC of width `dim` (32 or 64) given
 * its bit-reversed polynomial, provided the CRC's initial value equals its
 * final XOR value (true for CRC-32, CRC-32C and CRC-64/NVME). The squaring
 * chain for `polyReversed` is cached across calls, so the per-call cost is
 * popcount(len2) operator applications plus the BigInt boundary conversions
 * (and a one-time extension of the chain when a longer len2 is first seen).
 *
 * @param {bigint} crc1 - CRC of the first chunk
 * @param {bigint} crc2 - CRC of the second chunk
 * @param {bigint} len2 - byte length of the second chunk
 * @param {bigint} polyReversed - bit-reversed polynomial
 * @param {number} dim - CRC width in bits (32 or 64)
 * @returns {bigint} CRC of the concatenated chunk, masked to `dim` bits
 */
function crcCombine(crc1, crc2, len2, polyReversed, dim) {
    const mask = (1n << BigInt(dim)) - 1n;
    if (len2 === 0n) {
        return crc1 & mask;
    }

    const state = getOrInitChain(polyReversed, dim);

    let cLo = Number(crc1 & 0xffffffffn);
    let cHi = Number((crc1 >> 32n) & 0xffffffffn);

    // Walk the bits of len2: bit j stands for 2^j zero bytes to feed through
    // crc1's register, so apply the cached operator for every set bit.
    let n = len2;
    let j = 0;
    while (n !== 0n) {
        if ((n & 1n) === 1n) {
            ensureChainLen(state, j);
            const r = gf2MatrixTimes(state.byteOps[j], cLo, cHi);
            cLo = r[0];
            cHi = r[1];
        }
        n >>= 1n;
        j += 1;
    }

    const c2Lo = Number(crc2 & 0xffffffffn);
    const c2Hi = Number((crc2 >> 32n) & 0xffffffffn);
    cLo = (cLo ^ c2Lo) >>> 0;
    cHi = (cHi ^ c2Hi) >>> 0;

    return ((BigInt(cHi) << 32n) | BigInt(cLo)) & mask;
}
```
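The identity in the crcCombine doc comment follows from linearity. Writing L for the zero-input register update, I for the initial value and X for the final XOR, crc(a ‖ b) = L^(8·|b|)(crc(a)) ⊕ L^(8·|b|)(I ⊕ X) ⊕ crc(b), and the middle term vanishes when I = X. The sketch below checks this for CRC-32 with a naive zero-byte shift standing in for the cached matrix operators; `crc32`, `shiftZeroBytes` and `naiveCombine` are illustrative, not PR code.

```javascript
'use strict';

const POLY = 0xedb88320; // bit-reversed CRC-32 polynomial

// Bitwise reflected CRC-32 (initial value and final XOR both 0xFFFFFFFF).
function crc32(buf) {
    let c = 0xffffffff;
    for (const byte of buf) {
        c ^= byte;
        for (let k = 0; k < 8; k += 1) {
            c = (c & 1) ? (c >>> 1) ^ POLY : c >>> 1;
        }
    }
    return (c ^ 0xffffffff) >>> 0;
}

// Stand-in for the matrix operator: feed `len` zero bytes through the register.
function shiftZeroBytes(crc, len) {
    let c = crc;
    for (let i = 0; i < 8 * len; i += 1) {
        c = (c & 1) ? (c >>> 1) ^ POLY : c >>> 1;
    }
    return c >>> 0;
}

function naiveCombine(crcA, crcB, lenB) {
    return (shiftZeroBytes(crcA, lenB) ^ crcB) >>> 0;
}

const a = Buffer.from('12345');
const b = Buffer.from('6789');
console.log(crc32(Buffer.concat([a, b])).toString(16)); // cbf43926
console.log(naiveCombine(crc32(a), crc32(b), b.length) === crc32(Buffer.concat([a, b]))); // true
```

The naive shift costs O(len(b)) per combine; the cached squaring chain in the PR replaces it with O(log len(b)) operator applications, which matters for multi-megabyte parts.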
```javascript
// Decode a base64 checksum into a BigInt (big-endian byte order).
function base64ToBigInt(b64) {
    const buf = Buffer.from(b64, 'base64');
    let r = 0n;
    for (let i = 0; i < buf.length; i += 1) {
        r = (r << 8n) | BigInt(buf[i]);
    }
    return r;
}

// Encode as exactly dim / 8 big-endian bytes so leading zero bytes are kept.
function bigIntToBase64(value, dim) {
    const nBytes = dim / 8;
    const buf = Buffer.alloc(nBytes);
    let v = value;
    for (let i = nBytes - 1; i >= 0; i -= 1) {
        buf[i] = Number(v & 0xffn);
        v >>= 8n;
    }
    return buf.toString('base64');
}
```
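These helpers treat a checksum as the base64 encoding of its big-endian bytes. Sizing the buffer to dim / 8 bytes keeps leading zero bytes, so a small CRC value still encodes to the full-width string. The sketch restates the two helpers (the `reduce`-based decoder is a stylistic variant) and shows a few concrete encodings.

```javascript
'use strict';

// Same behavior as the PR's helpers: big-endian bytes, fixed width.
function base64ToBigInt(b64) {
    return Buffer.from(b64, 'base64').reduce((r, byte) => (r << 8n) | BigInt(byte), 0n);
}

function bigIntToBase64(value, dim) {
    const buf = Buffer.alloc(dim / 8);
    let v = value;
    for (let i = buf.length - 1; i >= 0; i -= 1) {
        buf[i] = Number(v & 0xffn);
        v >>= 8n;
    }
    return buf.toString('base64');
}

console.log(bigIntToBase64(0xcbf43926n, 32)); // y/Q5Jg==
console.log(bigIntToBase64(1n, 32)); // AAAAAQ== (leading zero bytes kept)
console.log(base64ToBigInt('y/Q5Jg==') === 0xcbf43926n); // true
```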
```javascript
/**
 * Combine N per-part CRCs into the full-object CRC, base64-encoded.
 *
 * @param {Array<{value: string, length: number}>} parts - per-part data in
 *   part order; `value` is the base64-encoded per-part CRC, `length` is the
 *   byte length of that part
 * @param {bigint} polyReversed - bit-reversed polynomial
 * @param {number} dim - CRC width in bits (32 or 64)
 * @returns {string} base64-encoded combined CRC
 */
function combineCrcs(parts, polyReversed, dim) {
    let combined = base64ToBigInt(parts[0].value);
    for (let i = 1; i < parts.length; i += 1) {
        combined = crcCombine(combined, base64ToBigInt(parts[i].value), BigInt(parts[i].length), polyReversed, dim);
    }
    return bigIntToBase64(combined, dim);
}

module.exports = { combineCrcs, crcCombine };
```
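combineCrcs is a left fold: the running CRC is shifted past each later part and XORed with that part's CRC, so only the lengths of parts 1..N-1 are used. The CRC-32-only sketch below follows the same shape with a naive shift in place of crcCombine. `combineCrc32Parts`, `toB64` and `fromB64` are hypothetical names, and the explicit empty-array check is an addition (combineCrcs itself would throw a TypeError on `parts[0].value`).

```javascript
'use strict';

// Hypothetical, CRC-32 only: a naive zero-byte shift replaces the matrices.
const POLY = 0xedb88320; // bit-reversed CRC-32 polynomial

function crc32(buf) {
    let c = 0xffffffff;
    for (const byte of buf) {
        c ^= byte;
        for (let k = 0; k < 8; k += 1) c = (c & 1) ? (c >>> 1) ^ POLY : c >>> 1;
    }
    return (c ^ 0xffffffff) >>> 0;
}

function shiftZeroBytes(crc, len) {
    let c = crc;
    for (let i = 0; i < 8 * len; i += 1) c = (c & 1) ? (c >>> 1) ^ POLY : c >>> 1;
    return c >>> 0;
}

// Base64 of the 4 big-endian bytes, as bigIntToBase64(value, 32) produces.
const toB64 = (n) => {
    const b = Buffer.alloc(4);
    b.writeUInt32BE(n);
    return b.toString('base64');
};
const fromB64 = (s) => Buffer.from(s, 'base64').readUInt32BE(0);

// Same shape as combineCrcs: parts[0].length is never read.
function combineCrc32Parts(parts) {
    if (parts.length === 0) throw new Error('no parts');
    let combined = fromB64(parts[0].value);
    for (let i = 1; i < parts.length; i += 1) {
        combined = (shiftZeroBytes(combined, parts[i].length) ^ fromB64(parts[i].value)) >>> 0;
    }
    return toB64(combined);
}

const chunks = ['Hello, ', 'multipart ', 'world'].map((s) => Buffer.from(s));
const parts = chunks.map((buf) => ({ value: toB64(crc32(buf)), length: buf.length }));
console.log(combineCrc32Parts(parts) === toB64(crc32(Buffer.concat(chunks)))); // true
```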