Fix #581: Add CRC32 checksum verification for pushdata/fetchdata data channels#887
Fix #581: Add CRC32 checksum verification for pushdata/fetchdata data channels#887OmDoshi13 wants to merge 5 commits into
Conversation
…ta channels (eBay#581) During SM long-running tests, followers observed hash mismatches and invalid headers during read-verify. Root cause: PushData and FetchData channels transmitted raw payloads with no integrity check, allowing in-flight corruption to be silently written to disk. Changes: - Add checksum: uint32 to PushDataRequest and ResponseEntry FlatBuffer schemas - Add data_checksum_enabled: bool = true (hotswap) to Consensus config so the overhead can be toggled off at runtime for performance benchmarking - Compute CRC32 over the data payload before sending on both push and fetch paths - Verify CRC32 on receipt; drop and let retry on mismatch rather than writing corrupt data to disk - FetchData response now optionally prepends a FetchDataResponse FlatBuffer header carrying per-entry checksums, gated on data_checksum_enabled to ensure backward compatibility during rolling upgrades
- Switch framing detection to magic-byte prefix (FETCH_DATA_RESPONSE_MAGIC) so receivers self-describe the header without consulting per-node config, making detection safe under hotswap config asymmetry between nodes - Default data_checksum_enabled to false; operators enable via hotswap after all nodes are running the new binary to avoid old nodes misinterpreting the framing header as block data - Add FlatBuffer verifier on the receive path to guard against malformed headers - Fix header memory management: heap-allocate with new/delete[] instead of iobuf - Attach per-entry lsn/dsn/raft_term to ResponseEntry for richer diagnostics - Increment data_checksum_mismatch_cnt metric on every mismatch - Add Checksum_Enabled_PushData_Path and Checksum_Enabled_FetchData_Path tests
F1: On FetchData checksum mismatch, immediately re-fetch via check_and_fetch_remote_data() instead of waiting for the full data_receive_timeout_ms (10s) stall. Mismatched rreqs are collected in the response loop and re-queued after processing the valid entries. F2: Guard against total_size unsigned underflow before each total_size -= data_size subtraction in handle_fetch_data_response. A truncated or malformed response could wrap total_size to ~4GB and advance raw_data past the blob boundary in release builds. F3: Add flatbuffers::Verifier call in on_push_data_received before any PushDataRequest field is accessed. Without this, a corrupted or crafted push RPC could cause out-of-bounds reads via the unverified FlatBuffer accessor, matching the verification already present on the FetchData response path.
There was a problem hiding this comment.
@OmDoshi13 thanks for taking this and helping with the tech debt, It looks nice in general.
I think your changes can be categorised into 3 parts
A: Corner cases fixes, mainly in your commit 3.
B: Moving FetchDataResponse from raw to Flatbuffer
C: CRC implementation.
I dont have any problem with A and hoping A can be done in a separate PR to allow it be merged sooner.
For B, I saw you added a FETCH_DATA_RESPONSE_MAGIC as well as corresponding logic to support rolling upgrade and hybrid version, that is nice. And yes, It is totally a mistake in the past that skipping flatbuffer instead using raw format...
however, using the magic header is another dead-end, as we have to carry this "compatibility" tax forever, it doesnt help us in next release that removing the magic header and merge back to the native flat-buffer approach.
With this thinking, do you think we can directly using FlatBuffer w/o MAGIC, as flatbuffer is size-prefixed, we can read the first size_bytes, and check if that number makes sense , then proceed to flatbuffer::verify, if that is also passing it is very likely it is new format, then we check CRCs of raw data against what was in flatbuffer.
Basically I am trying to suggest try-and-fallback pattern.Note the raw data should be a homeobject blobs/shards where has a DataHeader with magic uint64_t magic{0x21fdffdba8d68fc6};, , that makes the flatbuffer prefixed size to be 2.6GB, which will very probably fails the basic size check.
struct DataHeader {
static constexpr uint8_t data_header_version = 0x02;
static constexpr uint64_t data_header_magic = 0x21fdffdba8d68fc6; // echo "BlobHeader" | md5sum
enum class data_type_t : uint32_t { SHARD_INFO = 1, BLOB_INFO = 2 };
bool valid() const { return ((magic == data_header_magic) && (version <= data_header_version)); }
uint64_t magic{data_header_magic};
uint8_t version{data_header_version};
data_type_t type{data_type_t::BLOB_INFO};
};
|
For C: A bit more context on the CRC: once we added length check, we never saw data corruption on wire anymore. Basically it is hard to see bit rot in TCP protocol as TCP protocol already provides its CRC in TCP-Header. Usually the error was caused by early terminating --- sender disconnected before sending full data, receiver received partial data and blindly using it without checking the prefix-size. With this assumption and observation in production , I think we wont enable the CRC for generic use cases, only use in more durability sensitive environment if needed. |
… detection Per xiaoxichen's review feedback on PR eBay#887: B — FetchData wire format: drop the FETCH_DATA_RESPONSE_MAGIC header approach. Senders now always emit a size-prefixed FetchDataResponse FlatBuffer (rolling upgrade safe: old raw-format senders are detected by receivers automatically). Receivers use try-and-fallback: read the 4-byte uoffset_t size prefix; if fb_hdr_size <= total_size and flatbuffers::Verifier passes it is new format, otherwise fall back to old raw format. Old senders' DataHeader magic (0x21fdffdba8d68fc6) reads as a ~3.69 GB uoffset_t and fails the size check, so the two formats are distinguishable without any explicit magic bytes. C — CRC decoupled from wire format: data_checksum_enabled now controls only the send side (whether to compute and fill in the checksum field). Receivers skip CRC verification whenever checksum == 0, regardless of their own config. Updated homestore_config.fbs comment to reflect this. CRC is not enabled by default: TCP already guards against partial-transfer corruption; opt in only for durability-sensitive environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… detection Per xiaoxichen's review feedback on PR eBay#887: B — FetchData wire format: drop the FETCH_DATA_RESPONSE_MAGIC header approach. Senders now always emit a size-prefixed FetchDataResponse FlatBuffer (rolling upgrade safe: old raw-format senders are detected by receivers automatically). Receivers use try-and-fallback: read the 4-byte uoffset_t size prefix; if fb_hdr_size <= total_size and flatbuffers::Verifier passes it is new format, otherwise fall back to old raw format. Old senders' DataHeader magic (0x21fdffdba8d68fc6) reads as a ~3.69 GB uoffset_t and fails the size check, so the two formats are distinguishable without any explicit magic bytes. C — CRC decoupled from wire format: data_checksum_enabled now controls only the send side (whether to compute and fill in the checksum field). Receivers skip CRC verification whenever checksum == 0, regardless of their own config. Updated homestore_config.fbs comment to reflect this. CRC is not enabled by default: TCP already guards against partial-transfer corruption; opt in only for durability-sensitive environments.
330c534 to
9a2ec70
Compare
…lback Replace FETCH_DATA_RESPONSE_MAGIC framing with a size-prefix try-and-fallback to detect new vs old FetchDataResponse format on the receiver side. Sender (on_fetch_data_received): emit a size-prefixed FetchDataResponse FlatBuffer only when data_checksum_enabled=true. When disabled the old raw wire format is preserved unchanged, so old receivers are not affected during rolling upgrades. Receiver (handle_fetch_data_response): attempt to parse a FlatBuffer by reading the 4-byte uoffset_t prefix, checking fb_hdr_size <= total_size, and running flatbuffers::Verifier. Falls back to raw format on any failure. Raw block data from HomeObject begins with DataHeader magic (0x21fdffdba8d68fc6) whose first 4 bytes as a uoffset_t read as ~2.8 GB, reliably failing the size check. Additional fixes: - Move GetSizePrefixedPushDataRequest to after the flatbuffers::Verifier guard in on_push_data_received so no field access precedes buffer verification. - data_checksum_enabled now gates send-side only; receivers skip CRC when checksum field is zero (0 = not computed sentinel). - Widen uoffset_t arithmetic to uint64_t in on_push_data_received and handle_fetch_data_response to prevent integer wrap when ReadScalar returns a value near 0xFFFFFFFF. - Update homestore_config.fbs comment to document send-side-only semantics.
|
Hey @xiaoxichen, the branch has been updated — |
xiaoxichen
left a comment
There was a problem hiding this comment.
lgtm.
cc @Besroy note this "might" be a breaking change if the detect-and-fallback on receiver side doesn't works as expected.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## stable/v7.x #887 +/- ##
==============================================
Coverage ? 48.07%
==============================================
Files ? 110
Lines ? 13029
Branches ? 6273
==============================================
Hits ? 6264
Misses ? 2604
Partials ? 4161 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Summary
Fixes #581 — silent data corruption caused by network-layer corruption between leader and followers going undetected until user read/write time.
Key implementation details
FETCH_DATA_RESPONSE_MAGIC) so receivers detect the header without consulting per-node config — safe under hotswap config asymmetry during rolling upgradesdata_checksum_enableddefaults tofalse; operators enable via hotswap after all nodes are running the new binary to avoid old nodes misinterpreting the framing header as block datadata_checksum_mismatch_cntmetric for observabilitylsn/dsn/raft_termattached toResponseEntryfor richer diagnosticsCorrectness fixes (commit 3)
Three issues identified during review and fixed before merge:
data_receive_timeout_ms(10s) before Raft retried. Now mismatched rreqs are collected during the response loop and passed tocheck_and_fetch_remote_data()immediately after, eliminating the stall.handle_fetch_data_response: A truncated or malformed response could causetotal_size -= data_sizeto wrap to ~4GB and advanceraw_datapast the buffer boundary in release builds. Added an explicitdata_size > total_sizeguard with early return before each subtraction.push_reqfields were accessed before the buffer was verified inon_push_data_received. A corrupted or crafted push RPC could cause out-of-bounds reads via the unverified FlatBuffer accessor. Addedflatbuffers::Verifiercall before any field access, matching the pattern already present on the FetchData response path.Test plan
Checksum_Enabled_PushData_Path— happy path with checksums on; writes commit correctly across all replicasChecksum_Enabled_FetchData_Path— drops all push-data on followers forcing fetch path; verifies framing header is parsed correctly and data arrives intactFollower_Fetch_OnActive_ReplicaGroupstill passes (no regression)All 3 tests passed across all 4 CI build variants (Debug/Sanitize, Debug/Coverage, Release/tcmalloc prerelease, Release/tcmalloc) — 100% tests passed, 0 failed out of 21 total.