IGNITE-28081 Added hashing functions for C++ client#7750
IGNITE-28081 Added hashing functions for C++ client#7750Erixonich wants to merge 7 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces hashing utilities and related tests for the C++ client, plus adds HALF_UP rounding support for big_decimal::set_scale.
Changes:
- Added Murmur3-based hashing utilities and a parameterized GTest suite to validate hash outputs for multiple primitive types.
- Introduced
hash_calculatorfor hashingprimitivevalues (dispatch byignite_type). - Extended
big_decimal::set_scaleto support rounding modes and added rounding tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/platforms/cpp/ignite/common/primitive_test.cpp | Adds a new test exercising hashing (currently looks like debug scaffolding). |
| modules/platforms/cpp/ignite/common/detail/string_extensions.h | Adds to_string helpers for use in tests/debug output. |
| modules/platforms/cpp/ignite/common/detail/string_extensions.cpp | Implements to_string helpers. |
| modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp | Adds extensive unit tests for hashing of many types. |
| modules/platforms/cpp/ignite/common/detail/hash_utils.h | Introduces hashing API and internal Murmur3-based helpers. |
| modules/platforms/cpp/ignite/common/detail/hash_utils.cpp | Implements hashing functions (currently includes placeholder implementations). |
| modules/platforms/cpp/ignite/common/detail/hash_calculator.h | Adds hash_calculator that hashes primitive via type dispatch. |
| modules/platforms/cpp/ignite/common/detail/hash_calculator.cpp | Adds translation unit for hash_calculator (currently empty). |
| modules/platforms/cpp/ignite/common/detail/Murmur3Hash.h | Adds Murmur3 function declaration. |
| modules/platforms/cpp/ignite/common/detail/Murmur3Hash.cpp | Adds Murmur3 implementation. |
| modules/platforms/cpp/ignite/common/bignum_test.cpp | Adds tests for HALF_UP rounding of big_decimal::set_scale. |
| modules/platforms/cpp/ignite/common/big_decimal.h | Adds rounding mode enum and updates set_scale signature. |
| modules/platforms/cpp/ignite/common/big_decimal.cpp | Implements HALF_UP rounding behavior in set_scale. |
| modules/platforms/cpp/ignite/common/CMakeLists.txt | Wires new sources and tests into the build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| ints.end(), | ||
| std::back_insert_iterator(res), | ||
| [](int src) { | ||
| return static_cast<std::byte>(src); | ||
| } | ||
| ); |
There was a problem hiding this comment.
from_ints converts int to std::byte with static_cast<std::byte>(src). For negative values (e.g., -1 in the test cases) the resulting value is implementation-defined and can vary between compilers/architectures. Convert via an unsigned 8-bit type first (or clamp) to ensure -1 reliably becomes 0xFF.
There was a problem hiding this comment.
our platforms always use 2s complement
| // body | ||
|
|
||
| const uint64_t * blocks = (const uint64_t *)(data); | ||
|
|
||
| for(int i = 0; i < nblocks; i++) | ||
| { | ||
| uint64_t k1 = getblock64(blocks,i*2+0); | ||
| uint64_t k2 = getblock64(blocks,i*2+1); |
There was a problem hiding this comment.
This MurmurHash3 implementation does unaligned 64-bit loads by casting data (a uint8_t*) to const uint64_t* and then indexing it. On architectures that fault on unaligned access (and per C++ aliasing/alignment rules), this is undefined behavior and can crash or produce incorrect hashes. Consider using the reference implementation’s getblock64 that reads via std::memcpy (or otherwise guarantees alignment) and handle endianness explicitly.
There was a problem hiding this comment.
I'm not sure, but x86 allows unaligned access, only issue would be is performance. @isapego WDYT?
| return murmur_original(&data, sizeof(data), seed); | ||
| } | ||
|
|
||
| inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) { | ||
| return murmur_original(&data, sizeof(data), seed); | ||
| } | ||
|
|
||
| inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) { | ||
| return murmur_original(&data, sizeof(data), seed); | ||
| } | ||
|
|
||
| inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) { | ||
| return murmur_original(&data, sizeof(data), seed); | ||
| } | ||
|
|
||
| inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) { | ||
| return murmur_original(&data, sizeof(data), seed); | ||
| } | ||
|
|
||
| inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) { | ||
| return murmur_original(&data, sizeof(data), seed); |
There was a problem hiding this comment.
hash64(...) hashes the in-memory representation of multi-byte integers (and floats/doubles) directly. That makes the resulting hashes endianness-dependent, so the same logical value will hash differently on big-endian vs little-endian platforms, which breaks cross-platform determinism. Consider serializing the value to a fixed byte order (e.g., little-endian using the existing bytes utilities) before passing bytes to Murmur.
| return murmur_original(&data, sizeof(data), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) { | |
| return murmur_original(&data, sizeof(data), seed); | |
| } | |
| inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) { | |
| return murmur_original(&data, sizeof(data), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) { | |
| return murmur_original(&data, sizeof(data), seed); | |
| } | |
| inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) { | |
| return murmur_original(&data, sizeof(data), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) { | |
| return murmur_original(&data, sizeof(data), seed); | |
| std::uint16_t u = static_cast<std::uint16_t>(data); | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) { | |
| std::uint16_t u = data; | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); | |
| } | |
| inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) { | |
| std::uint32_t u = static_cast<std::uint32_t>(data); | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) { | |
| std::uint32_t u = data; | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); | |
| } | |
| inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) { | |
| std::uint64_t u = static_cast<std::uint64_t>(data); | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); | |
| } | |
| inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) { | |
| std::uint64_t u = data; | |
| std::uint8_t buf[sizeof(u)]; | |
| for (std::size_t i = 0; i < sizeof(u); ++i) { | |
| buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF); | |
| } | |
| return murmur_original(buf, sizeof(buf), seed); |
There was a problem hiding this comment.
All our platforms are little-endian.
@isapego should we address it?
https://issues.apache.org/jira/browse/IGNITE-28081
Thank you for submitting the pull request.
To streamline the review process of the patch and ensure better code quality
we ask both an author and a reviewer to verify the following:
The Review Checklist
- There is a single JIRA ticket related to the pull request.
- The web-link to the pull request is attached to the JIRA ticket.
- The JIRA ticket has the Patch Available state.
- The description of the JIRA ticket explains WHAT was made, WHY and HOW.
- The pull request title is treated as the final commit message. The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
Notes