Skip to content

IGNITE-28081 Added hashing functions for C++ client#7750

Open
Erixonich wants to merge 7 commits intoapache:mainfrom
Erixonich:ignite-28081
Open

IGNITE-28081 Added hashing functions for C++ client#7750
Erixonich wants to merge 7 commits intoapache:mainfrom
Erixonich:ignite-28081

Conversation

@Erixonich
Copy link
Contributor

@Erixonich Erixonich commented Mar 10, 2026

https://issues.apache.org/jira/browse/IGNITE-28081

Thank you for submitting the pull request.

To streamline the review process of the patch and ensure better code quality
we ask both an author and a reviewer to verify the following:

The Review Checklist

  • Formal criteria: TC status, codestyle, mandatory documentation. Also make sure to complete the following:
    - There is a single JIRA ticket related to the pull request.
    - The web-link to the pull request is attached to the JIRA ticket.
    - The JIRA ticket has the Patch Available state.
    - The description of the JIRA ticket explains WHAT was made, WHY and HOW.
    - The pull request title is treated as the final commit message. The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • Design: new code conforms with the design principles of the components it is added to.
  • Patch quality: patch cannot be split into smaller pieces, its size must be reasonable.
  • Code quality: code is clean and readable, necessary developer documentation is added if needed.
  • Tests code quality: test set covers positive/negative scenarios, happy/edge cases. Tests are effective in terms of execution time and resources.

Notes

@Erixonich Erixonich requested a review from isapego as a code owner March 10, 2026 20:09
@isapego isapego requested a review from Copilot March 10, 2026 20:10
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces hashing utilities and related tests for the C++ client, plus adds HALF_UP rounding support for big_decimal::set_scale.

Changes:

  • Added Murmur3-based hashing utilities and a parameterized GTest suite to validate hash outputs for multiple primitive types.
  • Introduced hash_calculator for hashing primitive values (dispatch by ignite_type).
  • Extended big_decimal::set_scale to support rounding modes and added rounding tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
modules/platforms/cpp/ignite/common/primitive_test.cpp Adds a new test exercising hashing (currently looks like debug scaffolding).
modules/platforms/cpp/ignite/common/detail/string_extensions.h Adds to_string helpers for use in tests/debug output.
modules/platforms/cpp/ignite/common/detail/string_extensions.cpp Implements to_string helpers.
modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp Adds extensive unit tests for hashing of many types.
modules/platforms/cpp/ignite/common/detail/hash_utils.h Introduces hashing API and internal Murmur3-based helpers.
modules/platforms/cpp/ignite/common/detail/hash_utils.cpp Implements hashing functions (currently includes placeholder implementations).
modules/platforms/cpp/ignite/common/detail/hash_calculator.h Adds hash_calculator that hashes primitive via type dispatch.
modules/platforms/cpp/ignite/common/detail/hash_calculator.cpp Adds translation unit for hash_calculator (currently empty).
modules/platforms/cpp/ignite/common/detail/Murmur3Hash.h Adds Murmur3 function declaration.
modules/platforms/cpp/ignite/common/detail/Murmur3Hash.cpp Adds Murmur3 implementation.
modules/platforms/cpp/ignite/common/bignum_test.cpp Adds tests for HALF_UP rounding of big_decimal::set_scale.
modules/platforms/cpp/ignite/common/big_decimal.h Adds rounding mode enum and updates set_scale signature.
modules/platforms/cpp/ignite/common/big_decimal.cpp Implements HALF_UP rounding behavior in set_scale.
modules/platforms/cpp/ignite/common/CMakeLists.txt Wires new sources and tests into the build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Erixonich Erixonich changed the title Added hashing to C++ client Added hashing functions for C++ client Mar 11, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +427 to +432
ints.end(),
std::back_insert_iterator(res),
[](int src) {
return static_cast<std::byte>(src);
}
);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_ints converts int to std::byte with static_cast<std::byte>(src). For negative values (e.g., -1 in the test cases) the resulting value is implementation-defined and can vary between compilers/architectures. Convert via an unsigned 8-bit type first (or clamp) to ensure -1 reliably becomes 0xFF.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our platforms always use 2s complement

Comment on lines +126 to +133
// body

const uint64_t * blocks = (const uint64_t *)(data);

for(int i = 0; i < nblocks; i++)
{
uint64_t k1 = getblock64(blocks,i*2+0);
uint64_t k2 = getblock64(blocks,i*2+1);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MurmurHash3 implementation does unaligned 64-bit loads by casting data (a uint8_t*) to const uint64_t* and then indexing it. On architectures that fault on unaligned access (and per C++ aliasing/alignment rules), this is undefined behavior and can crash or produce incorrect hashes. Consider using the reference implementation’s getblock64 that reads via std::memcpy (or otherwise guarantees alignment) and handle endianness explicitly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but x86 allows unaligned access, only issue would be is performance. @isapego WDYT?

Comment on lines +52 to +72
return murmur_original(&data, sizeof(data), seed);
}

inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}

inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}

inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}

inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}

inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash64(...) hashes the in-memory representation of multi-byte integers (and floats/doubles) directly. That makes the resulting hashes endianness-dependent, so the same logical value will hash differently on big-endian vs little-endian platforms, which breaks cross-platform determinism. Consider serializing the value to a fixed byte order (e.g., little-endian using the existing bytes utilities) before passing bytes to Murmur.

Suggested change
return murmur_original(&data, sizeof(data), seed);
}
inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}
inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}
inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}
inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
}
inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
return murmur_original(&data, sizeof(data), seed);
std::uint16_t u = static_cast<std::uint16_t>(data);
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);
}
inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
std::uint16_t u = data;
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);
}
inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
std::uint32_t u = static_cast<std::uint32_t>(data);
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);
}
inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
std::uint32_t u = data;
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);
}
inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
std::uint64_t u = static_cast<std::uint64_t>(data);
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);
}
inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
std::uint64_t u = data;
std::uint8_t buf[sizeof(u)];
for (std::size_t i = 0; i < sizeof(u); ++i) {
buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
}
return murmur_original(buf, sizeof(buf), seed);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All our platforms are little-endian.
@isapego should we address it?

@Erixonich Erixonich changed the title Added hashing functions for C++ client IGNITE-28081 Added hashing functions for C++ client Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants