IGNITE-28081 Added hashing functions for C++ client by Erixonich · Pull Request #7750 · apache/ignite-3

Erixonich · 2026-03-10T20:09:29Z

https://issues.apache.org/jira/browse/IGNITE-28081

Thank you for submitting the pull request.

To streamline the review process of the patch and ensure better code quality
we ask both an author and a reviewer to verify the following:

The Review Checklist

Formal criteria: TC status, codestyle, mandatory documentation. Also make sure to complete the following:
- There is a single JIRA ticket related to the pull request.
- The web-link to the pull request is attached to the JIRA ticket.
- The JIRA ticket has the Patch Available state.
- The description of the JIRA ticket explains WHAT was made, WHY and HOW.
- The pull request title is treated as the final commit message. The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
Design: new code conforms with the design principles of the components it is added to.
Patch quality: patch cannot be split into smaller pieces, its size must be reasonable.
Code quality: code is clean and readable, necessary developer documentation is added if needed.
Tests code quality: test set covers positive/negative scenarios, happy/edge cases. Tests are effective in terms of execution time and resources.

Notes

Apache Ignite Coding Guidelines

Copilot

Pull request overview

This PR introduces hashing utilities and related tests for the C++ client, plus adds HALF_UP rounding support for big_decimal::set_scale.

Changes:

Added Murmur3-based hashing utilities and a parameterized GTest suite to validate hash outputs for multiple primitive types.
Introduced hash_calculator for hashing primitive values (dispatch by ignite_type).
Extended big_decimal::set_scale to support rounding modes and added rounding tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
modules/platforms/cpp/ignite/common/primitive_test.cpp	Adds a new test exercising hashing (currently looks like debug scaffolding).
modules/platforms/cpp/ignite/common/detail/string_extensions.h	Adds `to_string` helpers for use in tests/debug output.
modules/platforms/cpp/ignite/common/detail/string_extensions.cpp	Implements `to_string` helpers.
modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp	Adds extensive unit tests for hashing of many types.
modules/platforms/cpp/ignite/common/detail/hash_utils.h	Introduces hashing API and internal Murmur3-based helpers.
modules/platforms/cpp/ignite/common/detail/hash_utils.cpp	Implements hashing functions (currently includes placeholder implementations).
modules/platforms/cpp/ignite/common/detail/hash_calculator.h	Adds `hash_calculator` that hashes `primitive` via type dispatch.
modules/platforms/cpp/ignite/common/detail/hash_calculator.cpp	Adds translation unit for `hash_calculator` (currently empty).
modules/platforms/cpp/ignite/common/detail/Murmur3Hash.h	Adds Murmur3 function declaration.
modules/platforms/cpp/ignite/common/detail/Murmur3Hash.cpp	Adds Murmur3 implementation.
modules/platforms/cpp/ignite/common/bignum_test.cpp	Adds tests for HALF_UP rounding of `big_decimal::set_scale`.
modules/platforms/cpp/ignite/common/big_decimal.h	Adds rounding mode enum and updates `set_scale` signature.
modules/platforms/cpp/ignite/common/big_decimal.cpp	Implements HALF_UP rounding behavior in `set_scale`.
modules/platforms/cpp/ignite/common/CMakeLists.txt	Wires new sources and tests into the build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

modules/platforms/cpp/ignite/common/detail/hash_utils.cpp

modules/platforms/cpp/ignite/common/detail/hash_utils.h

modules/platforms/cpp/ignite/common/detail/hash_calculator.h

modules/platforms/cpp/ignite/common/primitive_test.cpp

modules/platforms/cpp/ignite/common/bignum_test.cpp

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

modules/platforms/cpp/ignite/common/detail/hash_utils.h

modules/platforms/cpp/ignite/common/ignite_time.h

modules/platforms/cpp/ignite/common/detail/hash_calculator.cpp

modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp

modules/platforms/cpp/ignite/common/primitive_test.cpp

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp

Copilot · 2026-03-11T13:06:54Z

modules/platforms/cpp/ignite/common/detail/hash_utils_test.cpp

+        ints.end(),
+        std::back_insert_iterator(res),
+        [](int src) {
+            return static_cast<std::byte>(src);
+        }
+        );


from_ints converts int to std::byte with static_cast<std::byte>(src). For negative values (e.g., -1 in the test cases) the resulting value is implementation-defined and can vary between compilers/architectures. Convert via an unsigned 8-bit type first (or clamp) to ensure -1 reliably becomes 0xFF.

our platforms always use 2s complement

modules/platforms/cpp/ignite/common/detail/Murmur3Hash.h

Copilot · 2026-03-11T13:06:54Z

modules/platforms/cpp/ignite/common/detail/Murmur3Hash.cpp

+  // body
+
+  const uint64_t * blocks = (const uint64_t *)(data);
+
+  for(int i = 0; i < nblocks; i++)
+  {
+    uint64_t k1 = getblock64(blocks,i*2+0);
+    uint64_t k2 = getblock64(blocks,i*2+1);


This MurmurHash3 implementation does unaligned 64-bit loads by casting data (a uint8_t*) to const uint64_t* and then indexing it. On architectures that fault on unaligned access (and per C++ aliasing/alignment rules), this is undefined behavior and can crash or produce incorrect hashes. Consider using the reference implementation’s getblock64 that reads via std::memcpy (or otherwise guarantees alignment) and handle endianness explicitly.

I'm not sure, but x86 allows unaligned access, only issue would be is performance. @isapego WDYT?

Copilot · 2026-03-11T13:06:54Z

modules/platforms/cpp/ignite/common/detail/hash_utils.h

+    return murmur_original(&data, sizeof(data), seed);
+}
+
+inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
+    return murmur_original(&data, sizeof(data), seed);
+}
+
+inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
+    return murmur_original(&data, sizeof(data), seed);
+}
+
+inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
+    return murmur_original(&data, sizeof(data), seed);
+}
+
+inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
+    return murmur_original(&data, sizeof(data), seed);
+}
+
+inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
+    return murmur_original(&data, sizeof(data), seed);


hash64(...) hashes the in-memory representation of multi-byte integers (and floats/doubles) directly. That makes the resulting hashes endianness-dependent, so the same logical value will hash differently on big-endian vs little-endian platforms, which breaks cross-platform determinism. Consider serializing the value to a fixed byte order (e.g., little-endian using the existing bytes utilities) before passing bytes to Murmur.

Suggested change

return murmur_original(&data, sizeof(data), seed);

}

inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {

return murmur_original(&data, sizeof(data), seed);

}

inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {

return murmur_original(&data, sizeof(data), seed);

}

inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {

return murmur_original(&data, sizeof(data), seed);

}

inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {

return murmur_original(&data, sizeof(data), seed);

}

inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {

return murmur_original(&data, sizeof(data), seed);

std::uint16_t u = static_cast<std::uint16_t>(data);

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

}

inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {

std::uint16_t u = data;

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

}

inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {

std::uint32_t u = static_cast<std::uint32_t>(data);

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

}

inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {

std::uint32_t u = data;

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

}

inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {

std::uint64_t u = static_cast<std::uint64_t>(data);

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

}

inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {

std::uint64_t u = data;

std::uint8_t buf[sizeof(u)];

for (std::size_t i = 0; i < sizeof(u); ++i) {

buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);

}

return murmur_original(buf, sizeof(buf), seed);

All our platforms are little-endian.
@isapego should we address it?

Erixonich requested a review from isapego as a code owner March 10, 2026 20:09

isapego requested a review from Copilot March 10, 2026 20:10

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Copilot started reviewing on behalf of isapego March 10, 2026 20:26 View session

IGNITE-28081 added hashing functions for C++ client

f17a2a1

Erixonich force-pushed the ignite-28081 branch from 44d1f6d to f17a2a1 Compare March 11, 2026 12:22

Erixonich changed the title ~~Added hashing to C++ client~~ Added hashing functions for C++ client Mar 11, 2026

Erixonich added 2 commits March 11, 2026 15:30

IGNITE-28081 robot review fixes

6340d58

IGNITE-28081 robot review fixes

de79b55

Erixonich requested a review from Copilot March 11, 2026 12:33

Copilot started reviewing on behalf of Erixonich March 11, 2026 12:34 View session

IGNITE-28081 added error desc

41489a2

Copilot AI reviewed Mar 11, 2026

View reviewed changes

Erixonich added 2 commits March 11, 2026 15:49

IGNITE-28081 review fixes

9c3cd99

IGNITE-28081 review fixes

2b7b90f

Erixonich requested a review from Copilot March 11, 2026 12:56

Copilot started reviewing on behalf of Erixonich March 11, 2026 12:57 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

IGNITE-28081 review fixes round 4

94dfaf4

Erixonich changed the title ~~Added hashing functions for C++ client~~ IGNITE-28081 Added hashing functions for C++ client Mar 13, 2026

-    return murmur_original(&data, sizeof(data), seed);
-}
-inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
-    return murmur_original(&data, sizeof(data), seed);
-}
-inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
-    return murmur_original(&data, sizeof(data), seed);
-}
-inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
-    return murmur_original(&data, sizeof(data), seed);
-}
-inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
-    return murmur_original(&data, sizeof(data), seed);
-}
-inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
-    return murmur_original(&data, sizeof(data), seed);
+    std::uint16_t u = static_cast<std::uint16_t>(data);
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);
+}
+inline std::uint64_t hash64(std::uint16_t data, std::uint64_t seed) {
+    std::uint16_t u = data;
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);
+}
+inline std::uint64_t hash64(std::int32_t data, std::uint64_t seed) {
+    std::uint32_t u = static_cast<std::uint32_t>(data);
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);
+}
+inline std::uint64_t hash64(std::uint32_t data, std::uint64_t seed) {
+    std::uint32_t u = data;
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);
+}
+inline std::uint64_t hash64(std::int64_t data, std::uint64_t seed) {
+    std::uint64_t u = static_cast<std::uint64_t>(data);
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);
+}
+inline std::uint64_t hash64(std::uint64_t data, std::uint64_t seed) {
+    std::uint64_t u = data;
+    std::uint8_t buf[sizeof(u)];
+    for (std::size_t i = 0; i < sizeof(u); ++i) {
+        buf[i] = static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF);
+    }
+    return murmur_original(buf, sizeof(buf), seed);

Conversation

Erixonich commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The Review Checklist

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Erixonich Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Erixonich Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Erixonich Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Erixonich commented Mar 10, 2026 •

edited

Loading