Skip to content
/ server Public

MDEV-38180: Add xxh32() and xxh3() SQL functions#4741

Open
monthdev wants to merge 6 commits intoMariaDB:mainfrom
monthdev:MDEV-38180
Open

MDEV-38180: Add xxh32() and xxh3() SQL functions#4741
monthdev wants to merge 6 commits intoMariaDB:mainfrom
monthdev:MDEV-38180

Conversation

@monthdev
Copy link

@monthdev monthdev commented Mar 6, 2026

Address MDEV-38180.

There is already an open PR for this here #4500. I am just assuming it was dropped and reiterating with a narrower scope.

@CLAassistant
Copy link

CLAassistant commented Mar 6, 2026

CLA assistant check
All committers have signed the CLA.

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 6, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preliminary review. Thank you for your contribution.

I think you need to figure out what is the fate of the other PR. As yours seems to mostly be a fork of it (I didn't compare, but it looks very similar). I will add Daniel as a reviewer to this one so he can work on that.

Should we decide to go ahead, please make sure your commit message must be present and must follow CODING_STANDARDS.md

@gkodinov gkodinov requested a review from grooverdan March 6, 2026 10:25
@monthdev monthdev changed the title MDEV-38180: Add XXH32() and XXH3() SQL functions MDEV-38180: Add xxh32(), xxh3(), and xxh3_128() SQL functions Mar 7, 2026
@monthdev
Copy link
Author

monthdev commented Mar 7, 2026

@gkodinov I appreciate the review! In my latest commit, I

  • (re)implemented xxh32(), xxh3(), and xxh3_128()
  • changed SQL API to return hex string
  • normalized input to stable charset before hashing
  • added MTR coverage for xxh3_128() and charset consistency
  • cleaned up coding style.

Please let me know of any other problems or nits!

@monthdev monthdev requested a review from gkodinov March 7, 2026 05:46
@monthdev
Copy link
Author

monthdev commented Mar 7, 2026

Hi, again! I looked at the failing checks, and they look unrelated to my PR to the extent that the tests do not fail on my new test file func_xxh.test/result.

Copy link
Contributor

@mariadb-YuchenPei mariadb-YuchenPei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the my_hasher_st interface which is used by partitioning. It is collation-aware and the sql functions should be consistent with which partitions the string goes to.

Please add tests regarding collation equality too. A simple one could be something like checking that xxh3(' ') = xxh3(' '), given that the two strings are equal (and go to the same partition with PARTITION BY [LINEAR] KEY):

MariaDB [test]> select '' = '   ';
+------------+
| '' = '   ' |
+------------+
|          1 |
+------------+
1 row in set (0.001 sec)

P.S. It's not clear to me the context of choosing between proceeding with this PR vs #4500 given the latter is still open though appearing dormant. The review comment is independent of that.

@monthdev
Copy link
Author

monthdev commented Mar 10, 2026

@mariadb-YuchenPei Thanks for the input! It was my mistake opening this PR. I explained that on Zulip #general > MDEV-38180 GSoC 2026 @ 💬#general > MDEV-38180 GSoC 2026 @ 💬. But then I got a code review from another maintainer, so I got confused. Please disregard my PR in favor of #4500 which was opened first! I am not trying to take someone else's PR.

@mariadb-YuchenPei
Copy link
Contributor

@monthdev, I spoke to @grooverdan. We were going to wait for the #4500 contributor to update their work based on some current input. They haven't done so and the patch in its current form cannot be merged. We appreciate your courtesy in giving the opportunity for the #4500 author to complete their work. I think a reasonable amount of time has passed, and if you are willing to address and complete this PR, then ultimately the users of MariaDB will benefit from your labour.

@monthdev
Copy link
Author

@mariadb-YuchenPei Will do! I opened another PR in the meantime #4767 (again assuming #4500), so I'll complete both as soon as I can.

@monthdev
Copy link
Author

@mariadb-YuchenPei Made changes addressing #4741 (review)

@monthdev

This comment was marked as resolved.

@monthdev
Copy link
Author

Or maybe I could open a separate reformat PR for this registry array?

@monthdev monthdev changed the title MDEV-38180: Add xxh32(), xxh3(), and xxh3_128() SQL functions MDEV-38180: Add xxh32() and xxh3() SQL functions Mar 12, 2026
@monthdev
Copy link
Author

@mariadb-YuchenPei I scrapped xxh3_128() because it wasn't a part of the hasher interface. Maybe that could be a later feature?

@grooverdan
Copy link
Member

Or maybe I could open a separate reformat PR for this registry array?

reformatting makes merging hard, manual and error prone. While format's have diverged, its best to constrain any reformatting to the very narrow scope of what is actually being changed.

Copy link
Contributor

@mariadb-YuchenPei mariadb-YuchenPei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick comment. Still looking.

uint errors= 0;
converted_value->length(0);
if (converted_value->copy(input->ptr(), input->length(), input->charset(),
xxh_charset, &errors))
Copy link
Contributor

@mariadb-YuchenPei mariadb-YuchenPei Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are strings all converted to my_charset_utf8mb4_general_ci before being hashed? They shouldn't be.

If one applies this diff

modified   mysql-test/main/partition_key_algorithm.test
@@ -136,6 +136,7 @@ DROP TABLE t1;
 CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET koi8u COLLATE koi8u_general_ci);
 INSERT INTO t1 VALUES (0x20),(0x60),(0x6060),(0x606060);
 SELECT HEX(a) FROM t1 WHERE a=0x60;
+select xxh3(a) from t1;
 eval
 ALTER TABLE t1  PARTITION BY KEY $algo (a) PARTITIONS 3;
 SELECT HEX(a) FROM t1 WHERE a=0x60;

and run the test, we have

--- /home/ycp/source/mariadb-server/main/src/mysql-test/main/partition_key_algorithm.result	2026-03-13 16:20:20.733682895 +1100
+++ /home/ycp/source/mariadb-server/main/src/mysql-test/main/partition_key_algorithm.reject	2026-03-13 16:45:41.878718485 +1100
@@ -193,6 +193,12 @@
 60
 6060
 606060
+select xxh3(a) from t1;
+xxh3(a)
+0000000000000000
+dfce7d3ab64b81ae
+684d238e39c1c15c
+28e8eabcf16c906a
 ALTER TABLE t1  PARTITION BY KEY  (a) PARTITIONS 3;
 SELECT HEX(a) FROM t1 WHERE a=0x60;
 HEX(a)

But all four values should be the same because they are equal under "CHARACTER SET koi8u COLLATE koi8u_general_ci" (see the previous statement in the test).

@monthdev
Copy link
Author

@mariadb-YuchenPei Let me know if my change is how you wanted it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants