
[opt](function) Eliminate redundant hash computation in AggregateFunctionUniq #61730

Open
Mryange wants to merge 1 commit into apache:master from Mryange:hash-double-computation

Mryange (Contributor) commented Mar 25, 2026

What problem does this PR solve?

Problem Summary:

In AggregateFunctionUniq::add_batch_single_place and add_batch (the hot path
for SELECT count(distinct col)), every key's hash is computed twice:

  1. set.prefetch(keys[i + HASH_MAP_PREFETCH_DIST]) internally calls
    this->hash(key) to locate the slot for CPU prefetch.
  2. set.insert(keys[i]) goes through emplace → EmplaceDecomposable →
    s.hash(key), computing the hash again to find or prepare the insert position.
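The following is a minimal sketch of the pre-PR loop shape. It uses a toy set rather than phmap. `CountingHashSet` is hypothetical and is not the phmap API: its `prefetch`, `prefetch_hash`, `insert` and `emplace_with_hash` methods only imitate the calls named above, and every call to `hash()` is counted. The value 16 for `HASH_MAP_PREFETCH_DIST` is also an assumption for illustration. The real Doris code is templated over key types and may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed prefetch distance, for illustration only.
constexpr size_t HASH_MAP_PREFETCH_DIST = 16;

// Hypothetical stand-in for a phmap flat hash set (NOT the phmap API):
// linear probing, fixed power-of-two capacity, no resizing.
// Every call to hash() is counted so the hashing cost is observable.
class CountingHashSet {
public:
    explicit CountingHashSet(size_t capacity_pow2)
        : keys_(capacity_pow2), used_(capacity_pow2, false), mask_(capacity_pow2 - 1) {
        if (capacity_pow2 == 0 || (capacity_pow2 & mask_) != 0) {
            throw std::invalid_argument("capacity must be a power of two");
        }
    }

    size_t hash(int64_t key) const {
        ++hash_calls_;
        uint64_t x = static_cast<uint64_t>(key);  // MurmurHash3 fmix64 finalizer
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<size_t>(x);
    }

    // Hint the CPU to load the target slot's cache line ahead of use.
    void prefetch_hash(size_t h) const {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(&keys_[h & mask_]);
#else
        (void)h;
#endif
    }

    // Pre-PR style: prefetch by key, which hashes the key internally.
    void prefetch(int64_t key) const { prefetch_hash(hash(key)); }

    // Insert with a caller-supplied hash; returns true if the key was new.
    bool emplace_with_hash(size_t h, int64_t key) {
        for (size_t probe = 0; probe <= mask_; ++probe) {
            const size_t slot = (h + probe) & mask_;
            if (!used_[slot]) {
                used_[slot] = true;
                keys_[slot] = key;
                ++size_;
                return true;
            }
            if (keys_[slot] == key) return false;
        }
        throw std::length_error("toy set is full");
    }

    // Pre-PR style: insert by key, which hashes the key a second time.
    bool insert(int64_t key) { return emplace_with_hash(hash(key), key); }

    size_t size() const { return size_; }
    size_t hash_calls() const { return hash_calls_; }

private:
    std::vector<int64_t> keys_;
    std::vector<bool> used_;
    size_t mask_;
    size_t size_ = 0;
    mutable size_t hash_calls_ = 0;
};

// Pre-PR loop shape: each key is hashed once by prefetch (DIST iterations
// earlier) and once more by insert.
void add_batch_before(CountingHashSet& set, const std::vector<int64_t>& keys) {
    const size_t n = keys.size();
    for (size_t i = 0; i < n; ++i) {
        if (i + HASH_MAP_PREFETCH_DIST < n) {
            set.prefetch(keys[i + HASH_MAP_PREFETCH_DIST]);  // hash #1
        }
        set.insert(keys[i]);  // hash #2
    }
}
```

Take 100 input keys and a prefetch distance of 16. The toy set records 184 hash calls: 100 from `insert` plus 84 from `prefetch`. That is close to two per key.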

The same codebase already has a correct "precompute hash + reuse" pattern in the
DistinctStreamingAgg operator path (hash_map_context.h). There, init_hash_values()
computes each key's hash once, and the subsequent prefetch / lazy_emplace_key
calls reuse the precomputed value. This PR applies the same optimization to
AggregateFunctionUniq.

How does this PR solve it?

For both add_batch and add_batch_single_place in aggregate_function_uniq.h
and aggregate_function_uniq_distribute_key.h:

  1. Precompute hash values into a std::vector<size_t> before the main loop
    using set.hash(keys[i]). This is the only hash computation per key.
  2. Replace set.prefetch(key) with set.prefetch_hash(hash_values[...]), which
    reuses the precomputed hash. This avoids both the recalculation and the
    unnecessary memory access to keys[i + HASH_MAP_PREFETCH_DIST] at prefetch time.
  3. Replace set.insert(key) with set.emplace_with_hash(hash_values[i], key),
    which passes the precomputed hash directly and skips the internal hash call.
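The three steps above can be sketched with a toy set. `CountingHashSet` is hypothetical, and the value of `HASH_MAP_PREFETCH_DIST` is an assumed value. The `prefetch_hash` and `emplace_with_hash` methods here only imitate the phmap methods of the same name. This is not the actual Doris code, which is templated over key types and works on column data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed prefetch distance, for illustration only.
constexpr size_t HASH_MAP_PREFETCH_DIST = 16;

// Hypothetical stand-in for a phmap flat hash set (NOT the phmap API):
// linear probing, fixed power-of-two capacity, no resizing; hash() calls are counted.
class CountingHashSet {
public:
    explicit CountingHashSet(size_t capacity_pow2)
        : keys_(capacity_pow2), used_(capacity_pow2, false), mask_(capacity_pow2 - 1) {
        if (capacity_pow2 == 0 || (capacity_pow2 & mask_) != 0) {
            throw std::invalid_argument("capacity must be a power of two");
        }
    }

    size_t hash(int64_t key) const {
        ++hash_calls_;
        uint64_t x = static_cast<uint64_t>(key);  // MurmurHash3 fmix64 finalizer
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<size_t>(x);
    }

    // Prefetch using an already computed hash: no hashing here.
    void prefetch_hash(size_t h) const {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(&keys_[h & mask_]);
#else
        (void)h;
#endif
    }

    // Insert with a caller-supplied hash; returns true if the key was new.
    bool emplace_with_hash(size_t h, int64_t key) {
        for (size_t probe = 0; probe <= mask_; ++probe) {
            const size_t slot = (h + probe) & mask_;
            if (!used_[slot]) {
                used_[slot] = true;
                keys_[slot] = key;
                ++size_;
                return true;
            }
            if (keys_[slot] == key) return false;
        }
        throw std::length_error("toy set is full");
    }

    // Lookup for checking results (hashes the probe key, so it is also counted).
    bool contains(int64_t key) const {
        const size_t h = hash(key);
        for (size_t probe = 0; probe <= mask_; ++probe) {
            const size_t slot = (h + probe) & mask_;
            if (!used_[slot]) return false;
            if (keys_[slot] == key) return true;
        }
        return false;
    }

    size_t size() const { return size_; }
    size_t hash_calls() const { return hash_calls_; }

private:
    std::vector<int64_t> keys_;
    std::vector<bool> used_;
    size_t mask_;
    size_t size_ = 0;
    mutable size_t hash_calls_ = 0;
};

void add_batch_after(CountingHashSet& set, const std::vector<int64_t>& keys) {
    const size_t n = keys.size();
    // Step 1: one sequential pass over keys, exactly one hash per key.
    std::vector<size_t> hash_values(n);
    for (size_t i = 0; i < n; ++i) hash_values[i] = set.hash(keys[i]);
    for (size_t i = 0; i < n; ++i) {
        // Step 2: prefetch from the stored hash; keys[i + DIST] is not touched.
        if (i + HASH_MAP_PREFETCH_DIST < n) {
            set.prefetch_hash(hash_values[i + HASH_MAP_PREFETCH_DIST]);
        }
        // Step 3: insert with the stored hash; no second hash computation.
        set.emplace_with_hash(hash_values[i], keys[i]);
    }
}
```

Use the same 100 input keys. The toy set now records exactly 100 hash calls, one per key, and still ends up with the same 40 distinct keys. The trade-off is one extra `size_t` per key in `hash_values` for the duration of the batch.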

Both prefetch_hash and emplace_with_hash are existing APIs in phmap
(parallel_hashmap/phmap.h), so no third-party changes are needed.

Expected improvements:

  • Hash computation reduced from 2× to 1× per key
  • The precompute loop is a pure sequential scan over keys[], which is more
    cache-friendly than interleaving hash computation with hash-table probing
  • Better prefetch effectiveness: prefetch no longer needs to access
    keys[i + HASH_MAP_PREFETCH_DIST] memory just to compute its hash

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information), and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Mryange (Contributor, Author) commented Mar 25, 2026

run buildall

doris-robot

TPC-H: Total hot run time: 26866 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f891205105b1295f88192127128fb98402a8ac5b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17597	4557	4295	4295
q2
q3	10641	769	538	538
q4	4683	352	249	249
q5	7580	1237	1040	1040
q6	174	181	152	152
q7	773	860	683	683
q8	9556	1501	1317	1317
q9	5252	4767	4724	4724
q10	6319	1933	1688	1688
q11	480	269	244	244
q12	740	595	471	471
q13	18045	2661	1925	1925
q14	231	239	218	218
q15
q16	750	733	671	671
q17	756	870	435	435
q18	6330	5492	5312	5312
q19	1159	993	617	617
q20	532	484	377	377
q21	4572	1871	1593	1593
q22	469	363	317	317
Total cold run time: 96639 ms
Total hot run time: 26866 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4761	4561	4670	4561
q2
q3	3921	4343	3842	3842
q4	913	1211	783	783
q5	4103	4483	4352	4352
q6	182	185	145	145
q7	1754	1658	1545	1545
q8	2452	2714	2589	2589
q9	7717	7538	7442	7442
q10	3754	4036	3587	3587
q11	499	422	432	422
q12	471	582	454	454
q13	2513	3032	2107	2107
q14	293	330	296	296
q15
q16	738	763	711	711
q17	1371	1500	1436	1436
q18	7331	6876	6693	6693
q19	958	911	885	885
q20	2045	2176	1995	1995
q21	3955	3510	3377	3377
q22	443	427	375	375
Total cold run time: 50174 ms
Total hot run time: 47597 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 169147 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f891205105b1295f88192127128fb98402a8ac5b, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4334	649	497	497
query6	357	233	205	205
query7	4217	471	269	269
query8	348	236	236	236
query9	8755	2727	2732	2727
query10	530	390	382	382
query11	6969	5102	4880	4880
query12	186	137	129	129
query13	1299	472	358	358
query14	5799	3729	3479	3479
query14_1	2847	2835	2850	2835
query15	211	197	175	175
query16	1014	462	474	462
query17	1119	760	644	644
query18	2455	461	362	362
query19	223	215	185	185
query20	141	131	127	127
query21	215	134	109	109
query22	13307	14194	14549	14194
query23	16730	16494	16037	16037
query23_1	16056	15663	15694	15663
query24	7181	1625	1217	1217
query24_1	1241	1227	1248	1227
query25	592	522	415	415
query26	1252	264	155	155
query27	2773	476	294	294
query28	4489	1845	1852	1845
query29	834	561	483	483
query30	293	225	191	191
query31	1004	946	878	878
query32	88	73	70	70
query33	506	338	289	289
query34	889	895	511	511
query35	638	685	580	580
query36	1080	1167	991	991
query37	137	96	81	81
query38	2950	2923	2828	2828
query39	875	841	806	806
query39_1	792	802	794	794
query40	236	156	137	137
query41	64	60	62	60
query42	266	254	255	254
query43	244	247	228	228
query44	
query45	203	190	186	186
query46	875	978	608	608
query47	2750	2119	2076	2076
query48	330	314	234	234
query49	628	464	389	389
query50	688	283	215	215
query51	4113	4078	3977	3977
query52	264	270	257	257
query53	296	334	290	290
query54	314	280	278	278
query55	94	86	87	86
query56	331	332	317	317
query57	1921	1886	1835	1835
query58	288	282	272	272
query59	2815	2960	2729	2729
query60	349	338	321	321
query61	156	146	151	146
query62	634	602	544	544
query63	311	280	273	273
query64	5087	1310	999	999
query65	
query66	1468	472	353	353
query67	24314	24219	24080	24080
query68	
query69	413	320	298	298
query70	955	974	944	944
query71	346	309	307	307
query72	2826	2720	2415	2415
query73	553	568	318	318
query74	9661	9571	9414	9414
query75	2840	2795	2453	2453
query76	2291	1038	705	705
query77	360	386	303	303
query78	11077	11213	10484	10484
query79	1110	763	570	570
query80	1339	650	534	534
query81	553	258	224	224
query82	1106	153	117	117
query83	332	259	244	244
query84	250	121	106	106
query85	903	506	457	457
query86	412	306	293	293
query87	3165	3143	3002	3002
query88	3566	2698	2682	2682
query89	434	365	341	341
query90	2009	173	170	170
query91	171	166	134	134
query92	80	74	71	71
query93	926	835	498	498
query94	631	324	308	308
query95	611	342	315	315
query96	653	530	223	223
query97	2481	2485	2404	2404
query98	244	229	224	224
query99	1026	997	930	930
Total cold run time: 251230 ms
Total hot run time: 169147 ms

hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.82% (19872/37621)
Line Coverage 36.32% (185643/511158)
Region Coverage 32.57% (143797/441560)
Branch Coverage 33.77% (62943/186403)

hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.73% (26425/36838)
Line Coverage 54.56% (278054/509595)
Region Coverage 51.70% (230419/445686)
Branch Coverage 53.19% (99455/186967)
