Skip to content

brpc dns cache#63546

Open
zhangstar333 wants to merge 1 commit into
apache:masterfrom
zhangstar333:brpc_dns_cache
Open

brpc dns cache#63546
zhangstar333 wants to merge 1 commit into
apache:masterfrom
zhangstar333:brpc_dns_cache

Conversation

@zhangstar333
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhangstar333
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking issue: the BE test mock override was not updated after changing ExchangeSinkBuffer::_failed to take Status, so BE test builds that include exchange_sink_test.h will fail to compile.

Critical checkpoint conclusions:

  • Goal and correctness: The PR aims to refresh DNS/cache state on BRPC network faults and validate rebuilt cached clients with handshake. The core runtime path is directionally consistent, but the change is not build-correct for BE tests.
  • Scope: The runtime changes are mostly focused on BRPC client/DNS cache behavior, with small call-site updates for original hostnames.
  • Concurrency: DNS dirty/cache access is protected by the existing shared_mutex in the changed paths; I did not find a new lock-order issue in the reviewed diff.
  • Lifecycle/static initialization: No new cross-TU static initializer dependency was found.
  • Configuration: Two mutable configs are added; dynamic reads are used at client lookup time. The config default enables the new handshake path, so this needs successful build/test coverage before merge.
  • Compatibility: No persisted format or FE/BE thrift schema change was introduced. The handshake RPC already exists on the affected stubs.
  • Parallel paths: Direct no-cache BRPC call sites touched by the PR pass the original hostname where available; cached get_client paths are centralized.
  • Tests: No test evidence is provided in the PR description, and the existing BE exchange test mock is currently broken by the signature change.
  • Observability: The changed paths log DNS invalidation, network faults, and handshake failures; no additional metrics appear strictly required for this patch.
  • Transactions/data writes: No direct transaction or storage visibility changes were introduced.
  • Performance: The new default synchronous handshake on cache misses can add latency, but I did not identify a separate blocking correctness issue beyond the compile break.

User focus points: no additional user-provided review focus was present.

#else
virtual void _ended(RpcInstance& ins);
virtual void _failed(InstanceLoId id, const std::string& err);
virtual void _failed(InstanceLoId id, Status err);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature change breaks the BE test mock override. be/test/exec/exchange/exchange_sink_test.h still declares void _failed(InstanceLoId id, const std::string& err) override, so when tests are built with BE_TEST the override no longer matches ExchangeSinkBuffer::_failed(InstanceLoId, Status) and compilation will fail. Please update the mock (and any other BE_TEST overrides) to take Status, or avoid changing the virtual test-only signature.

@zhangstar333
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 51caf3b5e10fd46a5721d20b6d605de4e563367a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17695	4202	4180	4180
q2	q3	10779	1434	825	825
q4	4692	482	349	349
q5	7755	2289	2139	2139
q6	390	183	140	140
q7	956	807	635	635
q8	9514	1766	1605	1605
q9	7112	5069	4980	4980
q10	6439	2264	1902	1902
q11	437	273	247	247
q12	645	432	303	303
q13	18150	3464	2741	2741
q14	270	259	243	243
q15	q16	824	780	714	714
q17	911	839	915	839
q18	6878	5807	5574	5574
q19	1177	1284	1103	1103
q20	511	407	279	279
q21	5512	2499	2503	2499
q22	441	362	311	311
Total cold run time: 101088 ms
Total hot run time: 31608 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4525	4441	4411	4411
q2	q3	4574	4991	4332	4332
q4	2177	2275	1429	1429
q5	4501	4373	5043	4373
q6	266	207	147	147
q7	2022	1900	1629	1629
q8	2706	2355	2299	2299
q9	8066	8099	8039	8039
q10	4992	4808	4344	4344
q11	608	463	416	416
q12	809	768	556	556
q13	3350	3751	3058	3058
q14	326	311	283	283
q15	q16	737	738	650	650
q17	1435	1413	1420	1413
q18	8096	7588	6931	6931
q19	1137	1090	1082	1082
q20	2223	2231	1962	1962
q21	5526	4961	4769	4769
q22	533	473	422	422
Total cold run time: 58609 ms
Total hot run time: 52545 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 171687 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 51caf3b5e10fd46a5721d20b6d605de4e563367a, data reload: false

query5	4306	670	517	517
query6	349	227	197	197
query7	4216	558	315	315
query8	331	234	224	224
query9	8833	4117	4113	4113
query10	446	329	306	306
query11	5688	2747	2241	2241
query12	189	128	126	126
query13	1281	598	454	454
query14	6174	5546	5261	5261
query14_1	4576	4535	4606	4535
query15	216	208	182	182
query16	994	458	467	458
query17	1169	700	588	588
query18	2710	481	349	349
query19	208	202	158	158
query20	139	129	127	127
query21	212	142	120	120
query22	13691	13563	13313	13313
query23	17453	16592	16289	16289
query23_1	16389	16326	16319	16319
query24	7481	1796	1327	1327
query24_1	1340	1355	1334	1334
query25	569	479	413	413
query26	1320	305	175	175
query27	2724	568	355	355
query28	4390	2023	1985	1985
query29	970	617	496	496
query30	302	243	200	200
query31	1136	1097	963	963
query32	94	78	71	71
query33	528	349	286	286
query34	1172	1123	655	655
query35	770	795	686	686
query36	1421	1430	1236	1236
query37	161	102	89	89
query38	3213	3158	3093	3093
query39	920	928	899	899
query39_1	881	890	885	885
query40	222	145	123	123
query41	66	64	63	63
query42	110	108	107	107
query43	339	332	302	302
query44	
query45	230	203	204	203
query46	1077	1171	733	733
query47	2323	2436	2191	2191
query48	403	440	295	295
query49	630	494	394	394
query50	1026	359	251	251
query51	4325	4315	4284	4284
query52	103	105	96	96
query53	261	284	202	202
query54	310	269	248	248
query55	91	90	87	87
query56	290	314	296	296
query57	1447	1412	1327	1327
query58	296	269	272	269
query59	1605	1719	1411	1411
query60	337	325	306	306
query61	161	153	157	153
query62	694	651	585	585
query63	241	202	198	198
query64	2316	801	623	623
query65	
query66	1680	484	363	363
query67	30341	29348	29189	29189
query68	
query69	495	356	303	303
query70	1011	994	1000	994
query71	315	278	276	276
query72	3047	2666	2403	2403
query73	831	769	414	414
query74	5724	4951	4808	4808
query75	2685	2614	2270	2270
query76	2313	1153	772	772
query77	391	424	338	338
query78	12326	12432	11883	11883
query79	1307	1039	745	745
query80	570	577	449	449
query81	451	278	245	245
query82	234	163	124	124
query83	277	277	249	249
query84	264	139	114	114
query85	934	538	456	456
query86	350	338	319	319
query87	3428	3387	3257	3257
query88	3608	2798	2744	2744
query89	421	394	337	337
query90	2195	187	181	181
query91	176	164	143	143
query92	79	80	74	74
query93	1425	1465	853	853
query94	534	344	312	312
query95	683	394	357	357
query96	1083	816	361	361
query97	2731	2723	2641	2641
query98	239	230	227	227
query99	1160	1147	1038	1038
Total cold run time: 254682 ms
Total hot run time: 171687 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants