[Enhancement](udf) Support volatility property for scalar UDF by linrrzqqq · Pull Request #62698 · apache/doris

linrrzqqq · 2026-04-22T06:53:29Z

Problem Summary:

Previously, UDFs could be treated as deterministic in optimizer-related paths, which is unsafe for UDFs whose results are not stable across evaluations. That may cause invalid rewrite/planning decisions and lead to incorrect query semantics in some cases.

Introduce `immutable`, `stable`, and `volatile` semantics through `"volatility" = "immutable|stable|volatile"`, persist the property in function metadata, and use it to drive deterministic and volatile-expression behavior in Nereids.

Immutable UDFs are treated as deterministic, stable UDFs avoid volatile identity handling while remaining non-deterministic, and volatile UDFs receive per-call volatile identities to protect unsafe rewrites.
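
The mapping in the paragraph above can be summarised in a small sketch. This is illustrative Python, not Doris code: the `Volatility` enum and both helper names are hypothetical. Only the three property values and their described effects come from the summary.

```python
from enum import Enum


class Volatility(Enum):
    # Values mirror the "volatility" property strings from the summary.
    IMMUTABLE = "immutable"
    STABLE = "stable"
    VOLATILE = "volatile"


def is_deterministic(v: Volatility) -> bool:
    # Only immutable UDFs are treated as deterministic.
    return v is Volatility.IMMUTABLE


def needs_volatile_identity(v: Volatility) -> bool:
    # Only volatile UDFs receive a per-call identity; stable UDFs do not.
    return v is Volatility.VOLATILE


for name in ("immutable", "stable", "volatile"):
    v = Volatility(name)  # raises ValueError for any other string
    print(name, is_deterministic(v), needs_volatile_identity(v))
```

Note that `stable` lands in neither group: it is non-deterministic, yet it carries no per-call identity.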

CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
INSERT INTO cte_uuid_seed VALUES (1),(2),(3);

DROP FUNCTION IF EXISTS py_uuid_token(INT);
CREATE FUNCTION py_uuid_token(INT)
RETURNS STRING
PROPERTIES (
    "type" = "PYTHON_UDF",
    "symbol" = "py_uuid_token_impl",
    "always_nullable" = "false",
    "runtime_version" = "3.12.11"
)
AS $$
import uuid
def py_uuid_token_impl(x):
    return f"{x}-{uuid.uuid4()}"
$$;

before:

SET enable_cte_materialize = true;
SET inline_cte_referenced_threshold = 10;

-- py_uuid_token was not treated as a volatile function (UniqueFunction), which caused wrong planning
WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
SELECT id, COUNT(DISTINCT token) AS distinct_tokens
FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
GROUP BY id ORDER BY id;
+------+-----------------+
| id   | distinct_tokens |
+------+-----------------+
|    1 |               2 |
|    2 |               2 |
|    3 |               2 |
+------+-----------------+

now:

+------+-----------------+
| id   | distinct_tokens |
+------+-----------------+
|    1 |               1 |
|    2 |               1 |
|    3 |               1 |
+------+-----------------+
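
One way to read the two result sets, assuming the difference comes from whether the CTE body is evaluated once or once per reference, is the simulation below. It is plain Python and does not model the Doris planner. `distinct_tokens` is a hypothetical helper that stands in for `COUNT(DISTINCT token) ... GROUP BY id`.

```python
import uuid


def py_uuid_token_impl(x):
    # Same body as the UDF above: a fresh UUID on every call.
    return f"{x}-{uuid.uuid4()}"


def distinct_tokens(rows):
    # Count distinct tokens per id, ordered by id.
    groups = {}
    for row_id, token in rows:
        groups.setdefault(row_id, set()).add(token)
    return {row_id: len(tokens) for row_id, tokens in sorted(groups.items())}


seed = [1, 2, 3]

# CTE evaluated per reference: each UNION ALL branch calls the UDF again.
inlined = [(i, py_uuid_token_impl(i)) for i in seed] + [
    (i, py_uuid_token_impl(i)) for i in seed
]

# CTE evaluated once: both branches reuse the same rows.
cte = [(i, py_uuid_token_impl(i)) for i in seed]
materialized = cte + cte

print(distinct_tokens(inlined))       # {1: 2, 2: 2, 3: 2}
print(distinct_tokens(materialized))  # {1: 1, 2: 1, 3: 1}
```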

doc: apache/doris-website#3570

Thearas · 2026-04-22T06:53:35Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

linrrzqqq · 2026-04-23T09:06:22Z

run buildall

linrrzqqq · 2026-04-23T09:07:56Z

/review

github-actions · 2026-04-23T09:12:44Z

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24826801894

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

hello-stephen · 2026-04-23T10:36:27Z

FE UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-23T11:13:35Z

FE Regression Coverage Report

Increment line coverage 100.00% (6/6) 🎉
Increment coverage report
Complete coverage report

linrrzqqq · 2026-04-23T11:27:11Z

run buildall

linrrzqqq · 2026-04-23T11:27:19Z

/review

github-actions

I found blocking correctness issues.

Blanket isDeterministic() == false for all Java/Python UDF/UDAF/UDTF classes is too broad. The modified regression coverage uses deterministic helpers (IntTest, MySumInt, FloatTest, float_test.py), so this change now rejects pure external UDFs from MV/MTMV unless users opt into enable_nondeterministic_function=true and stops existing MV rewrite coverage from applying.
The optimizer fix is incomplete. Several rewrite paths that can duplicate or relocate expressions still key off containsUniqueFunction() rather than !isDeterministic(), so external UDFs can still be evaluated multiple times or moved incorrectly. A simple filter(project(udf(...) as a)) case is still vulnerable.
The bundled BE fix correctly handles unaligned Decimal256 Arrow reads, but the adjacent TYPE_DECIMALV2 Arrow branch still performs the same unsafe reinterpret_cast load and keeps the misalignment UB for DECIMALV2.
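
The `filter(project(udf(...) as a))` case from the second point can be sketched as follows. This is a hedged illustration in plain Python, not the Nereids rewrite itself: `CountingUdf` is a hypothetical stand-in for a volatile UDF, and the odd-value predicate is arbitrary.

```python
class CountingUdf:
    """Stand-in for a volatile UDF: every call returns the next integer."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.calls


def project_then_filter(rows, udf):
    # filter(project(udf(x) AS a)): evaluate once, then filter on the alias.
    projected = [(x, udf(x)) for x in rows]
    return [(x, a) for x, a in projected if a % 2 == 1]


def filter_pushed_below_project(rows, udf):
    # Unsafe rewrite: the alias is replaced by the expression in the predicate,
    # so the UDF runs once for the filter and again for the projection.
    kept = [x for x in rows if udf(x) % 2 == 1]
    return [(x, udf(x)) for x in kept]


rows = [10, 20, 30, 40]

safe_udf = CountingUdf()
safe = project_then_filter(rows, safe_udf)

unsafe_udf = CountingUdf()
unsafe = filter_pushed_below_project(rows, unsafe_udf)

print(safe, safe_udf.calls)      # [(10, 1), (30, 3)] 4
print(unsafe, unsafe_udf.calls)  # [(10, 5), (30, 6)] 6
```

In the rewritten form the UDF is called six times instead of four, and the row `(30, 6)` no longer satisfies the predicate it was filtered on.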

Critical checkpoints:

Goal / correctness: The PR fixes the specific CTE nondeterminism example, but not the end-to-end optimizer correctness problem and it broadens semantics for all external UDFs.
Scope / minimality: Not minimal; it changes default behavior for every Java/Python UDF/UDAF/UDTF and rewrites existing deterministic-UDF MV tests around that change.
Concurrency / lifecycle: No new concurrency or lifecycle issue found in the touched code.
Config / compatibility: No new config or compatibility issue found.
Parallel paths: Not all relevant rewrite paths were updated.
Tests: Added CTE/MTMV coverage is useful, but the modified float/java tests now encode the broader regression for deterministic UDFs and there is no regression case for the remaining filter/project duplication path.
Observability / transaction / FE-BE variable passing: Not applicable in this diff.
Performance: Blanket nondeterminism disables valid optimizations for pure UDFs.

No additional user-provided focus was supplied.

linrrzqqq · 2026-04-23T13:15:55Z

run buildall

linrrzqqq · 2026-04-23T13:16:01Z

/review

hello-stephen · 2026-04-23T13:27:05Z

FE Regression Coverage Report

Increment line coverage 100.00% (6/6) 🎉
Increment coverage report
Complete coverage report

github-actions

Additional blockers beyond the existing inline threads:

Pre-upgrade UDF metadata is not backward-compatible with the new persisted deterministic field. Old Java/Python UDFs/UDAFs/UDTFs replay as nondeterministic after upgrade, which silently changes SHOW CREATE FUNCTION output and can strand existing MTMVs in SCHEMA_CHANGE when ensureMTMVQueryUsable() re-analyzes the stored query.
The new per-UDF deterministic property is still not honored in cache paths. Deterministic external UDFs remain blanket-ineligible for SQL cache and fragment query cache because those paths still special-case Udf/AliasFunction instead of consulting isDeterministic().
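
The replay concern in the first point comes down to what a reader does when the field is absent. The sketch below is a minimal illustration and not Doris's persistence code: the `"deterministic"` JSON key and the `read_deterministic` helper are assumptions, and the sketch deliberately leaves the choice of legacy default to the caller, since that choice is what the review is questioning.

```python
import json


def read_deterministic(function_json: str, legacy_default: bool) -> bool:
    # Hypothetical reader for persisted function metadata.
    meta = json.loads(function_json)
    value = meta.get("deterministic")
    if value is None:
        # Field absent (metadata written before the upgrade): fall back explicitly
        # instead of silently taking the type's zero value.
        return legacy_default
    return bool(value)


old_meta = '{"name": "my_udf"}'
new_meta = '{"name": "my_udf", "deterministic": false}'

print(read_deterministic(old_meta, legacy_default=True))   # True
print(read_deterministic(new_meta, legacy_default=True))   # False
```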

Critical checkpoints:

Goal: Partially met. The PR plumbs a user-facing determinism flag through create/show/Nereids translation and adds regression coverage for CTE/MTMV scenarios, but it does not preserve compatibility for existing persisted UDFs and it does not propagate the flag to cache consumers.
Scope/focus: The code changes are localized, but the end-to-end contract of the new property is incomplete.
Concurrency: No new lock-order or thread-safety issue found in the touched code; the blocking problems are replay/analysis-path correctness issues.
Lifecycle/static init: No special lifecycle or static initialization issue found.
Config/properties: A new user-visible UDF property (deterministic) is added and parsed, but not all relevant consumers honor it.
Compatibility/persistence: Blocking issue. Function JSON persistence changed without a backward-compatible upgrade path for existing function metadata.
Parallel code paths: Blocking issue. Cache/query-normalization paths still bypass the new flag.
Special conditional checks: No additional issue beyond the above.
Test coverage: Added DDL/CTE/MV coverage is useful, but there is no coverage for upgrade/replay compatibility or SQL/query cache behavior.
Test result files: The new .out files look consistent with the added regression cases.
Observability: No new observability gap found for this change.
Transaction/persistence/data writes: Metadata persistence is affected; the replay compatibility issue above is the main blocker.
FE-BE variable passing: Not applicable here.
Performance: No material performance regression found in the touched code.
Other issues: No additional distinct blocker beyond the two comments here and the already-open review threads.

User focus: none provided.

hello-stephen · 2026-04-23T15:26:20Z

FE Regression Coverage Report

Increment line coverage 87.76% (43/49) 🎉
Increment coverage report
Complete coverage report

linrrzqqq · 2026-04-24T07:25:39Z

run buildall

hello-stephen · 2026-04-24T08:57:03Z

FE UT Coverage Report

Increment line coverage 20.41% (10/49) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-24T09:37:36Z

FE Regression Coverage Report

Increment line coverage 66.15% (43/65) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-24T11:37:38Z

FE Regression Coverage Report

Increment line coverage 42.57% (43/101) 🎉
Increment coverage report
Complete coverage report

HappenLee · 2026-05-07T13:08:04Z

/review

github-actions

I am requesting changes based on the already-open inline review threads; I did not add duplicate inline comments for the same issues.

Critical checkpoint conclusions:

Goal/test coverage: The PR aims to add an explicit deterministic property for Java/Python UDF, UDAF, and UDTF handling. The property is plumbed through catalog/Nereids classes and new tests exist, but existing review threads identify unresolved end-to-end gaps in cache/planner behavior and optimizer rewrites.
Scope/focus: The code changes are mostly localized, but the behavior change is broad because UDF determinism affects MV/MTMV eligibility, SQL cache, fragment query cache, CTE inlining, and optimizer rewrites.
Concurrency/lifecycle: I did not find new lock ordering, shared mutable state, or static initialization risks in this diff.
Configuration: No new config item is added; this is a persisted function property.
Compatibility/persistence: Existing review context already flags that old persisted functions lack the new field and may be replayed with a changed default, which is a blocking compatibility concern.
Parallel paths: Java/Python scalar, aggregate, and table-function wrappers are touched, but existing review context already flags remaining parallel paths that still do not respect the new determinism bit.
Conditional checks/error handling: Boolean property parsing follows existing property parsing style; I did not find an additional distinct error-handling issue.
Tests/results: Regression and FE unit coverage were added, with generated .out files present. Coverage still does not close the existing end-to-end gaps noted in the inline threads.
Observability/transaction/data-write/FE-BE protocol: No new observability need, data write path, or FE-BE protocol field was identified beyond persisted function metadata.
Performance: I did not find an additional distinct performance regression in the changed code.

User focus: No additional user-provided review focus was specified. I reviewed the full PR with extra attention to determinism propagation and did not find a new non-duplicate issue beyond the existing review threads.

yujun777 · 2026-05-08T03:49:15Z

I think a boolean deterministic property may still be too coarse here. For UDFs, it may be better to consider the same three categories as function volatility:

immutable: same input always returns the same output, e.g. def f(x): return x + 1 or lower/abs-like pure computation.
stable: stable within one statement/query but may change across statements, e.g. a UDF returning the query start time, session/database context, or a statement-level config value.
volatile: each invocation may return a different result or the call count/location has semantics, e.g. uuid.uuid4(), random.random(), HTTP/RPC calls, or UDFs with side effects.

So for Python UDF we probably need to identify which of these cases we want to support. The current deterministic true/false split can distinguish immutable from non-immutable, but it cannot distinguish stable functions from volatile functions, while optimizer rules need different behavior for those two cases.
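
The three categories can be illustrated with minimal Python bodies. This is a sketch under stated assumptions: `StatementContext` is a hypothetical stand-in for per-statement state such as a query start time, and none of these names come from Doris.

```python
import uuid


def immutable_udf(x):
    # Same input, same output, always.
    return x + 1


class StatementContext:
    """Hypothetical per-statement state, e.g. the query start time."""

    def __init__(self, start_time):
        self.start_time = start_time


def stable_udf(ctx, x):
    # Constant for the whole statement, may differ in the next statement.
    return ctx.start_time


def volatile_udf(x):
    # A new value on every call.
    return uuid.uuid4().hex


stmt1 = StatementContext(start_time="2026-05-08 03:49:15")
stmt2 = StatementContext(start_time="2026-05-08 03:49:16")

print(immutable_udf(1) == immutable_udf(1))                # True
print(stable_udf(stmt1, 1) == stable_udf(stmt1, 2))        # True
print(stable_udf(stmt1, 1) == stable_udf(stmt2, 1))        # False
print(volatile_udf(1) == volatile_udf(1))                  # False
```

A result of the first kind can be reused anywhere, the second only within one statement, and the third not at all, which is why a boolean flag cannot separate the last two.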

linrrzqqq · 2026-05-12T08:12:02Z

/review

github-actions

Review completed. I am not adding duplicate inline comments because the remaining blocking concern is already covered by the existing thread on CreateFunctionCommand: the new UDF volatility flag is still not effective end-to-end for SQL cache / fragment query cache eligibility. In the current head, ExpressionAnalyzer still marks any Udf/AliasUdfBuilder as cannotProcessExpression, and PhysicalPlanTranslator.hasUndeterministicExpression() still returns true for any Udf.class/AliasFunction.class before relying on isDeterministic(), so a scalar UDF declared immutable/volatility=immutable is still treated as uncacheable in those paths.
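
The ordering problem described above can be sketched as follows. This is illustrative Python, not the `PhysicalPlanTranslator` code: the `Expr` record and both function names are hypothetical, and `deterministic` stands for what `isDeterministic()` would report.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    kind: str            # hypothetical label, e.g. "udf" or "builtin"
    deterministic: bool  # what isDeterministic() would report


def uncacheable_class_first(exprs):
    # Shape of the check described in the review: any UDF yields True
    # before the determinism flag is consulted.
    return any(e.kind == "udf" or not e.deterministic for e in exprs)


def uncacheable_flag_only(exprs):
    # Alternative that relies on the flag alone.
    return any(not e.deterministic for e in exprs)


immutable_udf = Expr(kind="udf", deterministic=True)
volatile_udf = Expr(kind="udf", deterministic=False)

print(uncacheable_class_first([immutable_udf]))  # True
print(uncacheable_flag_only([immutable_udf]))    # False
print(uncacheable_flag_only([volatile_udf]))     # True
```

With the class check first, an immutable UDF is still reported as uncacheable, which matches the gap the review points out.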

Critical checkpoint conclusions:

Goal/test: The PR aims to let users classify scalar Java/Python UDF volatility and protect optimizer rewrites for volatile calls. The current tests cover construction/equality and some rewrite behavior, but they do not prove the cache-related advertised deterministic behavior end-to-end.
Scope/focus: The implementation is mostly focused on Nereids UDF volatility plumbing, though several end-to-end paths still need alignment.
Concurrency/lifecycle: No new concurrent mutable state or non-obvious lifecycle issue found in the reviewed FE changes.
Configuration/compatibility: No new config item. Persisted Function compatibility now has a null/default guard for old metadata.
Parallel paths: One important parallel path remains incomplete: SQL cache and fragment query cache still use UDF-class checks that bypass the new volatility semantics.
Special conditions: The VOLATILE identity logic is localized and guarded; no additional distinct issue found there.
Test coverage: Missing cache eligibility tests for immutable UDFs and negative tests for volatile UDFs through the cache decision paths.
Observability/transactions/data writes: Not applicable for this FE optimizer/catalog metadata change beyond normal function persistence.
FE/BE variable passing: Runtime execution does not appear to require BE volatility propagation; this is FE planning metadata.

User focus: No additional user-provided review focus was specified.

linrrzqqq · 2026-05-12T08:31:15Z

run buildall

linrrzqqq · 2026-05-14T17:05:03Z

run buildall

hello-stephen · 2026-05-14T17:35:23Z

TPC-H: Total hot run time: 29197 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit acf1ca7942d68b7d197b23bf967fa64eaa314b7f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17832	3913	3864	3864
q2	q3	10748	868	601	601
q4	4659	447	345	345
q5	7450	1322	1144	1144
q6	215	164	135	135
q7	916	931	752	752
q8	9706	1429	1218	1218
q9	6143	5363	5279	5279
q10	6296	2084	1796	1796
q11	472	265	255	255
q12	693	418	298	298
q13	18156	3332	2684	2684
q14	298	283	260	260
q15	q16	890	866	780	780
q17	953	1070	728	728
q18	6436	5563	5556	5556
q19	1193	1271	1077	1077
q20	510	397	260	260
q21	4601	2277	1861	1861
q22	422	353	304	304
Total cold run time: 98589 ms
Total hot run time: 29197 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4109	4044	4056	4044
q2	q3	4626	4746	4161	4161
q4	2087	2161	1349	1349
q5	4893	4943	5263	4943
q6	185	158	128	128
q7	2005	1786	1820	1786
q8	3463	3221	3183	3183
q9	8504	8432	8404	8404
q10	4474	4492	4219	4219
q11	587	435	407	407
q12	705	764	512	512
q13	3123	3617	2950	2950
q14	300	302	272	272
q15	q16	755	764	709	709
q17	1348	1318	1216	1216
q18	8107	7134	7028	7028
q19	1172	1133	1144	1133
q20	2219	2246	1958	1958
q21	6077	5464	4769	4769
q22	544	492	415	415
Total cold run time: 59283 ms
Total hot run time: 53586 ms

hello-stephen · 2026-05-14T17:46:21Z

TPC-DS: Total hot run time: 170621 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit acf1ca7942d68b7d197b23bf967fa64eaa314b7f, data reload: false

query5	4345	663	505	505
query6	341	218	194	194
query7	4215	569	298	298
query8	326	233	231	231
query9	8846	4086	4001	4001
query10	482	355	298	298
query11	6040	2379	2213	2213
query12	193	135	127	127
query13	1277	619	475	475
query14	6817	5350	5035	5035
query14_1	4346	4352	4313	4313
query15	210	207	183	183
query16	1032	467	355	355
query17	1323	767	646	646
query18	2776	496	354	354
query19	327	204	178	178
query20	140	132	133	132
query21	232	137	131	131
query22	13632	13451	13435	13435
query23	17119	16375	16597	16375
query23_1	16184	16281	16262	16262
query24	8180	1838	1439	1439
query24_1	1461	1427	1393	1393
query25	591	524	490	490
query26	1425	324	173	173
query27	2956	603	360	360
query28	4497	1969	2005	1969
query29	982	668	495	495
query30	296	234	189	189
query31	1107	1061	936	936
query32	86	68	69	68
query33	527	341	289	289
query34	1143	1151	649	649
query35	752	779	658	658
query36	1290	1347	1137	1137
query37	149	102	86	86
query38	3171	3122	3055	3055
query39	923	912	886	886
query39_1	877	866	887	866
query40	236	154	132	132
query41	64	60	58	58
query42	107	108	108	108
query43	321	319	278	278
query44	
query45	205	203	189	189
query46	1068	1166	694	694
query47	2292	2278	2231	2231
query48	396	418	311	311
query49	657	543	444	444
query50	745	295	222	222
query51	4312	4315	4262	4262
query52	103	106	93	93
query53	257	276	207	207
query54	304	273	262	262
query55	96	93	86	86
query56	311	319	306	306
query57	1452	1471	1349	1349
query58	332	273	264	264
query59	1551	1578	1378	1378
query60	328	321	314	314
query61	150	153	146	146
query62	670	616	555	555
query63	246	206	210	206
query64	2348	871	672	672
query65	
query66	1670	511	396	396
query67	30045	29969	29716	29716
query68	
query69	487	319	300	300
query70	987	999	931	931
query71	305	281	262	262
query72	3000	2693	2349	2349
query73	843	750	425	425
query74	5051	4898	4737	4737
query75	2767	2663	2329	2329
query76	2283	1105	759	759
query77	418	436	350	350
query78	12851	12935	12278	12278
query79	1531	1033	756	756
query80	1410	561	510	510
query81	522	276	247	247
query82	968	157	121	121
query83	356	269	240	240
query84	270	142	109	109
query85	921	536	440	440
query86	440	328	325	325
query87	3397	3332	3217	3217
query88	3558	2692	2658	2658
query89	436	383	333	333
query90	1947	179	174	174
query91	175	167	138	138
query92	74	75	74	74
query93	1083	965	561	561
query94	720	326	284	284
query95	678	459	349	349
query96	1031	739	329	329
query97	2682	2681	2561	2561
query98	236	229	226	226
query99	1081	1126	996	996
Total cold run time: 256309 ms
Total hot run time: 170621 ms

hello-stephen · 2026-05-14T19:36:26Z

FE Regression Coverage Report

Increment line coverage 70.69% (123/174) 🎉
Increment coverage report
Complete coverage report

yujun777

LGTM

hello-stephen · 2026-05-15T08:50:04Z

FE Regression Coverage Report

Increment line coverage 52.56% (123/234) 🎉
Increment coverage report
Complete coverage report

linrrzqqq · 2026-05-17T13:31:20Z

run buildall

hello-stephen · 2026-05-17T13:58:54Z

TPC-H: Total hot run time: 30926 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ec07f170236838475ad9da6a010e62e610ab33d7, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17672	3838	3827	3827
q2	q3	10767	1350	810	810
q4	4685	471	349	349
q5	7582	2216	2079	2079
q6	308	174	138	138
q7	995	787	645	645
q8	9394	1766	1679	1679
q9	6999	4919	4934	4919
q10	6479	2059	1783	1783
q11	436	279	239	239
q12	691	424	289	289
q13	18202	3706	2798	2798
q14	265	258	231	231
q15	q16	822	781	718	718
q17	984	949	992	949
q18	6697	5768	5420	5420
q19	1254	1288	1117	1117
q20	506	418	263	263
q21	5614	2572	2374	2374
q22	435	355	299	299
Total cold run time: 100787 ms
Total hot run time: 30926 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4187	4117	4104	4104
q2	q3	4504	4889	4303	4303
q4	2103	2200	1397	1397
q5	4387	4262	4281	4262
q6	226	175	129	129
q7	2276	1861	1606	1606
q8	2461	2130	2037	2037
q9	7755	7648	7713	7648
q10	4587	4455	4006	4006
q11	570	398	383	383
q12	914	760	532	532
q13	3256	3628	3009	3009
q14	301	305	266	266
q15	q16	714	737	635	635
q17	1325	1294	1291	1291
q18	8068	7384	7062	7062
q19	1170	1071	1117	1071
q20	2194	2205	1930	1930
q21	5287	4549	4456	4456
q22	522	456	405	405
Total cold run time: 56807 ms
Total hot run time: 50532 ms

hello-stephen · 2026-05-17T14:09:46Z

TPC-DS: Total hot run time: 169275 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ec07f170236838475ad9da6a010e62e610ab33d7, data reload: false

query5	4342	635	514	514
query6	346	230	212	212
query7	4251	594	298	298
query8	338	236	228	228
query9	8822	4043	3995	3995
query10	458	354	299	299
query11	5821	2363	2244	2244
query12	185	132	128	128
query13	1309	634	453	453
query14	6028	5359	5057	5057
query14_1	4393	4348	4366	4348
query15	217	204	184	184
query16	1036	481	450	450
query17	1162	730	586	586
query18	2561	486	344	344
query19	241	200	162	162
query20	140	136	134	134
query21	211	133	123	123
query22	13706	13497	13255	13255
query23	17358	16347	16063	16063
query23_1	16146	16086	16109	16086
query24	7682	1744	1293	1293
query24_1	1319	1322	1289	1289
query25	535	471	424	424
query26	1324	320	171	171
query27	2738	542	333	333
query28	4405	1954	1933	1933
query29	971	626	496	496
query30	305	238	200	200
query31	1129	1065	940	940
query32	86	77	72	72
query33	539	359	291	291
query34	1147	1154	634	634
query35	764	795	668	668
query36	1344	1387	1192	1192
query37	153	103	94	94
query38	3195	3143	3068	3068
query39	927	935	906	906
query39_1	872	892	885	885
query40	236	144	158	144
query41	67	65	61	61
query42	112	110	106	106
query43	319	324	284	284
query44	
query45	203	203	191	191
query46	1052	1193	702	702
query47	2349	2409	2190	2190
query48	400	422	292	292
query49	637	493	394	394
query50	1044	351	260	260
query51	4366	4342	4212	4212
query52	104	104	94	94
query53	249	295	215	215
query54	321	272	249	249
query55	92	89	87	87
query56	302	306	314	306
query57	1521	1373	1287	1287
query58	292	274	266	266
query59	1553	1634	1487	1487
query60	330	343	304	304
query61	159	153	155	153
query62	680	645	569	569
query63	250	199	206	199
query64	2369	785	624	624
query65	
query66	1703	473	394	394
query67	29972	29923	29711	29711
query68	
query69	478	331	302	302
query70	1014	1023	995	995
query71	304	275	275	275
query72	2922	2716	2374	2374
query73	839	776	422	422
query74	5054	4911	4726	4726
query75	2670	2584	2263	2263
query76	2319	1122	779	779
query77	388	416	341	341
query78	12051	12056	11569	11569
query79	1421	1047	700	700
query80	642	534	502	502
query81	454	281	245	245
query82	1384	159	125	125
query83	357	272	242	242
query84	265	136	114	114
query85	891	528	451	451
query86	393	314	308	308
query87	3403	3411	3233	3233
query88	3503	2647	2645	2645
query89	434	387	346	346
query90	1921	183	181	181
query91	182	175	141	141
query92	88	75	74	74
query93	1468	1407	829	829
query94	518	351	333	333
query95	666	393	361	361
query96	1014	835	351	351
query97	2702	2700	2570	2570
query98	234	229	239	229
query99	1114	1101	980	980
Total cold run time: 253333 ms
Total hot run time: 169275 ms

hello-stephen · 2026-05-17T16:21:16Z

FE Regression Coverage Report

Increment line coverage 68.65% (127/185) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-18T04:35:44Z

FE Regression Coverage Report

Increment line coverage 68.28% (127/186) 🎉
Increment coverage report
Complete coverage report

no need

morrySnow · 2026-05-18T08:08:35Z

/review

github-actions

I reviewed the GitHub PR diff and did not find an additional distinct issue that is not already covered by the existing inline review threads. The previously raised UDF volatility/replay/cache compatibility threads remain the relevant blockers to resolve before approval.

Checkpoint conclusions:

Goal/test: the PR adds UDF volatility metadata and optimizer safeguards with unit/regression coverage; existing threads cover remaining replay and end-to-end effectiveness gaps.
Scope/focus: the current diff is focused on FE UDF volatility and affected rewrites/tests.
Concurrency/lifecycle/config: no new concurrency, lifecycle, or dynamic-config issue found in the current GitHub diff.
Compatibility/persistence: existing inline threads already cover persistence/replay compatibility concerns; I did not duplicate them.
Parallel paths: reviewed changed Nereids rewrite guards and UDF builder paths; no additional distinct missed path found beyond existing comments.
Tests/results: added tests cover volatility parsing, SHOW CREATE replay, and selected rewrite behavior, with prior comments already covering failing/missing cases.
Observability/performance/data correctness: no additional distinct issue found in the current diff.
User focus: no additional user-provided review focus was specified.

github-actions · 2026-05-19T03:23:13Z

PR approved by at least one committer and no changes requested.

Related PR: #62698

Problem Summary:

PR #62698 introduced the UDF volatility property and added `VolatileExpression` / `VolatileIdentity` so volatile UDF calls can carry per-call identity and avoid unsafe optimizer rewrites. This PR is a follow-up refactoring for that change. It removes duplicated unique identity state from `UniqueFunction`, keeps `VolatileIdentity` as the single identity holder, moves volatile-expression helper methods into `ExpressionTrait`, and expands the former `AddProjectForUniqueFunction` rewrite to operate on volatile expressions rather than only unique functions.

linrrzqqq force-pushed the udf-nondeterministic branch from f2edf43 to 3678922 Compare April 23, 2026 11:25

github-actions Bot requested changes Apr 23, 2026

linrrzqqq force-pushed the udf-nondeterministic branch from 3678922 to f66187b Compare April 23, 2026 13:15

github-actions Bot requested changes Apr 23, 2026

Comment thread fe/fe-catalog/src/main/java/org/apache/doris/catalog/Function.java Outdated

Comment thread ...-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/CreateFunctionCommand.java Outdated

linrrzqqq force-pushed the udf-nondeterministic branch from f66187b to 9d73507 Compare April 24, 2026 02:22

linrrzqqq changed the title ~~[Fix](udf) mark udf nondeterministic~~ [Enhancement](udf) support deterministic property for udf Apr 24, 2026

github-actions Bot requested changes May 7, 2026

[Enhancement](udf) support volatility for scalar UDFs

2bbd937

linrrzqqq force-pushed the udf-nondeterministic branch from 9d73507 to 2bbd937 Compare May 12, 2026 08:06

github-actions Bot requested changes May 12, 2026

yujun777 approved these changes May 15, 2026

enhance SHOW CREATE FUNCTION field

ec07f17

github-actions Bot reviewed May 18, 2026

hello-stephen approved these changes May 19, 2026

github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 19, 2026

hello-stephen merged commit cfe94e1 into apache:master May 19, 2026
34 checks passed

linrrzqqq deleted the udf-nondeterministic branch May 19, 2026 03:50

yujun777 mentioned this pull request May 19, 2026

[refactor](fe) Refine volatile expression handling #63403

Merged

16 tasks

morrySnow added dev/4.0.x dev/4.1.x labels May 20, 2026

github-actions Bot added dev/4.1.x-conflict dev/4.0.x-conflict labels May 20, 2026
