Skip to content

refactor(cudf): Register Spark and Presto functions separately#16960

Closed
jinchengchenghh wants to merge 2 commits intofacebookincubator:mainfrom
jinchengchenghh:cudf_expression
Closed

refactor(cudf): Register Spark and Presto functions separately#16960
jinchengchenghh wants to merge 2 commits intofacebookincubator:mainfrom
jinchengchenghh:cudf_expression

Conversation

@jinchengchenghh
Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh commented Mar 30, 2026

The downstream project should register their own functions separately, only register the common functions when registering cudf.
Remove config cudf.function_engine.
For functions whose implementations are shared across engines but have different signatures or names, we should introduce a CommonFunctions.h file to host the shared definitions. Currently, no such cases exist.

Cooperated with AI IBM Bob

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 30, 2026
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 30, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 860dd33
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69de3fe189eeb10008090f8c

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 30, 2026

Build Impact Analysis

Full build recommended. Files outside the dependency graph changed:

  • velox/experimental/cudf/CudfConfig.h
  • velox/experimental/cudf/exec/AggregationRegistry.cpp
  • velox/experimental/cudf/exec/AggregationRegistry.h
  • velox/experimental/cudf/exec/CMakeLists.txt
  • velox/experimental/cudf/exec/CudfHashAggregation.cpp
  • velox/experimental/cudf/exec/CudfHashAggregation.h
  • velox/experimental/cudf/exec/PrestoAggregateFunctions.cpp
  • velox/experimental/cudf/exec/PrestoAggregateFunctions.h
  • velox/experimental/cudf/exec/SparkAggregateFunctions.cpp
  • velox/experimental/cudf/exec/SparkAggregateFunctions.h
  • ... and 13 more

These directories are not fully covered by the dependency graph. A full build is the safest option.

cmake --build _build/release

Slow path • Graph generated from PR branch

@jinchengchenghh jinchengchenghh marked this pull request as draft March 30, 2026 15:40
@jinchengchenghh jinchengchenghh added the cudf cudf related - GPU acceleration label Mar 31, 2026
@jinchengchenghh jinchengchenghh marked this pull request as ready for review March 31, 2026 15:43
@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Hi, @devavret Do you have any suggestions?

Copy link
Copy Markdown
Collaborator

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cleanups are required. Also, a companion presto PR should be merged at the same time. prestodb/presto#27540


void unregisterFunctions();

void unregisterFunctionsWithPrefix(const std::string& prefix);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used anywhere? using the overwrite flag with registerCudfFunction should be sufficient right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, but please don't rely on overwrite flag, it is not reliable in many cases, such as decimal result with different scale, variant number of input arguments such as hash(x) and hash(x, y, z), argument datatype is any but not supported any.

.argumentType("row(double,bigint)")
.build());
}
//
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should move or remove this comment block

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Yes, so as apache/gluten#11892

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Do you have any further comments? @devavret

@devavret
Copy link
Copy Markdown
Collaborator

It seems presto CI will only work after this PR gets merged and presto pin updated. I also think currently presto won't break because the only thing registered specially for presto is aggregation signatures which differ in return type and we don't enforce that

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Another PR also meets the hang, https://github.com/facebookincubator/velox/actions/runs/24341343799/job/71074538459?pr=17113, do you know the reason? @devavret

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Looks like the enforce_single_row_test is broken, could you help take a look? #16920 @perlitz

@claude
Copy link
Copy Markdown

claude Bot commented Apr 14, 2026

CI Failure Analysis

🟡 Expression Fuzzer with Presto SOT — FUZZER Failure View logs

Fuzzer crash: Instance 4 (seed=177491923) aborted.

The Presto reference query runner threw an exception that Velox did not throw, causing the ExpressionVerifier::verify assertion "Reference path throws" to fire.

ExpressionVerifier.cpp:445] Only reference path threw exception

Reference path throws for query:
  SELECT all_keys_match(c0, (__a0) -> (__a0 is null)) as p0, row_number as p1
  FROM (SELECT transform_keys(c0, (k, v) -> transform_keys(k, (k, v) ->
    coalesce(try(parse_datetime(k, 'yyyy-MM-dd HH:mm:ss.SSS ZZZ')),
             parse_datetime(k, 'yyyy-MM-dd HH:mm:ss.SSS ZZ')))) as c0,
    row_number as row_number FROM (...))

Presto error: Invalid format: "22300-03-25 02:20:26.849 Europe/Chisinau"
  is malformed at "Europe/Chisinau"

File: velox/expression/tests/ExpressionVerifier.cpp:453

The root cause is a parse_datetime format mismatch between Velox and Presto for out-of-range dates with timezone patterns (ZZZ vs ZZ). Presto rejects the input while Velox succeeds, triggering the "only reference path threw" assertion.


🟡 Window Fuzzer with Presto as source of truth — FUZZER Failure View logs

Fuzzer crash: Instance 1 (seed=9992519) aborted.

The window fuzzer's verification rate dropped below the 50% threshold due to too many Presto reference query failures (negative/null frame offsets, hugeint type not supported, etc.).

WindowFuzzer.cpp:593] VELOX_CHECK failed:
  (stats_.numVerified + stats_.numVerificationSkipped) / (double)iteration >= 0.5
  (0.4788732394366197 vs. 0.5)

File: velox/exec/fuzzer/WindowFuzzer.cpp:593

Key Presto errors during run:
  - "Window frame offset value must not be negative or null" (14 occurrences)
  - "Unknown type: hugeint" (6 occurrences)
  - "Unsupported type parameters for make_set_digest" (1 occurrence)

The verification rate was 47.89% (just below the 50% threshold), meaning too many iterations were skipped due to Presto reference failures.


Correlation with PR changes:
The PR (#16960) refactors cuDF aggregate function registration by splitting engine-specific (Spark/Presto) functions into separate files and removing the functionEngine config field. The changes are entirely within velox/experimental/cudf/ — they do not touch any fuzzer code, expression evaluation, window operators, or Presto query runner logic. These failures are unrelated to the PR changes.

Known issues:

  • The Window Fuzzer verification rate failure is tracked in #16917 — "Flaky Window Fuzzer: verification rate drops below 50% due to Presto reference query failures"
  • The Expression Fuzzer "reference path throws" pattern is a recurring fuzzer flake (see #14422)
  • The "Expression Fuzzer with Presto SOT" job also failed on main (run 24380730587), confirming this is a pre-existing/flaky failure

Reproduce locally:

# Expression Fuzzer (seed=177491923)
./_build/debug/velox/expression/fuzzer/velox_expression_fuzzer_test \
  --seed 177491923 \
  --enable_variadic_signatures \
  --velox_fuzzer_enable_complex_types \
  --lazy_vector_generation_ratio 0.2 \
  --duration_sec 300 \
  --logtostderr=1 --minloglevel=0

# Window Fuzzer (seed=9992519)
./_build/debug/velox/functions/prestosql/fuzzer/velox_window_fuzzer_test \
  --seed 9992519 \
  --duration_sec 300 \
  --logtostderr=1 --minloglevel=0

Note: Both fuzzers require a running Presto server for SOT comparison mode.

Recommended action: No action needed from this PR. Both failures are pre-existing flaky fuzzer issues unrelated to the cuDF refactoring changes.

@jinchengchenghh jinchengchenghh added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Apr 14, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Apr 14, 2026

@peterenescu has imported this pull request. If you are a Meta employee, you can view this in D100832100.

@meta-codesync meta-codesync Bot closed this in 7c8d926 Apr 16, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Apr 16, 2026

@peterenescu merged this pull request in 7c8d926.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. cudf cudf related - GPU acceleration Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants