[VL] Add ANSI mode support for Spark CAST(NumericType as integral) #11854
minni31 wants to merge 2 commits into apache:main
Conversation
Enables ANSI mode handling for cast operations from numeric types (int, long, float, double) to integral types (tinyint, smallint, integer, bigint) in the Velox backend.

Changes:
- VeloxConfig.h: add the `spark.sql.ansi.enabled` config key
- WholeStageResultIterator.cc: propagate the ANSI config to the Velox QueryConfig
- CastNode.java: use an int `failureBehavior` (0=UNSPECIFIED, 1=RETURN_NULL, 2=THROW_EXCEPTION) instead of a boolean `isTryCast`
- ExpressionBuilder.java: add a `makeCast` overload with an `isAnsiCast` param
- UnaryExpressionTransformer.scala: pass `withAnsiEvalMode` to CastTransformer
- Spark34/35/40Shims.scala: add `Cast` to the `withAnsiEvalMode` pattern matching
- VeloxTestSettings.scala: remove exclusions for ANSI cast overflow tests
- GlutenTryCastSuite.scala: enable ANSI overflow tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run Gluten Clickhouse CI on x86
weiting-chen left a comment
Review findings from code analysis. Two blocking issues identified - see inline comments for details.
The diff hunk under review (VeloxConfig.h):

```cpp
const std::string kOpTraceDirectoryCreateConfig =
    "spark.gluten.sql.columnar.backend.velox.opTraceDirectoryCreateConfig";

const std::string kVeloxSparkAnsiModeEnabled = "spark.sql.ansi.enabled";
```
Redundant config constant - ANSI mode is already propagated
Problem: This new constant kVeloxSparkAnsiModeEnabled duplicates the existing kAnsiEnabled = "spark.sql.ansi.enabled" already defined in cpp/core/config/GlutenConfig.h. Furthermore, the corresponding config propagation added in WholeStageResultIterator.cc references velox::core::QueryConfig::kSparkAnsiModeEnabled, which does not exist in the currently pinned Velox version (the existing constant is kSparkAnsiEnabled). This is the root cause of the build-native-lib-centos-7 CI failure.
Evidence:
```cpp
// Already exists on main (GlutenConfig.h):
const std::string kAnsiEnabled = "spark.sql.ansi.enabled";

// Already exists on main (WholeStageResultIterator.cc line 576):
configs[velox::core::QueryConfig::kSparkAnsiEnabled] =
    veloxCfg_->get<std::string>(kAnsiEnabled, "false");

// This PR adds (duplicate + wrong constant name):
const std::string kVeloxSparkAnsiModeEnabled = "spark.sql.ansi.enabled";
configs[velox::core::QueryConfig::kSparkAnsiModeEnabled] = ...; // does not exist!
```
Suggested Fix: Remove both C++ changes entirely (this file and the WholeStageResultIterator.cc change). The existing infrastructure in GlutenConfig.h already handles ANSI mode propagation to Velox.
The diff hunk under review (ExpressionBuilder.java):

```java
@@ -239,7 +239,21 @@ public static AggregateFunctionNode makeAggregateFunction(

  public static CastNode makeCast(
      TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast) {
```
Inconsistency between 3-arg and 4-arg makeCast overloads
Problem: The backward-compatible 3-arg overload maps isTryCast=false to THROW_EXCEPTION(2), while the new 4-arg overload (used by CastTransformer) maps the same semantic intent (non-TRY, non-ANSI legacy cast) to UNSPECIFIED(0). Both produce identical Velox runtime behavior (verified in SubstraitToVeloxExpr.cc - Velox treats UNSPECIFIED and THROW_EXCEPTION the same for regular casts), but the inconsistency is confusing for future maintainers.
Evidence:
```java
// 3-arg: non-TRY -> THROW_EXCEPTION (2)
return new CastNode(typeNode, expressionNode, isTryCast ? 1 : 2);

// 4-arg: non-TRY, non-ANSI -> UNSPECIFIED (0)
failureBehavior = 0; // UNSPECIFIED (legacy)
```
Suggested Fix: Align the 3-arg overload to also use UNSPECIFIED(0) for non-TRY casts:
```java
public static CastNode makeCast(
    TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast) {
  return new CastNode(typeNode, expressionNode, isTryCast ? 1 : 0);
}
```
Note: verify ClickHouse backend doesn't depend on receiving THROW_EXCEPTION from this overload before making this change.
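For future maintainers, the combined (isTryCast, isAnsiCast) → failureBehavior mapping implied by the two overloads can be sketched as a small standalone helper. This is an illustration only, not code from the PR: the method name `failureBehavior` and the choice to let TRY take precedence over ANSI are assumptions.

```java
public class CastFailureBehaviorSketch {
  // Hypothetical helper mirroring the PR's encoding:
  // 0 = UNSPECIFIED, 1 = RETURN_NULL, 2 = THROW_EXCEPTION.
  static int failureBehavior(boolean isTryCast, boolean isAnsiCast) {
    if (isTryCast) {
      return 1; // TRY cast: return null on failure (assumed to take precedence)
    }
    if (isAnsiCast) {
      return 2; // ANSI cast: throw on failure
    }
    return 0; // legacy cast: behavior left unspecified (Velox default)
  }

  public static void main(String[] args) {
    System.out.println(failureBehavior(true, false));  // 1
    System.out.println(failureBehavior(false, true));  // 2
    System.out.println(failureBehavior(false, false)); // 0
  }
}
```

With the suggested fix applied, both the 3-arg and 4-arg overloads would agree with this mapping for non-TRY, non-ANSI casts.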
The diff hunk under review (VeloxTestSettings.scala, spark35):

```scala
@@ -104,11 +104,6 @@ class VeloxTestSettings extends BackendTestSettings {
  .exclude(
    "Process Infinity, -Infinity, NaN in case insensitive manner" // +inf not supported in folly.
  )
```
Unrelated exclusion removal - "cast from timestamp II" removed only in spark35
Problem: This spark35 VeloxTestSettings change removes the "cast from timestamp II" exclusion in addition to the 4 ANSI overflow tests. The spark34 and spark40 versions of this file keep this exclusion. This appears unrelated to the ANSI numeric cast feature.
Investigation Needed: Is this removal intentional? If so, please add a note in the PR description explaining why it's included. If accidental, please revert this line and handle it in a separate PR.
Blocking dependency: Velox PR #16962 not yet merged

This PR depends on facebookincubator/velox#16962, which is still open. Even after the C++ compilation issue is fixed (see the inline comment on VeloxConfig.h), the 4 un-excluded ANSI cast overflow tests will still fail, because the current Velox pin does not include the changes from #16962. Please coordinate with the Velox reviewers to get #16962 merged, then update Gluten's Velox pin before this PR can pass CI.

Also: the PR title is missing the required
Summary
Enables ANSI mode handling for cast operations from numeric types (int, long, float, double) to integral types (tinyint, smallint, integer, bigint) in the Velox backend.
When `spark.sql.ansi.enabled=true`, CAST from numeric to integral types now throws on overflow instead of silently wrapping/truncating, matching Spark's ANSI compliance.

Depends on Velox PR: facebookincubator/velox#16962
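The behavioral difference can be illustrated with a minimal plain-Java sketch (not Gluten code; the method names are invented for the example): a legacy cast wraps on overflow, exactly like Java's narrowing primitive conversion, while an ANSI cast raises an error instead.

```java
public class AnsiOverflowSketch {
  // Legacy cast: silently wraps, like Java's narrowing primitive conversion.
  static byte legacyCastToTinyint(long v) {
    return (byte) v;
  }

  // ANSI cast: throw on overflow, mirroring Spark's ANSI semantics (sketch).
  static byte ansiCastToTinyint(long v) {
    if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
      throw new ArithmeticException("Casting " + v + " to tinyint causes overflow");
    }
    return (byte) v;
  }

  public static void main(String[] args) {
    System.out.println(legacyCastToTinyint(128)); // wraps to -128
    try {
      ansiCastToTinyint(128);
    } catch (ArithmeticException e) {
      System.out.println("ANSI overflow: " + e.getMessage());
    }
  }
}
```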
Changes

C++ (Velox backend config)
- Add the `spark.sql.ansi.enabled` config key and propagate it to the Velox QueryConfig

Scala/Java (Substrait conversion)
- Use an int `failureBehavior` (0=UNSPECIFIED, 1=RETURN_NULL, 2=THROW_EXCEPTION) instead of a boolean `isTryCast`
- Add a `makeCast` overload with an `isAnsiCast` param
- Pass `withAnsiEvalMode` to CastTransformer

Shims
- Add `Cast` to the `withAnsiEvalMode` pattern matching

Tests
- Remove VeloxTestSettings exclusions for ANSI cast overflow tests; enable the ANSI overflow tests in GlutenTryCastSuite

Testing