Skip to content

[VL] Add ANSI mode support for Spark CAST(NumericType as integral)#11854

Open
minni31 wants to merge 2 commits intoapache:mainfrom
minni31:oss/ansi-cast-numeric-to-integral
Open

[VL] Add ANSI mode support for Spark CAST(NumericType as integral)#11854
minni31 wants to merge 2 commits intoapache:mainfrom
minni31:oss/ansi-cast-numeric-to-integral

Conversation

@minni31
Copy link
Copy Markdown

@minni31 minni31 commented Mar 30, 2026

Summary

Enables ANSI mode handling for cast operations from numeric types (int, long, float, double) to integral types (tinyint, smallint, integer, bigint) in the Velox backend.

When spark.sql.ansi.enabled=true, CAST from numeric to integral types now throws on overflow instead of silently wrapping/truncating, matching Spark's ANSI compliance.

Depends on Velox PR: facebookincubator/velox#16962

Changes

C++ (Velox backend config)

  • VeloxConfig.h: Add spark.sql.ansi.enabled config key
  • WholeStageResultIterator.cc: Propagate ANSI config to Velox QueryConfig

Scala/Java (Substrait conversion)

  • CastNode.java: Use int failureBehavior (0=UNSPECIFIED, 1=RETURN_NULL, 2=THROW_EXCEPTION) instead of boolean isTryCast
  • ExpressionBuilder.java: Add makeCast overload with isAnsiCast param
  • UnaryExpressionTransformer.scala: Pass withAnsiEvalMode to CastTransformer

Shims

  • Spark34/35/40Shims.scala: Add Cast to withAnsiEvalMode pattern matching

Tests

  • VeloxTestSettings.scala (spark34/35/40): Remove exclusions for 4 ANSI cast overflow tests
  • GlutenTryCastSuite.scala: Enable ANSI overflow tests

Testing

  • GlutenTryCastSuite: 78/78 tests passed, 0 failures

Enables ANSI mode handling for cast operations from numeric types
(int, long, float, double) to integral types (tinyint, smallint,
integer, bigint) in the Velox backend.

Changes:
- VeloxConfig.h: Add spark.sql.ansi.enabled config key
- WholeStageResultIterator.cc: Propagate ANSI config to Velox QueryConfig
- CastNode.java: Use int failureBehavior (0=UNSPECIFIED, 1=RETURN_NULL,
  2=THROW_EXCEPTION) instead of boolean isTryCast
- ExpressionBuilder.java: Add makeCast overload with isAnsiCast param
- UnaryExpressionTransformer.scala: Pass withAnsiEvalMode to CastTransformer
- Spark34/35/40Shims.scala: Add Cast to withAnsiEvalMode pattern matching
- VeloxTestSettings.scala: Remove exclusions for ANSI cast overflow tests
- GlutenTryCastSuite.scala: Enable ANSI overflow tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Mar 30, 2026
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

Copy link
Copy Markdown
Contributor

@weiting-chen weiting-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review findings from code analysis. Two blocking issues identified - see inline comments for details.

const std::string kOpTraceDirectoryCreateConfig =
"spark.gluten.sql.columnar.backend.velox.opTraceDirectoryCreateConfig";

const std::string kVeloxSparkAnsiModeEnabled = "spark.sql.ansi.enabled";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant config constant - ANSI mode is already propagated

Problem: This new constant kVeloxSparkAnsiModeEnabled duplicates the existing kAnsiEnabled = "spark.sql.ansi.enabled" already defined in cpp/core/config/GlutenConfig.h. Furthermore, the corresponding config propagation added in WholeStageResultIterator.cc references velox::core::QueryConfig::kSparkAnsiModeEnabled, which does not exist in the currently pinned Velox version (the existing constant is kSparkAnsiEnabled). This is the root cause of the build-native-lib-centos-7 CI failure.

Evidence:
`cpp
// Already exists on main (GlutenConfig.h):
const std::string kAnsiEnabled = "spark.sql.ansi.enabled";

// Already exists on main (WholeStageResultIterator.cc line 576):
configs[velox::core::QueryConfig::kSparkAnsiEnabled] =
veloxCfg_->getstd::string(kAnsiEnabled, "false");

// This PR adds (duplicate + wrong constant name):
const std::string kVeloxSparkAnsiModeEnabled = "spark.sql.ansi.enabled";
configs[velox::core::QueryConfig::kSparkAnsiModeEnabled] = ... // does not exist!
`

Suggested Fix: Remove both C++ changes entirely (this file and the WholeStageResultIterator.cc change). The existing infrastructure in GlutenConfig.h already handles ANSI mode propagation to Velox.

@@ -239,7 +239,21 @@ public static AggregateFunctionNode makeAggregateFunction(

public static CastNode makeCast(
TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency between 3-arg and 4-arg makeCast overloads

Problem: The backward-compatible 3-arg overload maps isTryCast=false to THROW_EXCEPTION(2), while the new 4-arg overload (used by CastTransformer) maps the same semantic intent (non-TRY, non-ANSI legacy cast) to UNSPECIFIED(0). Both produce identical Velox runtime behavior (verified in SubstraitToVeloxExpr.cc - Velox treats UNSPECIFIED and THROW_EXCEPTION the same for regular casts), but the inconsistency is confusing for future maintainers.

Evidence:
`java
// 3-arg: non-TRY -> THROW_EXCEPTION (2)
return new CastNode(typeNode, expressionNode, isTryCast ? 1 : 2);

// 4-arg: non-TRY, non-ANSI -> UNSPECIFIED (0)
failureBehavior = 0; // UNSPECIFIED (legacy)
`

Suggested Fix: Align the 3-arg overload to also use UNSPECIFIED(0) for non-TRY casts:
java public static CastNode makeCast( TypeNode typeNode, ExpressionNode expressionNode, boolean isTryCast) { return new CastNode(typeNode, expressionNode, isTryCast ? 1 : 0); }
Note: verify ClickHouse backend doesn't depend on receiving THROW_EXCEPTION from this overload before making this change.

@@ -104,11 +104,6 @@ class VeloxTestSettings extends BackendTestSettings {
.exclude(
"Process Infinity, -Infinity, NaN in case insensitive manner" // +inf not supported in folly.
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated exclusion removal - "cast from timestamp II" removed only in spark35

Problem: This spark35 VeloxTestSettings change removes the "cast from timestamp II" exclusion in addition to the 4 ANSI overflow tests. The spark34 and spark40 versions of this file keep this exclusion. This appears unrelated to the ANSI numeric cast feature.

Investigation Needed: Is this removal intentional? If so, please add a note in the PR description explaining why it's included. If accidental, please revert this line and handle it in a separate PR.

@weiting-chen
Copy link
Copy Markdown
Contributor

Blocking dependency: Velox PR #16962 not yet merged

This PR depends on facebookincubator/velox#16962 which is still open. Even after fixing the C++ compilation issue (see inline comment on VeloxConfig.h), the 4 un-excluded ANSI cast overflow tests will fail because the current Velox isAnsiSupported() returns false for numeric-to-integral casts.

Please coordinate with Velox reviewers to get #16962 merged, then update Gluten's Velox pin before this PR can pass CI.

Also: PR title is missing the required [GLUTEN-XXXXX] prefix per contribution guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants