
fix: gate non-default collations for Spark 4 datetime expressions #4693

Open

0lai0 wants to merge 1 commit into apache:main from 0lai0:fix-4646-Gate-StringTypeWithCollation

0lai0 (Contributor) commented Jun 19, 2026

Which issue does this PR close?

Closes #4646.

Rationale for this change

Spark 4.0 accepts StringTypeWithCollation for the string inputs of several datetime expressions. Comet previously did not distinguish non-default collations on these inputs, so the affected expressions could be planned as Comet-compatible even though native execution does not honor collation semantics.

What changes are included in this PR?

  • Adds a shared datetime collation guard for Spark 4 datetime expression serdes.
  • Marks non-default collated string inputs as Incompatible(Some(...)) for: convert_timezone, date_format, date_trunc, from_unixtime, make_timestamp, next_day, to_unix_timestamp, trunc, and unix_timestamp.
  • Updates the Spark 4.0 and 4.1 collation tests to verify the fallback reasons and codegen-dispatch behavior, without a LocalTableScan fallback masking the expression-level fallback.
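A minimal sketch of what a shared datetime collation guard could look like, assuming simplified stand-ins for Spark's `StringType` and Comet's `SupportLevel`. `DatetimeCollationGuard`, `check`, `isNonDefaultCollated`, and the reason text are hypothetical names chosen for illustration, not the PR's code; the real serdes inspect Spark's own types. The only Spark fact relied on is that the default collation in Spark 4.0 is `UTF8_BINARY`.

```scala
// Hypothetical stand-ins for Spark's data types and Comet's SupportLevel;
// the real classes live in Spark and Comet and differ in detail.
sealed trait DataType
case object IntegerType extends DataType
final case class StringType(collation: String = "UTF8_BINARY") extends DataType

sealed trait SupportLevel
final case class Compatible(notes: Option[String] = None) extends SupportLevel
final case class Incompatible(notes: Option[String] = None) extends SupportLevel

object DatetimeCollationGuard {
  // Spark 4.0's default collation; any other collation is "non-default".
  val DefaultCollation = "UTF8_BINARY"

  // Hypothetical reason text; the PR's actual message is not shown here.
  val collationReason: String =
    "Non-default collations are not supported for datetime string inputs"

  // True only for string types that carry a collation other than the default.
  def isNonDefaultCollated(dt: DataType): Boolean = dt match {
    case StringType(c) => c != DefaultCollation
    case _             => false
  }

  // Shared check: a single non-default collated string input makes the whole
  // expression Incompatible; otherwise it stays Compatible.
  def check(inputTypes: Seq[DataType]): SupportLevel =
    if (inputTypes.exists(isNonDefaultCollated)) Incompatible(Some(collationReason))
    else Compatible()
}
```

Each serde in the list above (for example `date_format` or `next_day`) would pass its children's data types to the same check, so the collation rule lives in one place rather than being repeated in nine serdes.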

How are these changes tested?

  • ./mvnw compile test-compile -Pspark-4.0 -DskipTests
  • ./mvnw test -Pspark-4.0 -Dtest=none -Dsuites=org.apache.spark.sql.CometCollationSuite
  • ./mvnw compile test-compile -Pspark-4.1 -DskipTests
  • ./mvnw test -Pspark-4.1 -Dtest=none -Dsuites=org.apache.spark.sql.CometCollationSuite

Diff context of a review comment:

```scala
    " `TimestampNTZType` is not supported because Comet incorrectly applies timezone" +
    " conversion to TimestampNTZ values.")

override def getIncompatibleReasons(): Seq[String] = Seq(collationReason)
```

Comment from a Member:

minor nit: this incompatibility will get documented for all Spark versions even though it is specific to Spark 4.x

I wonder if we can shim getIncompatibleReasons so it only applies for 4.x?
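One possible shape for the shim the reviewer suggests, assuming the common pattern of per-Spark-version source directories where each build profile compiles exactly one implementation. All names here (`DatetimeCollationShim`, the two shim objects, `DateFormatSerdeSketch`) are hypothetical. In a real build the shim would be selected at compile time; here it is passed to the constructor only so both variants fit in one runnable file.

```scala
// Hypothetical shim trait: each Spark build profile would compile one implementation.
trait DatetimeCollationShim {
  def collationIncompatibleReasons: Seq[String]
}

// What a Spark 3.x source tree might provide: collated strings do not exist
// there, so no collation incompatibility is documented.
object Spark3xDatetimeCollationShim extends DatetimeCollationShim {
  override def collationIncompatibleReasons: Seq[String] = Seq.empty
}

// What a Spark 4.x source tree might provide (hypothetical reason text).
object Spark4xDatetimeCollationShim extends DatetimeCollationShim {
  override def collationIncompatibleReasons: Seq[String] =
    Seq("Non-default collations are not supported for datetime string inputs")
}

// Hypothetical serde: getIncompatibleReasons delegates to whichever shim is in use,
// so generated compatibility docs only mention collations for Spark 4.x.
class DateFormatSerdeSketch(shim: DatetimeCollationShim) {
  def getIncompatibleReasons(): Seq[String] = shim.collationIncompatibleReasons
}
```

With this split, the Spark 3.x build documents no collation incompatibility, while the 4.0 and 4.1 builds keep the reason.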


Development: merging this pull request may close the issue "Gate non-default StringTypeWithCollation inputs on Spark 4.0 datetime expressions".