-
Notifications
You must be signed in to change notification settings - Fork 335
fix: decline CreateArray with struct-nullability-divergent children #4533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
163aaa3
7110a0a
c874768
8a9d485
5f18d4a
f026b06
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1107,6 +1107,45 @@ class CometArrayExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelp | |
| } | ||
| } | ||
|
|
||
| test("array of structs with nullability-divergent children") { | ||
| // Spark's type coercion compares element types with `sameType`, which ignores nullability, | ||
| // so two struct children that differ ONLY in a nested field's nullability get no unifying | ||
| // cast -- CreateArray keeps children of different StructTypes. DataFusion's make_array asserts | ||
| // strict element-type equality (down to nested nullability) and panics on the mismatch. Comet | ||
| // must decline this CreateArray so Spark's evaluator handles it. | ||
| withParquetTable((0 until 5).map(i => (i, i.toLong)), "tbl") { | ||
| val df = spark | ||
| .table("tbl") | ||
| .select( | ||
| array( | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test covers arrays and structs, but does not cover maps. Could you add a map test as well?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a map test, |
||
| // ct is NOT NULL (literal) | ||
| struct(col("_1").as("id"), lit("a").as("ct")), | ||
| // ct is NULLABLE (when without otherwise) -- same type, different nullability | ||
| struct(col("_1").as("id"), when(col("_1") === 0, lit("b")).as("ct"))).as("arr")) | ||
| checkSparkAnswerAndFallbackReason(df, "CreateArray children have mismatched data types") | ||
| } | ||
| } | ||
|
|
||
| test("array of maps with nullability-divergent struct values") { | ||
| // Same nested-nullability divergence as the struct case, but wrapped in a MapType value so we | ||
| // exercise normalizeContainerNullability's MapType branch: the two map children share a surface | ||
| // type and differ only in a nested struct field's nullability, so they survive container | ||
| // (`MapType.valueContainsNull`) normalization as distinct types and CreateArray must still | ||
| // decline -- DataFusion's make_array would otherwise panic on the struct-field mismatch. | ||
| withParquetTable((0 until 5).map(i => (i, i.toLong)), "tbl") { | ||
| val df = spark | ||
| .table("tbl") | ||
| .select( | ||
| array( | ||
| // map value struct has ct NOT NULL (literal) | ||
| map(lit("k"), struct(col("_1").as("id"), lit("a").as("ct"))), | ||
| // map value struct has ct NULLABLE -- same type, different nested nullability | ||
| map(lit("k"), struct(col("_1").as("id"), when(col("_1") === 0, lit("b")).as("ct")))) | ||
| .as("arr")) | ||
| checkSparkAnswerAndFallbackReason(df, "CreateArray children have mismatched data types") | ||
| } | ||
| } | ||
|
|
||
| // https://issues.apache.org/jira/browse/SPARK-55747 | ||
| test("(ansi) GetArrayItem on null array from split()") { | ||
| withSQLConf( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would be better to have this logic in
getSupportLeveland also implementgetIncompatibleReasonsso that this gets documented in the compatibility guide.Can be done later.