chore: Run Spark 4.0 SQL tests with native_datafusion scan #3728
Draft
andygrove wants to merge 4 commits into apache:main from
Conversation
Add native_datafusion scan-impl matrix entry for Spark 4.0 in spark_sql_test.yml and update 4.0.1.diff to ignore tests that fail with native_datafusion scan (same tests as Spark 3.4).
…1 diff Add missing CometNativeScanExec pattern matches to SchemaPruningSuite and FileBasedDataSourceSuite, fixing all 183 ParquetV1SchemaPruningSuite failures. Tag remaining incompatible tests with IgnoreCometNativeDataFusion.
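The two kinds of edits these commits make to 4.0.1.diff can be sketched as patch hunks like the following. This is a hedged illustration only: the file paths, context lines, and the exact tag-attachment syntax are assumptions, not the actual contents of the diff; only the identifiers (`CometNativeScanExec`, `FileSourceScanExec`, `CometScanExec`, `IgnoreCometNativeDataFusion`) come from this PR.

```diff
# Hypothetical sketch of a pattern-match hunk: teach the schema-pruning helper
# to also recognize the native scan node (paths/context are illustrative).
--- a/sql/core/src/test/scala/.../SchemaPruningSuite.scala
+++ b/sql/core/src/test/scala/.../SchemaPruningSuite.scala
@@ ... @@
     val scanSchemata = collect(df.queryExecution.executedPlan) {
       case scan: FileSourceScanExec => scan.requiredSchema
       case scan: CometScanExec => scan.requiredSchema
+      case scan: CometNativeScanExec => scan.requiredSchema
     }

# Hypothetical sketch of a test-tagging hunk: mark a known-incompatible test
# so it is skipped under the native_datafusion scan (tag syntax is assumed).
@@ ... @@
-  test("static scan metrics") {
+  test("static scan metrics",
+      IgnoreCometNativeDataFusion("fails with native_datafusion scan")) {
```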
Which issue does this PR close?
Follow-up to #3694 which enabled this for Spark 3.5, and the companion PR for Spark 3.4.
Rationale for this change
We already run the Spark SQL tests with the `native_datafusion` scan implementation for Spark 3.4 and 3.5, but not for Spark 4.0. This PR adds Spark 4.0 to the CI matrix for `native_datafusion` scan testing.

What changes are included in this PR?

- Add a `native_datafusion` scan-impl matrix entry for Spark 4.0 in `spark_sql_test.yml`
- Add a `sql_hive-1` exclusion for the new `native_datafusion` Spark 4.0 config (same exclusion as the `auto` config)
- Update `4.0.1.diff` to add missing `CometNativeScanExec` pattern matches (ported from the 3.5.8 diff):
  - `SchemaPruningSuite.checkScanSchemata`: add a `CometNativeScanExec` case to fix all 183 `ParquetV1SchemaPruningSuite` failures (the helper only matched `FileSourceScanExec` and `CometScanExec`, missing the native scan node)
  - `FileBasedDataSourceSuite`: add `CometNativeScanExec` to the import and the `dataFilters` pattern match
- Update `4.0.1.diff` to annotate tests with `IgnoreCometNativeDataFusion` that are known to fail with the `native_datafusion` scan:
  - `DynamicPartitionPruningSuite`: "static scan metrics", "join key with multiple references on the filtering plan"
  - `FileBasedDataSourceSuite`: "Spark native readers should respect spark.sql.caseSensitive", "Enabling/disabling ignoreMissingFiles using parquet", "SPARK-41017: filter pushdown with nondeterministic predicates"
  - `ParquetFilterSuite`: "SPARK-25207: exception when duplicate fields in case-insensitive mode"
  - `ParquetIOSuite`: "SPARK-35640: read binary as timestamp should throw schema incompatible error"
  - `ParquetQuerySuite`: "SPARK-47447: read TimestampLTZ as TimestampNTZ", "SPARK-34212 Parquet should read decimals correctly", "row group skipping doesn't overflow when reading into larger type", "Enabling/disabling ignoreCorruptFiles"
  - `ParquetSchemaSuite`: "schema mismatch failure error message for parquet vectorized reader", "SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz>"
  - `ParquetTypeWideningSuite`: all "unsupported parquet conversion", "unsupported parquet timestamp conversion", "parquet decimal precision change", "parquet decimal precision and scale change", and "parquet widening conversion DateType -> TimestampNTZType" tests
  - `SubquerySuite`: "Subquery reuse across the whole plan", "SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery"
  - `SQLViewSuite`: "alter temporary view should follow current storeAnalyzedPlanForView config" (covers both `SimpleSQLViewSuite` and `HiveSQLViewSuite`)
  - `ExtractPythonUDFsSuite`: "Python UDF should not break column pruning/filter pushdown -- Parquet V1"

How are these changes tested?
By running the Spark SQL tests in CI with the new `native_datafusion` configuration for Spark 4.0.
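The CI matrix change described above might look roughly like the sketch below. The key names, job name, and the pre-existing 3.4/3.5 entries are assumptions for illustration; only the idea of adding a Spark 4.0 + `native_datafusion` combination to `spark_sql_test.yml` comes from this PR.

```yaml
# Hypothetical sketch of the scan-impl matrix in spark_sql_test.yml.
# Structure and key names are illustrative, not the actual workflow.
jobs:
  spark-sql:
    strategy:
      matrix:
        include:
          - spark-version: "3.4"
            scan-impl: native_datafusion
          - spark-version: "3.5"
            scan-impl: native_datafusion
          - spark-version: "4.0"   # entry added by this PR
            scan-impl: native_datafusion
```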