chore: Run Spark 4.0 SQL tests with native_datafusion scan#3728

Draft
andygrove wants to merge 4 commits into apache:main from andygrove:spark-sql-native-datafusion-4.0

andygrove commented Mar 18, 2026

Which issue does this PR close?

Follow-up to #3694, which enabled native_datafusion scan testing for Spark 3.5, and to the companion PR for Spark 3.4.

Rationale for this change

We already run the Spark SQL tests with the native_datafusion scan implementation for Spark 3.4 and 3.5, but not for Spark 4.0. This PR adds Spark 4.0 to the CI matrix for native_datafusion scan testing.

What changes are included in this PR?

  • Add native_datafusion scan-impl matrix entry for Spark 4.0 in spark_sql_test.yml
  • Add sql_hive-1 exclusion for the new native_datafusion Spark 4.0 config (same exclusion as the auto config)
  • Update 4.0.1.diff to add missing CometNativeScanExec pattern matches (ported from the 3.5.8 diff):
    • SchemaPruningSuite.checkScanSchemata: add CometNativeScanExec case to fix all 183 ParquetV1SchemaPruningSuite failures (the helper only matched FileSourceScanExec and CometScanExec, missing the native scan node)
    • FileBasedDataSourceSuite: add CometNativeScanExec to import and dataFilters pattern match
  • Update 4.0.1.diff to annotate tests with IgnoreCometNativeDataFusion that are known to fail with native_datafusion scan:
    • DynamicPartitionPruningSuite: "static scan metrics", "join key with multiple references on the filtering plan"
    • FileBasedDataSourceSuite: "Spark native readers should respect spark.sql.caseSensitive", "Enabling/disabling ignoreMissingFiles using parquet", "SPARK-41017: filter pushdown with nondeterministic predicates"
    • ParquetFilterSuite: "SPARK-25207: exception when duplicate fields in case-insensitive mode"
    • ParquetIOSuite: "SPARK-35640: read binary as timestamp should throw schema incompatible error"
    • ParquetQuerySuite: "SPARK-47447: read TimestampLTZ as TimestampNTZ", "SPARK-34212 Parquet should read decimals correctly", "row group skipping doesn't overflow when reading into larger type", "Enabling/disabling ignoreCorruptFiles"
    • ParquetSchemaSuite: "schema mismatch failure error message for parquet vectorized reader", "SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz>"
    • ParquetTypeWideningSuite: all "unsupported parquet conversion", "unsupported parquet timestamp conversion", "parquet decimal precision change", "parquet decimal precision and scale change", and "parquet widening conversion DateType -> TimestampNTZType" tests
    • SubquerySuite: "Subquery reuse across the whole plan", "SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery"
    • SQLViewSuite: "alter temporary view should follow current storeAnalyzedPlanForView config" (covers both SimpleSQLViewSuite and HiveSQLViewSuite)
    • ExtractPythonUDFsSuite: "Python UDF should not break column pruning/filter pushdown -- Parquet V1"
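
To illustrate why the missing CometNativeScanExec case made checkScanSchemata fail, here is a minimal, self-contained sketch. The plan-node classes below are hypothetical stand-ins that only borrow the names of the real Spark and Comet operators, and `collect`, `scanSchemataBefore`, and `scanSchemataAfter` are illustrative helpers, not the code in 4.0.1.diff. The point is that a partial-function match that omits a node type silently returns no schemata for it, so a check on the pruned scan schema has nothing to compare against.

```scala
// Hypothetical stand-ins for plan nodes; NOT the real Spark/Comet classes.
sealed trait PlanNode { def children: Seq[PlanNode] }
final case class FileSourceScanExec(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class CometScanExec(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class CometNativeScanExec(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class ProjectExec(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object ScanSchemaSketch {
  // Depth-first walk that applies the partial function wherever it is defined.
  def collect[A](plan: PlanNode)(pf: PartialFunction[PlanNode, A]): Seq[A] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collect(c)(pf))

  // Shape of the helper before the fix: the native scan node is not matched.
  def scanSchemataBefore(plan: PlanNode): Seq[String] = collect(plan) {
    case s: FileSourceScanExec => s.requiredSchema
    case s: CometScanExec      => s.requiredSchema
  }

  // Shape of the helper after the fix: one extra case for the native scan node.
  def scanSchemataAfter(plan: PlanNode): Seq[String] = collect(plan) {
    case s: FileSourceScanExec  => s.requiredSchema
    case s: CometScanExec       => s.requiredSchema
    case s: CometNativeScanExec => s.requiredSchema
  }

  def main(args: Array[String]): Unit = {
    val plan = ProjectExec(CometNativeScanExec("struct<id:int>"))
    // Without the extra case, no scan schema is found for this plan.
    assert(scanSchemataBefore(plan).isEmpty)
    // With the extra case, the pruned schema is visible to the assertion.
    assert(scanSchemataAfter(plan) == Seq("struct<id:int>"))
  }
}
```

Because the helper is shared, a single added case would be expected to affect every test that goes through it, which is consistent with one change fixing all 183 ParquetV1SchemaPruningSuite failures.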

How are these changes tested?

By running the Spark SQL tests in CI with the new native_datafusion configuration for Spark 4.0.

Add native_datafusion scan-impl matrix entry for Spark 4.0 in
spark_sql_test.yml and update 4.0.1.diff to ignore tests that fail
with native_datafusion scan (same tests as Spark 3.4).
andygrove marked this pull request as draft March 18, 2026 12:34
andygrove changed the title from "Run Spark 4.0 SQL tests with native_datafusion scan" to "chore: Run Spark 4.0 SQL tests with native_datafusion scan" Mar 18, 2026
…1 diff

Add missing CometNativeScanExec pattern matches to SchemaPruningSuite and
FileBasedDataSourceSuite, fixing all 183 ParquetV1SchemaPruningSuite failures.
Tag remaining incompatible tests with IgnoreCometNativeDataFusion.