
chore: run Spark 3.4 tests with native_datafusion scan#3722

Open
andygrove wants to merge 5 commits into apache:main from andygrove:spark-sql-native-datafusion-3.4


@andygrove (Member) commented Mar 17, 2026

Which issue does this PR close?

N/A - CI enablement

Rationale for this change

The native_datafusion Spark SQL tests were already running for Spark 3.5 but not for Spark 3.4. Adding 3.4 coverage revealed test failures that need to be skipped with the appropriate ignore tags.

What changes are included in this PR?

  • Add Spark 3.4.3 to the native_datafusion CI workflow
  • Update dev/diffs/3.4.3.diff to tag failing tests with IgnoreCometNativeDataFusion and IgnoreCometNativeScan, matching the tags already applied in the 3.5.8 diff, plus 3 tests that are specific to Spark 3.4. Newly tagged tests:
    • FileBasedDataSourceSuite - "Spark native readers should respect spark.sql.caseSensitive"
    • ParquetIOSuite - "SPARK-35640: read binary as timestamp should throw schema incompatible error"
    • ParquetIOSuite - "SPARK-35640: int as long should throw schema incompatible error"
    • ParquetQuerySuite - "SPARK-36182: can't read TimestampLTZ as TimestampNTZ"
    • ParquetQuerySuite - "SPARK-34212 Parquet should read decimals correctly"
    • ParquetQuerySuite - "row group skipping doesn't overflow when reading into larger type"
    • ParquetSchemaSuite - "schema mismatch failure error message for parquet vectorized reader"
    • ParquetSchemaSuite - "SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz>"
    • ParquetFilterSuite - "filter pushdown - StringPredicate" (IgnoreCometNativeScan)
    • ParquetFilterSuite - "SPARK-25207: exception when duplicate fields in case-insensitive mode"
    • DynamicPartitionPruningSuite - "static scan metrics"
    • ExtractPythonUDFsSuite - "Python UDF should not break column pruning/filter pushdown -- Parquet V1"

How are these changes tested?

By running the Spark SQL native_datafusion tests in CI for Spark 3.4.3.

Tag tests with IgnoreCometNativeDataFusion and IgnoreCometNativeScan
to match tags already applied in the 3.5.8 diff, plus 3 tests that
are specific to Spark 3.4.
…diff

Remove imports of IgnoreCometNativeDataFusion in DynamicPartitionPruningSuite
and FileBasedDataSourceSuite since both files are in the org.apache.spark.sql
package where IgnoreCometNativeDataFusion is defined, making the import
redundant and causing "permanently hidden" compilation errors.
… diff

The test override in SQLTestUtils checked DisableAdaptiveExecution first,
so tests carrying both the DisableAdaptiveExecution and IgnoreCometNativeDataFusion
tags (like "static scan metrics") would enter the DisableAdaptiveExecution
branch and never check the ignore tag. Reorder to match the 3.5.8 diff: check
the Comet skip tags first, with early returns, then handle DisableAdaptiveExecution.
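The ordering problem described in the commit message above can be modeled in a few lines. This is a simplified, language-neutral sketch in Python, not the actual Scala override in SQLTestUtils: tags are plain strings, the returned strings stand in for registering an ignored or a normal test, and it assumes that under the native_datafusion scan either IgnoreCometNativeDataFusion or IgnoreCometNativeScan causes a skip. The names `should_skip`, `dispatch_old`, and `dispatch_new` are hypothetical.

```python
from typing import AbstractSet

# Plain strings stand in for the ScalaTest Tag objects used in the Spark diffs.
IGNORE_NATIVE_DATAFUSION = "IgnoreCometNativeDataFusion"
IGNORE_NATIVE_SCAN = "IgnoreCometNativeScan"
DISABLE_AQE = "DisableAdaptiveExecution"

# Assumption: under the native_datafusion scan, either Comet tag skips the test.
SKIP_TAGS_FOR_NATIVE_DATAFUSION = {IGNORE_NATIVE_DATAFUSION, IGNORE_NATIVE_SCAN}


def should_skip(tags: AbstractSet[str], native_datafusion: bool) -> bool:
    """Return True if a Comet ignore tag applies to the active scan."""
    return native_datafusion and bool(tags & SKIP_TAGS_FOR_NATIVE_DATAFUSION)


def dispatch_old(tags: AbstractSet[str], native_datafusion: bool) -> str:
    """Old ordering: the DisableAdaptiveExecution branch is taken first,
    so a test carrying both tags never reaches the ignore-tag check."""
    if DISABLE_AQE in tags:
        return "run with AQE disabled"
    if should_skip(tags, native_datafusion):
        return "ignored"
    return "run"


def dispatch_new(tags: AbstractSet[str], native_datafusion: bool) -> str:
    """Reordered to match the 3.5.8 diff: Comet skip tags are checked first
    with an early return, then DisableAdaptiveExecution is handled."""
    if should_skip(tags, native_datafusion):
        return "ignored"
    if DISABLE_AQE in tags:
        return "run with AQE disabled"
    return "run"
```

For a test tagged like "static scan metrics", `dispatch_old({DISABLE_AQE, IGNORE_NATIVE_DATAFUSION}, True)` returns `"run with AQE disabled"`, so the failing test still runs, while `dispatch_new` with the same arguments returns `"ignored"`. Checking skip tags first with early returns keeps every other tag-specific branch from shadowing them.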
@andygrove andygrove changed the title chore: run Spark 3.4 tests with native_datafusion scan [WIP] chore: run Spark 3.4 tests with native_datafusion scan Mar 18, 2026
@andygrove andygrove marked this pull request as ready for review March 18, 2026 18:40