[WIP][SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support by LuciferYang · Pull Request #54981 · apache/spark

LuciferYang · 2026-03-24T09:35:32Z

What changes were proposed in this pull request?

This PR enables the file source V2 write path (FileWrite) to support partition columns, dynamic partition overwrite, and truncate (full overwrite), gated behind a new feature flag spark.sql.sources.v2.file.write.enabled (default false). It also removes the guard that blocked FileDataSourceV2 from working with catalog tables.
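For readers less familiar with the two overwrite modes named above, here is a minimal sketch of how they differ. It is a plain-Python model of the semantics, not code from this PR: the `Table` alias, the function names, and the sample partition values are all hypothetical, and no Spark API is involved.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "table": partition value -> rows stored in that partition.
Row = Tuple[str, int]
Table = Dict[str, List[Row]]


def group_by_partition(rows: List[Row]) -> Table:
    """Group incoming (partition_value, payload) rows by partition value."""
    grouped: Table = {}
    for part, payload in rows:
        grouped.setdefault(part, []).append((part, payload))
    return grouped


def truncate_overwrite(table: Table, rows: List[Row]) -> Table:
    """Truncate (full overwrite): all existing partitions are dropped first."""
    return group_by_partition(rows)


def dynamic_overwrite(table: Table, rows: List[Row]) -> Table:
    """Dynamic partition overwrite: only partitions that receive rows are replaced."""
    result = {part: list(existing) for part, existing in table.items()}
    result.update(group_by_partition(rows))
    return result


existing = {
    "2026-03-22": [("2026-03-22", 1)],
    "2026-03-23": [("2026-03-23", 2)],
}
incoming = [("2026-03-23", 20), ("2026-03-24", 30)]

# Truncate keeps only the partitions present in the incoming rows.
print(sorted(truncate_overwrite(existing, incoming)))
# Dynamic overwrite also keeps the untouched 2026-03-22 partition.
print(sorted(dynamic_overwrite(existing, incoming)))
```

In this model the partition that receives no new rows survives a dynamic overwrite but not a truncate, which is the distinction the write path has to preserve.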

Previously, the V2 file write path was completely disabled: DataFrameWriter returned None for FileDataSourceV2, FallBackFileSourceV2 converted every InsertIntoStatement targeting a FileTable back to the V1 path, and DataSourceV2Utils.getTableProvider unconditionally filtered out FileDataSourceV2 for catalog tables. In addition, FileWrite.createWriteJobDescription hardcoded partitionColumns = Seq.empty and dataColumns = allColumns, so partitioned writes were impossible even if the V2 path was reached.
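The column split described above can be illustrated with a small sketch. This is a plain-Python model under stated assumptions, not the Scala code in FileWrite: `split_columns` is a hypothetical helper, columns are represented as plain strings, and Spark's name resolution (for example case sensitivity) is not modeled.

```python
from typing import List, Sequence, Tuple


def split_columns(
    all_columns: Sequence[str], partition_names: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split a schema into (partition_columns, data_columns).

    Hypothetical helper: partition columns keep the order in which they were
    declared, data columns keep their schema order.
    """
    missing = [name for name in partition_names if name not in all_columns]
    if missing:
        raise ValueError(f"partition columns not in schema: {missing}")
    partition_set = set(partition_names)
    partition_columns = list(partition_names)
    data_columns = [col for col in all_columns if col not in partition_set]
    return partition_columns, data_columns


schema = ["id", "dt", "value"]

# Old behavior as described: no partition columns, every column is a data column.
print(split_columns(schema, []))
# Partition-aware behavior: "dt" moves out of the data columns.
print(split_columns(schema, ["dt"]))
```

Passing an empty partition list reproduces the previously hardcoded result, where every column is treated as a data column.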

Why are the changes needed?

The file source V2 write path has been disabled since SPARK-28396 with a TODO comment. This blocks V2 from being the default for built-in file formats. The V2 API offers advantages over V1 (custom metrics, aggregate/limit/join pushdown, runtime filtering, row-level operations), but the write path must work before V2 can replace V1.

This is the first patch in a series to close the V1-V2 feature gap. It establishes the write infrastructure behind a feature flag so that subsequent patches can build on it incrementally: cache invalidation, flipping the flag, and deleting FallBackFileSourceV2 (SPARK-56173); ErrorIfExists/Ignore modes (SPARK-56174); partition management (SPARK-56175); statistics (SPARK-56176); bucketing (SPARK-56177); and MSCK REPAIR TABLE (SPARK-56178).

Does this PR introduce any user-facing change?

No. The feature flag spark.sql.sources.v2.file.write.enabled defaults to false. All existing behavior is unchanged. When explicitly enabled, the V2 write path is used for Append and Overwrite modes via the DataFrame API, and catalog tables using file-based formats are loaded as V2 FileTable instances.

How was this patch tested?

Added 13 new tests in FileDataSourceV2FallBackSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 4.6.

init

56bc368

LuciferYang marked this pull request as draft March 24, 2026 10:09

LuciferYang changed the title ~~[SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support~~ [WIP][SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support Mar 24, 2026

LuciferYang closed this Mar 24, 2026

LuciferYang wants to merge 1 commit into apache:master from LuciferYang:SPARK-56171
