
[WIP][SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support #54981

Closed
LuciferYang wants to merge 1 commit into apache:master from LuciferYang:SPARK-56171

@LuciferYang
Contributor

What changes were proposed in this pull request?

This PR enables the file source V2 write path (FileWrite) to support partition columns, dynamic partition overwrite, and truncate (full overwrite), gated behind a new feature flag, spark.sql.sources.v2.file.write.enabled (default false). It also removes the guard that prevented FileDataSourceV2 from being used with catalog tables.
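The three write behaviors this PR enables differ only in which existing partitions survive the write. The following is a minimal Python model of those semantics, assuming a table is a dictionary from partition-value tuples to rows; the function names are hypothetical and it illustrates the contract that truncate and dynamic overwrite are expected to satisfy, not how FileWrite actually commits files.

```python
# Toy model of file-source write semantics (NOT Spark's FileWrite code).
# A "table" is a dict mapping a partition-value tuple to a list of rows.

def _copy(table):
    """Copy the partition -> rows mapping so inputs are never mutated."""
    return {part: list(rows) for part, rows in table.items()}

def append(existing, incoming):
    """Append: keep every existing partition and add the incoming rows."""
    result = _copy(existing)
    for part, rows in incoming.items():
        result.setdefault(part, []).extend(rows)
    return result

def truncate_overwrite(existing, incoming):
    """Full overwrite (truncate): discard all existing data, keep only the input."""
    return _copy(incoming)

def dynamic_overwrite(existing, incoming):
    """Dynamic partition overwrite: replace only partitions present in the input."""
    result = _copy(existing)
    for part, rows in incoming.items():
        result[part] = list(rows)
    return result

existing = {("2024-01-01",): [1, 2], ("2024-01-02",): [3]}
incoming = {("2024-01-02",): [9]}

print(append(existing, incoming))              # -> {('2024-01-01',): [1, 2], ('2024-01-02',): [3, 9]}
print(truncate_overwrite(existing, incoming))  # -> {('2024-01-02',): [9]}
print(dynamic_overwrite(existing, incoming))   # -> {('2024-01-01',): [1, 2], ('2024-01-02',): [9]}
```

In Spark, the choice between the last two for an Overwrite write is normally driven by spark.sql.sources.partitionOverwriteMode (static or dynamic).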

Previously, the V2 file write path was completely disabled: DataFrameWriter returned None for FileDataSourceV2, so DataFrame writes fell back to V1; FallBackFileSourceV2 converted every InsertIntoStatement targeting a FileTable back to the V1 path; and DataSourceV2Utils.getTableProvider unconditionally filtered out FileDataSourceV2 for catalog tables. In addition, FileWrite.createWriteJobDescription hardcoded partitionColumns = Seq.empty and dataColumns = allColumns, so partitioned writes were impossible even when the V2 path was reached.
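Replacing the hardcoded partitionColumns = Seq.empty and dataColumns = allColumns amounts to splitting the output schema into the two groups. The following Python sketch shows that split under stated assumptions: Column, split_columns, and partition_path are hypothetical names, matching is case-insensitive by default (mirroring Spark's default spark.sql.caseSensitive=false), and paths are built Hive-style without the value escaping Spark performs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a Spark attribute: a name plus a type label."""
    name: str
    data_type: str

def split_columns(all_columns, partition_names, case_sensitive=False):
    """Return (partition_columns, data_columns) rather than ([], all_columns).

    Partition columns follow the order of partition_names; data columns keep
    their original order with the partition columns removed.
    """
    def norm(name):
        return name if case_sensitive else name.lower()

    by_name = {norm(c.name): c for c in all_columns}
    partition_columns = []
    for name in partition_names:
        column = by_name.get(norm(name))
        if column is None:
            raise ValueError(f"partition column {name!r} not found in output schema")
        partition_columns.append(column)

    partition_set = set(partition_columns)
    data_columns = [c for c in all_columns if c not in partition_set]
    return partition_columns, data_columns

def partition_path(partition_columns, row):
    """Hive-style relative directory for one row (no escaping, unlike Spark)."""
    return "/".join(f"{c.name}={row[c.name]}" for c in partition_columns)

cols = [Column("id", "int"), Column("value", "string"), Column("dt", "date")]
parts, data = split_columns(cols, ["DT"])
print([c.name for c in parts])  # -> ['dt']
print([c.name for c in data])   # -> ['id', 'value']
print(partition_path(parts, {"id": 1, "value": "a", "dt": "2024-01-01"}))  # -> dt=2024-01-01
```

Only the data columns are written into the files themselves; the partition values are encoded in the directory path.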

Why are the changes needed?

The file source V2 write path has been disabled, with a TODO comment, since SPARK-28396. This prevents V2 from becoming the default for built-in file formats. The V2 API offers several advantages over V1 (custom metrics, aggregate/limit/join pushdown, runtime filtering, and row-level operations), but the write path must work before V2 can replace V1.

This is the first patch in a series to close the feature gap between V1 and V2. It establishes the write infrastructure behind a feature flag so that subsequent patches can build on it incrementally: cache invalidation, flipping the flag default, and removing FallBackFileSourceV2 (SPARK-56173); ErrorIfExists/Ignore modes (SPARK-56174); partition management (SPARK-56175); statistics (SPARK-56176); bucketing (SPARK-56177); and MSCK REPAIR TABLE (SPARK-56178).

Does this PR introduce any user-facing change?

No. The feature flag spark.sql.sources.v2.file.write.enabled defaults to false, so all existing behavior is unchanged. When the flag is explicitly enabled, DataFrame API writes in Append and Overwrite modes use the V2 write path, and catalog tables backed by file-based formats are loaded as V2 FileTable instances.
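The gating described above can be summarized as a small decision function. This Python sketch is hypothetical (choose_write_path and the dict-based conf are illustrative, not Spark APIs); it assumes that, with the flag enabled, ErrorIfExists and Ignore still go through V1, since the description defers those modes to SPARK-56174 without stating what happens to them in this patch.

```python
from enum import Enum

class SaveMode(Enum):
    """Mirror of the four DataFrameWriter save modes."""
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"

# Flag name taken from this PR; it defaults to "false".
V2_WRITE_FLAG = "spark.sql.sources.v2.file.write.enabled"

# Modes this PR routes to V2. ASSUMPTION: the other modes keep using V1
# until SPARK-56174; the PR description does not say what happens to them.
V2_MODES = frozenset({SaveMode.APPEND, SaveMode.OVERWRITE})

def choose_write_path(conf, mode):
    """Return "v2" or "v1" for a DataFrame API write to a file-based format."""
    enabled = conf.get(V2_WRITE_FLAG, "false").strip().lower() == "true"
    return "v2" if enabled and mode in V2_MODES else "v1"

print(choose_write_path({}, SaveMode.APPEND))                          # -> v1
print(choose_write_path({V2_WRITE_FLAG: "true"}, SaveMode.OVERWRITE))  # -> v2
print(choose_write_path({V2_WRITE_FLAG: "true"}, SaveMode.IGNORE))     # -> v1
```

With the flag unset, every mode maps to V1, which is what "all existing behavior is unchanged" means in practice.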

How was this patch tested?

Added 13 new tests in FileDataSourceV2FallBackSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 4.6.

@LuciferYang LuciferYang marked this pull request as draft March 24, 2026 10:09
@LuciferYang LuciferYang changed the title [SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support [WIP][SPARK-56171][SQL] Enable file source V2 write path with partition, dynamic overwrite, and catalog table support Mar 24, 2026
