Experimental: support parquet partition write by kazantsev-maksim · Pull Request #4670 · apache/datafusion-comet

kazantsev-maksim · 2026-06-17T17:32:09Z

Which issue does this PR close?

Rationale for this change

The native Parquet writer previously supported only local filesystem and HDFS output, and explicitly bailed out (Unsupported) on any query with partition columns. As a result, the very common INSERT ... PARTITIONED BY (...) pattern, and any write to an S3-backed table, fell back to Spark's row-based writer.

What changes are included in this PR?

Native (Rust): New partition_writer.rs with PartitionedWriter, which:
splits each RecordBatch by partition key and routes rows to a per-partition writer keyed by sub-directory (e.g. a=1/b=2);
escapes partition values to exactly mirror Spark/Hive ExternalCatalogUtils.escapePathName (control chars plus a fixed special-char set, percent-encoded as upper-case %XX);
maps null/empty partition values to __HIVE_DEFAULT_PARTITION__ (Spark's DEFAULT_PARTITION_NAME);
lazily creates writers on first use and closes all open writers concurrently, returning the real part-file paths.
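
The path-building and routing rules above can be sketched with the standard library alone. This is an illustration, not the code in partition_writer.rs: the function names are hypothetical, partition values are modelled as Option<&str> rather than Arrow arrays, and the escaped character set is written from memory of Spark's non-Windows list, so it should be checked against ExternalCatalogUtils before being relied on.

```rust
use std::collections::HashMap;

/// Spark's DEFAULT_PARTITION_NAME, used for null or empty partition values.
const DEFAULT_PARTITION_NAME: &str = "__HIVE_DEFAULT_PARTITION__";

/// Assumption: this set follows Spark's ExternalCatalogUtils.charToEscape
/// (non-Windows) as recalled; verify it against the Spark source.
fn needs_escaping(c: char) -> bool {
    matches!(c, '\u{01}'..='\u{1F}' | '\u{7F}')
        || matches!(
            c,
            '"' | '#' | '%' | '\'' | '*' | '/' | ':' | '=' | '?' | '\\' | '{' | '[' | ']' | '^'
        )
}

/// Percent-encodes every character that needs escaping as upper-case %XX.
fn escape_path_name(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if needs_escaping(c) {
            out.push_str(&format!("%{:02X}", c as u32));
        } else {
            out.push(c);
        }
    }
    out
}

/// Builds a sub-directory such as `a=1/b=2` from column names and one row of values.
/// Null and empty values map to the default partition name.
fn partition_dir(columns: &[&str], values: &[Option<&str>]) -> String {
    columns
        .iter()
        .zip(values)
        .map(|(col, val)| {
            let rendered = match val {
                Some(v) if !v.is_empty() => escape_path_name(v),
                _ => DEFAULT_PARTITION_NAME.to_string(),
            };
            format!("{}={}", escape_path_name(col), rendered)
        })
        .collect::<Vec<_>>()
        .join("/")
}

/// Groups row indices by partition sub-directory. Each map entry stands in for a
/// per-partition writer and is created lazily by the first row that needs it.
fn route_rows(columns: &[&str], rows: &[Vec<Option<&str>>]) -> HashMap<String, Vec<usize>> {
    let mut writers: HashMap<String, Vec<usize>> = HashMap::new();
    for (idx, row) in rows.iter().enumerate() {
        writers
            .entry(partition_dir(columns, row))
            .or_default()
            .push(idx);
    }
    writers
}
```

For example, partition_dir(&["a", "b"], &[Some("1"), Some("2")]) yields a=1/b=2, and a value such as x/y is written as x%2Fy so that it cannot introduce an extra directory level.
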
parquet_writer.rs: refactored into a ParquetWriter trait with a StorageWriterFactory that selects the backend by URL scheme: local FS, HDFS (OpendalWriter), and now S3A (OpendalWriter behind s3-opendal). S3 credentials, endpoint, and region are read from the fs.s3a.* object-store options.
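
A rough sketch of scheme-based backend selection follows, to make the dispatch concrete. The Backend enum, select_backend, S3Settings, and s3_settings are hypothetical names and do not reflect the actual StorageWriterFactory API. The sketch also assumes that a path without a scheme is local, and that the standard Hadoop S3A keys shown are the ones consulted, which the description does not spell out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the writer backends named in the description.
#[derive(Debug, PartialEq)]
enum Backend {
    LocalFs,
    Hdfs,
    S3a,
}

/// Picks a backend from the output URL's scheme.
fn select_backend(url: &str) -> Result<Backend, String> {
    // Assumption: a path with no "scheme://" prefix is treated as local.
    let scheme = match url.split_once("://") {
        Some((scheme, _)) => scheme.to_ascii_lowercase(),
        None => return Ok(Backend::LocalFs),
    };
    match scheme.as_str() {
        "file" => Ok(Backend::LocalFs),
        "hdfs" => Ok(Backend::Hdfs),
        "s3a" => Ok(Backend::S3a),
        other => Err(format!("unsupported output scheme: {}", other)),
    }
}

/// Hypothetical holder for the S3 settings pulled from fs.s3a.* options.
#[derive(Debug, Default, PartialEq)]
struct S3Settings {
    access_key: Option<String>,
    secret_key: Option<String>,
    endpoint: Option<String>,
    region: Option<String>,
}

/// Reads the settings from an options map; missing keys stay `None`.
/// Assumption: these standard Hadoop S3A keys are the ones of interest.
fn s3_settings(options: &HashMap<String, String>) -> S3Settings {
    let get = |key: &str| options.get(key).cloned();
    S3Settings {
        access_key: get("fs.s3a.access.key"),
        secret_key: get("fs.s3a.secret.key"),
        endpoint: get("fs.s3a.endpoint"),
        region: get("fs.s3a.endpoint.region"),
    }
}
```

Returning an error for an unknown scheme, rather than defaulting to a backend, matches the description's fallback behaviour: anything the native writer cannot handle should be rejected so that Spark's own writer can take over.
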

Proto: new repeated string partition_columns = 9 on the ParquetWriter message; planner wires partition_columns through to the native writer.

Spark (Scala): CometDataWritingCommand: supported output filesystems extended to file, hdfs, s3a; partitioned writes are no longer rejected (only static partitions remain unsupported); partition column names are serialized into the proto.

How are these changes tested?

Testing in progress

Kazantsev Maksim and others added 30 commits December 14, 2025 16:24

impl map_from_entries

768b3e9

Revert "impl map_from_entries"

c68c342

This reverts commit 768b3e9.

Merge branch 'apache:main' into main

d887555

Merge branch 'apache:main' into main

231aa90

Merge branch 'apache:main' into main

9500bbb

Merge branch 'apache:main' into main

9577481

Merge branch 'apache:main' into main

3791557

Merge branch 'apache:main' into main

7c2f082

Merge branch 'apache:main' into main

609a605

Merge branch 'apache:main' into main

a151b2c

Merge branch 'apache:main' into main

ad3e7f5

Merge branch 'apache:main' into main

ea92e4b

Merge branch 'apache:main' into main

8dfeca3

Merge branch 'apache:main' into main

559741e

Merge branch 'apache:main' into main

ebda14e

Merge branch 'apache:main' into main

408152e

Merge branch 'apache:main' into main

d7857b2

Merge branch 'apache:main' into main

aef41be

Merge branch 'apache:main' into main

5ac1c58

Merge branch 'apache:main' into main

9ae8e23

Merge branch 'apache:main' into main

5ca3888

Merge branch 'apache:main' into main

160a817

Merge branch 'apache:main' into main

88fc313

Merge branch 'apache:main' into main

e14c180

Merge branch 'apache:main' into main

610a885

Merge branch 'apache:main' into main

f8acb2c

Merge branch 'apache:main' into main

ec94897

Merge branch 'apache:main' into main

43405e4

Merge branch 'apache:main' into main

47b4915

Merge branch 'apache:main' into main

26e2682

kazantsev-maksim and others added 27 commits April 24, 2026 21:57

Merge branch 'apache:main' into main

314e594

Merge branch 'apache:main' into main

ac8292f

WIP

9da0edb

work

f2bce23

Merge branch 'apache:main' into main

c9c140e

Merge branch 'apache:main' into main

decca58

Merge branch 'apache:main' into main

0919b33

Merge branch 'apache:main' into main

7495e21

Merge branch 'apache:main' into main

0a37a60

Merge branch 'apache:main' into main

abbba84

Merge branch 'apache:main' into main

6020560

Merge remote-tracking branch 'origin/main' into support_native_s3_write

838dcb9

# Conflicts: # native/Cargo.lock # native/core/Cargo.toml

Merge branch 'apache:main' into main

e2bdfb1

Complete native s3 write draft feature

2b2acb1

Merge remote-tracking branch 'origin/main' into support_native_s3_write

f47426b

Merge branch 'main' into support_native_s3_write

abe1a8e

Merge branch 'apache:main' into main

3edfc33

Merge branch 'apache:main' into main

a39e860

Merge branch 'apache:main' into main

e88dd7b

Merge branch 'apache:main' into main

3e29d37

refactoring

981a1f5

fmt

b487f91

refactoring

0d074e3

Merge branch 'apache:main' into main

4068359

Merge branch 'apache:main' into main

a3cb8de

Merge remote-tracking branch 'origin/main' into support_native_s3_write

7a67a1c

# Conflicts: # native/core/Cargo.toml

Add partition writing

f66cbd4

kazantsev-maksim marked this pull request as draft June 17, 2026 17:32

more tests

1d7d729

kazantsev-maksim mentioned this pull request Jun 18, 2026

feat: implement native empty2null spark inner function #4683

Open

kazantsev-maksim wants to merge 67 commits into apache:main from kazantsev-maksim:support_parquet_partition_write
