[MINOR][VL] Drop modify_arrow_dataset_scan_option.patch by yaooqinn · Pull Request #12148 · apache/gluten

yaooqinn · 2026-05-27T04:06:15Z

What changes were proposed in this pull request?

Drop ep/build-velox/src/modify_arrow_dataset_scan_option.patch and the two
shell call sites that apply / stage it:

dev/build-arrow.sh — patch -p1 < .../modify_arrow_dataset_scan_option.patch
ep/build-velox/src/get-velox.sh — cp + git add that copy the patch
into the Velox source tree

Why are the changes needed?

The patch added a native CSV / dataset scan-option API
(CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral, …)
that was originally consumed by Gluten's native Arrow CSV reader path.

That path is gone:

[GLUTEN-11088][VL] Fall back CSV reader #11190 fell CSV back to vanilla Spark
0658e906f / 97f463813 / 9ea8290a / d206c5e20 removed the JVM-side
callers, the ArrowUtil reader path, and the
spark.gluten.sql.native.arrow.reader.enabled config

Nothing inside Gluten or Velox now references the symbols the patch
introduces, so it's dead code in the build. This PR is the final cleanup in
that chain.

How was this patch tested?

Local verification on Ubuntu 24.04 / x86_64:

grep across the repo — 0 callers of CsvFragmentScanOptions,
CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral,
createNative(... FragmentScanOptions), CsvConvertOptions,
testCsvConvertOptions
nm -C on freshly built libarrow_dataset.a (both java-dist and
cpp-jni outputs) — none of the patch's symbols are present
arrow_ep/cpp/src/arrow/dataset/file_csv.cc on disk is the unmodified
upstream source — the patch was never applied with this change in place
dev/buildbundle-veloxbe.sh --enable_vcpkg=ON → BUILD SUCCESS for all 5
Spark profiles (3.3 / 3.4 / 3.5 / 4.0 / 4.1)
spark-shell on the resulting bundle, reading a CSV file, prints
GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format TextReadFormat and produces a vanilla
FileScan csv physical plan — confirming CSV is fallback-by-design and
never enters the native path the dropped patch fed.

Generated-by: Claude claude-opus-4.7

This Arrow patch added a native CSV/dataset scan-option API (CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral, etc.) that was originally consumed by Gluten's native Arrow CSV reader path. That path is gone: - apache#11190 fell CSV back to vanilla Spark - 0658e90 / 97f4638 / 9ea8290 / d206c5e removed the JVM callers, the ArrowUtil reader path, and the spark.gluten.sql.native.arrow.reader.enabled config Nothing inside Gluten or Velox now references the symbols the patch introduces, so it is dead code in the build. This change drops: - ep/build-velox/src/modify_arrow_dataset_scan_option.patch - the `patch -p1 <...>` line in dev/build-arrow.sh - the `cp` / `git add` lines in ep/build-velox/src/get-velox.sh that staged the patch into the Velox source tree Verification (Ubuntu 24.04, x86_64): - grep across the repo: 0 callers of CsvFragmentScanOptions, CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral, createNative(... FragmentScanOptions), CsvConvertOptions, testCsvConvertOptions - `nm -C` on the freshly built libarrow_dataset.a (java-dist and cpp-jni) shows none of the patch's symbols are present - arrow_ep/cpp/src/arrow/dataset/file_csv.cc on disk is the unmodified upstream source -- patch was never applied - dev/buildbundle-veloxbe.sh --enable_vcpkg=ON completes BUILD SUCCESS for all 5 Spark profiles (3.3 / 3.4 / 3.5 / 4.0 / 4.1) - spark-shell on the resulting bundle, reading a CSV file, prints "GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format TextReadFormat" and produces a vanilla `FileScan csv` physical plan -- confirming CSV is fallback-by-design and never enters the native path the dropped patch fed Generated-by: Claude claude-opus-4.7

philo-he · 2026-05-27T04:48:42Z

Thanks for the PR. Just to confirm – will we permanently remove the arrow based CSV reader support? cc @jinchengchenghh, @FelixYBW

github-actions Bot added BUILD VELOX labels May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MINOR][VL] Drop modify_arrow_dataset_scan_option.patch#12148

[MINOR][VL] Drop modify_arrow_dataset_scan_option.patch#12148
yaooqinn wants to merge 1 commit into
apache:mainfrom
yaooqinn:users/kentyao/drop-arrow-dataset-scan-option-patch

yaooqinn commented May 27, 2026

Uh oh!

philo-he commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yaooqinn commented May 27, 2026

What changes were proposed in this pull request?

Why are the changes needed?

How was this patch tested?

Uh oh!

philo-he commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants