[MINOR][VL] Drop modify_arrow_dataset_scan_option.patch#12148
Open
yaooqinn wants to merge 1 commit into
Open
Conversation
This Arrow patch added a native CSV/dataset scan-option API (CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral, etc.) that was originally consumed by Gluten's native Arrow CSV reader path. That path is gone: - apache#11190 fell CSV back to vanilla Spark - 0658e90 / 97f4638 / 9ea8290 / d206c5e removed the JVM callers, the ArrowUtil reader path, and the spark.gluten.sql.native.arrow.reader.enabled config Nothing inside Gluten or Velox now references the symbols the patch introduces, so it is dead code in the build. This change drops: - ep/build-velox/src/modify_arrow_dataset_scan_option.patch - the `patch -p1 <...>` line in dev/build-arrow.sh - the `cp` / `git add` lines in ep/build-velox/src/get-velox.sh that staged the patch into the Velox source tree Verification (Ubuntu 24.04, x86_64): - grep across the repo: 0 callers of CsvFragmentScanOptions, CsvFragmentScanOptions::from, DeserializeMap, mapToExpressionLiteral, createNative(... FragmentScanOptions), CsvConvertOptions, testCsvConvertOptions - `nm -C` on the freshly built libarrow_dataset.a (java-dist and cpp-jni) shows none of the patch's symbols are present - arrow_ep/cpp/src/arrow/dataset/file_csv.cc on disk is the unmodified upstream source -- patch was never applied - dev/buildbundle-veloxbe.sh --enable_vcpkg=ON completes BUILD SUCCESS for all 5 Spark profiles (3.3 / 3.4 / 3.5 / 4.0 / 4.1) - spark-shell on the resulting bundle, reading a CSV file, prints "GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format TextReadFormat" and produces a vanilla `FileScan csv` physical plan -- confirming CSV is fallback-by-design and never enters the native path the dropped patch fed Generated-by: Claude claude-opus-4.7
Member
|
Thanks for the PR. Just to confirm – will we permanently remove the arrow based CSV reader support? cc @jinchengchenghh, @FelixYBW |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Drop
ep/build-velox/src/modify_arrow_dataset_scan_option.patchand the twoshell call sites that apply / stage it:
dev/build-arrow.sh—patch -p1 < .../modify_arrow_dataset_scan_option.patchep/build-velox/src/get-velox.sh—cp+git addthat copy the patchinto the Velox source tree
Why are the changes needed?
The patch added a native CSV / dataset scan-option API
(
CsvFragmentScanOptions::from,DeserializeMap,mapToExpressionLiteral, …)that was originally consumed by Gluten's native Arrow CSV reader path.
That path is gone:
0658e906f/97f463813/9ea8290a/d206c5e20removed the JVM-sidecallers, the
ArrowUtilreader path, and thespark.gluten.sql.native.arrow.reader.enabledconfigNothing inside Gluten or Velox now references the symbols the patch
introduces, so it's dead code in the build. This PR is the final cleanup in
that chain.
How was this patch tested?
Local verification on Ubuntu 24.04 / x86_64:
grepacross the repo — 0 callers ofCsvFragmentScanOptions,CsvFragmentScanOptions::from,DeserializeMap,mapToExpressionLiteral,createNative(... FragmentScanOptions),CsvConvertOptions,testCsvConvertOptionsnm -Con freshly builtlibarrow_dataset.a(bothjava-distandcpp-jnioutputs) — none of the patch's symbols are presentarrow_ep/cpp/src/arrow/dataset/file_csv.ccon disk is the unmodifiedupstream source — the patch was never applied with this change in place
dev/buildbundle-veloxbe.sh --enable_vcpkg=ON→ BUILD SUCCESS for all 5Spark profiles (3.3 / 3.4 / 3.5 / 4.0 / 4.1)
spark-shellon the resulting bundle, reading a CSV file, printsGlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format TextReadFormatand produces a vanillaFileScan csvphysical plan — confirming CSV is fallback-by-design andnever enters the native path the dropped patch fed.
Generated-by: Claude claude-opus-4.7