[WIP][DO-NOT-REVIEW][SPARK-55886][SQL] Add DataFrame.zip for merging column-projected DataFrames #54976
Draft
zhengruifeng wants to merge 6 commits into apache:master
Add a new DataFrame.zip(other) API that combines columns from two DataFrames that derive from the same base plan through Project chains. The optimizer rewrites the Zip node into a single Project over the shared base plan, and analysis rejects plans that cannot be merged. Co-authored-by: Isaac
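To make the intended semantics concrete, here is a minimal toy model in plain Python. It is not Spark code and not the implementation in this PR: `Base`, `Project`, `flatten`, and `zip_plans` are hypothetical names invented for illustration. It only shows the idea described above: two Project chains over the same base collapse into a single Project over that base, and anything else is rejected.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """Hypothetical leaf plan: a named relation with its output columns."""
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Hypothetical projection: (output_name, input_column) pairs over a child."""
    exprs: Tuple[Tuple[str, str], ...]
    child: "Plan"


Plan = Union[Base, Project]


def flatten(plan):
    """Collapse a Project chain into (base, {output_name: base_column})."""
    if isinstance(plan, Base):
        return plan, {c: c for c in plan.columns}
    base, inner = flatten(plan.child)
    # Resolve each projected column through the inner mapping down to the base.
    return base, {out: inner[src] for out, src in plan.exprs}


def zip_plans(left, right):
    """Merge two Project chains into one Project, or reject them."""
    left_base, left_cols = flatten(left)
    right_base, right_cols = flatten(right)
    if left_base != right_base:
        # Mirrors the "analysis rejects plans that cannot be merged" behavior.
        raise ValueError("cannot zip: plans do not share the same base plan")
    merged = tuple(left_cols.items()) + tuple(right_cols.items())
    return Project(merged, left_base)
```

In this sketch, plans are compared by plain structural equality; the real rule would have to deal with expression IDs, non-trivial expressions, and duplicate output names, none of which are modeled here.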
Zip is now always unresolved (resolved=false). A new ResolveZip analyzer rule rewrites it into a Project when both children share the same base plan. Removes the CollapseZip optimizer rule. Co-authored-by: Isaac
Zip is always unresolved, so deduplication does not help it resolve. ResolveZip already handles attribute remapping from right base to left base via sameResult() and AttributeMap. Co-authored-by: Isaac
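The attribute remapping mentioned above can be pictured with a small plain-Python sketch. It is an illustration only, not the ResolveZip code: attributes are modeled as `(name, expr_id)` tuples, `build_remap` and `remap_refs` are hypothetical helpers, and aligning the two base outputs by position is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an attribute: (column name, expression id).
Attr = Tuple[str, int]


def build_remap(right_output: List[Attr], left_output: List[Attr]) -> Dict[int, Attr]:
    """Map each right-base expression id to the left-base attribute.

    Assumes the two bases already compared as equivalent, so their outputs
    line up position by position.
    """
    if len(right_output) != len(left_output):
        raise ValueError("base plans have different output arity")
    return {r_id: l_attr for (_, r_id), l_attr in zip(right_output, left_output)}


def remap_refs(refs: List[Attr], mapping: Dict[int, Attr]) -> List[Attr]:
    """Rewrite references to right-base attributes into left-base attributes."""
    # References that are not in the mapping are left unchanged.
    return [mapping.get(expr_id, (name, expr_id)) for name, expr_id in refs]
```

After remapping, the right side's expressions refer only to the left base's attributes, so both sides can sit in one Project over the left base.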
No longer referenced after removing Zip from DeduplicateRelations and changing resolved to always false. Co-authored-by: Isaac
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?