Support UPDATE ... FROM and preserve source-qualified assignments #20745
Draft
kosiew wants to merge 17 commits into apache:main
Enhance testing for UPDATE ... FROM alias and shape variants in update.slt. Introduce targeted planner/unit tests for qualifier and joined-assignment patterns in dml_planning.rs, currently asserting for the existing guard error.
Remove hard guard for UPDATE ... FROM in SQL planner. Enhance assignment extraction to preserve qualifiers for multi-table updates and filter identity assignments with target-table awareness. Update API docs to clarify multi-table update behavior and add integration and unit tests to validate new functionality and expectations.
… related components
…ing modes are resolved
…ty and maintainability
Revise notes on supported scenarios for single-table UPDATE ... FROM syntax. Clarify intent with accurate comments regarding FROM placement, alias support in assignments and predicates, and row-count mismatch error scenarios.
Update the logic in UPDATE ... FROM to retain the original target-row image during planning. This change allows MemTable::update_from to correctly match replacements to stored rows based on that image, addressing the join-only/source-only predicate case that was previously failing.
Narrow the update target alias discovery to the actual target branch of the logical plan. This change addresses the self-join case where the source alias was incorrectly treated as a target alias, which could lead to unintended identity assignments and leakage of source-side predicates into provider filters. Added regressions to verify that self-join assignments and filter extractions are handled correctly.
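The effect of narrowing alias discovery can be illustrated with a small stand-alone model. This is only a sketch: `Predicate` and `target_only_filters` are hypothetical names, not DataFusion types, and predicates are reduced to the list of table qualifiers they reference. It assumes a filter is forwarded to the provider only when every reference resolves to a target alias.

```rust
use std::collections::HashSet;

/// Hypothetical model of a predicate: its SQL text plus the table
/// qualifiers it references. Not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    text: String,
    qualifiers: Vec<String>,
}

/// Keep only predicates whose every column reference uses a target alias.
/// Join predicates and source-only predicates are dropped.
fn target_only_filters(
    predicates: &[Predicate],
    target_aliases: &HashSet<String>,
) -> Vec<Predicate> {
    predicates
        .iter()
        .filter(|p| {
            !p.qualifiers.is_empty()
                && p.qualifiers.iter().all(|q| target_aliases.contains(q))
        })
        .cloned()
        .collect()
}
```

In a self-join such as `t1 AS dst` joined with `t1 AS src`, only `dst` would be in `target_aliases`, so a predicate on `src.b` is not forwarded even though both aliases name the same table.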
Delegate projection processing to a shared helper for both top-level and nested projection traversal. Introduce projection_alias_assignments to map aliases to (column_name, assignment_expr) pairs. Implement append_update_assignments_from_projection to streamline identity-assignment filtering and single-table qualifier stripping.
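A rough model of the two behaviours named here, identity-assignment filtering and single-table qualifier stripping, is sketched below. The `Expr` enum and the `collect_assignments`, `is_identity`, and `strip_qualifier` helpers are hypothetical stand-ins for illustration; they are not the helpers added in this commit, and the real code works on DataFusion expressions.

```rust
/// Hypothetical, minimal expression model (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { qualifier: Option<String>, name: String },
    Literal(i64),
}

/// Turn projected (column_name, expr) pairs into update assignments.
/// Assumption: identity assignments are dropped, and qualifiers are
/// stripped only when there is no source table.
fn collect_assignments(
    projection: Vec<(String, Expr)>,
    target_alias: &str,
    has_source: bool,
) -> Vec<(String, Expr)> {
    projection
        .into_iter()
        .filter(|(col, expr)| !is_identity(col, expr, target_alias))
        .map(|(col, expr)| {
            let expr = if has_source { expr } else { strip_qualifier(expr) };
            (col, expr)
        })
        .collect()
}

/// `col = col` is an identity only if the right-hand side refers to the
/// target (unqualified, or qualified with the target alias).
fn is_identity(col: &str, expr: &Expr, target_alias: &str) -> bool {
    match expr {
        Expr::Column { qualifier, name } => {
            name.as_str() == col
                && qualifier.as_deref().map_or(true, |q| q == target_alias)
        }
        Expr::Literal(_) => false,
    }
}

/// Drop the qualifier from a column reference; leave other nodes alone.
fn strip_qualifier(expr: Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column { qualifier: None, name },
        other => other,
    }
}
```

With a source table present, `b = t2.b` keeps its `t2` qualifier, while `a = t1.a` on target `t1` is dropped as an identity assignment. An assignment such as `a = src.a` in a self-join is kept, because `src` is not the target alias.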
Add a new integration test to verify that the UPDATE statement executes correctly when only a join predicate is used. The test checks that the final table contents of t1 are as expected after executing the UPDATE from t2, ensuring that no target-only or source-only filters are applied.
Add shared UPDATE_FROM_OLD_COLUMN_PREFIX and function update_from_old_column_name in datafusion_expr. Update TableProvider::update_from documentation to clarify the input schema shape for providers, specifying the order of new and original target columns. Remove duplicated local prefix and helpers in the SQL planner and MemTable, utilizing the shared helper instead.
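As a sketch of what such a shared helper might look like: the prefix value below is inferred from the `__df_update_old_*` pattern mentioned in the PR description, and the signature of `update_from_old_column_name` is an assumption. `update_from_input_columns` is a hypothetical addition that assumes new target columns come first, followed by the original-row columns; the actual order is whatever the `TableProvider::update_from` documentation specifies.

```rust
/// Assumed prefix, inferred from the `__df_update_old_*` naming pattern.
const UPDATE_FROM_OLD_COLUMN_PREFIX: &str = "__df_update_old_";

/// Name of the hidden column carrying the original value of `column`.
/// The real helper's signature may differ.
fn update_from_old_column_name(column: &str) -> String {
    format!("{}{}", UPDATE_FROM_OLD_COLUMN_PREFIX, column)
}

/// Hypothetical helper: column names of the input handed to a provider,
/// assuming new target columns first, then the original-row image.
fn update_from_input_columns(target_columns: &[&str]) -> Vec<String> {
    let new_columns = target_columns.iter().map(|c| c.to_string());
    let old_columns = target_columns
        .iter()
        .map(|c| update_from_old_column_name(c));
    new_columns.chain(old_columns).collect()
}
```

Keeping the prefix in one place means the planner and the provider cannot drift apart on the hidden column names.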
Updated update.slt to include a new case that forces one target row to match multiple source rows in an UPDATE ... FROM statement. This addition introduces a duplicate source row in t2 and asserts that the operation results in an error as expected.
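The behaviour this case exercises can be modelled without DataFusion. The sketch below is an assumption-laden illustration, not `MemTable::update_from`: rows are plain `Vec<i64>`, `JoinedRow` and `apply_update_from` are hypothetical names, and the error message is invented. It matches each replacement to a stored row by the original-row image and rejects the update when one target row receives more than one replacement.

```rust
use std::collections::HashMap;

type Row = Vec<i64>;

/// One row of the joined update input: the new target values plus the
/// original target-row image used to find the stored row.
struct JoinedRow {
    new_values: Row,
    old_image: Row,
}

/// Apply replacements to `stored`, returning the number of updated rows.
/// Fails before touching `stored` if a target row matched more than one
/// source row. Identical stored rows all receive the same replacement.
fn apply_update_from(stored: &mut [Row], joined: Vec<JoinedRow>) -> Result<usize, String> {
    let mut replacements: HashMap<Row, Row> = HashMap::new();
    for JoinedRow { new_values, old_image } in joined {
        if replacements.contains_key(&old_image) {
            return Err(format!(
                "target row {:?} matched more than one source row",
                old_image
            ));
        }
        replacements.insert(old_image, new_values);
    }

    let mut updated = 0;
    for stored_row in stored.iter_mut() {
        if let Some(new_values) = replacements.get(&*stored_row) {
            *stored_row = new_values.clone();
            updated += 1;
        }
    }
    Ok(updated)
}
```

Validating all replacements before mutating anything keeps the table unchanged when the duplicate-match error is raised.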
Refactor the helper function from `find_update_target_branch` to `find_dml_target_branch`. Update traversal to support Limit wrappers and preserve Join behavior for UPDATE ... FROM statements. Adjust call sites to the new helper and modify error messages to reflect DML changes. Add regression tests for delete-plan and shared-path scenarios to ensure functionality.
Clarify allowed DML wrapper subset for find_dml_target_branch and reinforce fast failure on unexpected nodes with inline guard comments to maintain strict semantics in the function.
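The fail-fast traversal described in these two commits might look roughly like the sketch below. The `Plan` enum is a hypothetical stand-in for a logical plan, the set of allowed wrappers (filter, projection, limit, join) is an assumption, and so is the choice of the left join input as the target branch. Only the function name comes from the commit message.

```rust
/// Hypothetical, minimal logical-plan model (not DataFusion's `LogicalPlan`).
#[derive(Debug, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter(Box<Plan>),
    Projection(Box<Plan>),
    Limit(Box<Plan>),
    Join { left: Box<Plan>, right: Box<Plan> },
    Aggregate(Box<Plan>),
}

/// Walk down to the scan of the DML target table.
/// Only a small subset of wrapper nodes is accepted; anything else is an
/// error rather than being silently skipped.
fn find_dml_target_branch<'a>(plan: &'a Plan, target: &str) -> Result<&'a Plan, String> {
    match plan {
        Plan::TableScan { table } if table.as_str() == target => Ok(plan),
        // Allowed single-input wrappers: keep descending.
        Plan::Filter(input) | Plan::Projection(input) | Plan::Limit(input) => {
            find_dml_target_branch(input, target)
        }
        // Assumption: for UPDATE ... FROM the target is the left join input.
        Plan::Join { left, .. } => find_dml_target_branch(left, target),
        // Guard: fail fast on any node outside the allowed subset.
        other => Err(format!("unexpected node in DML input: {:?}", other)),
    }
}
```

Following a fixed side of the join, rather than searching both inputs by table name, is what lets a self-join resolve to the target scan and not the source scan.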
Enhance the `doc` function to outline the broader contract and flow, including target-branch discovery, the collection of target aliases, and the filtering processes. Update inline comments to reflect the new contract, detailing the alias collection, whole-input filter scan, and target-branch scan pushdown passes.
Which issue does this PR close?
`UPDATE ... FROM` bug #19950.
Rationale for this change
`UPDATE ... FROM` was previously rejected or behaved incorrectly because joined assignment expressions were not preserved end-to-end. In particular, source-table references such as `t2.b` could be lost during planning, which meant updates either failed outright or used the wrong values when applying joined assignments.
This change fixes that gap so `UPDATE ... FROM` works for the supported single-source-table form and produces the expected target-row updates. It also aligns filter extraction and assignment handling with the semantics users expect from joined updates, including aliased target/source tables and self-joins.
What changes are included in this PR?
This PR adds end-to-end support for `UPDATE ... FROM` across SQL planning, physical planning, and `MemTable` execution.
At a high level, the changes include:
- removing the SQL planner restriction that rejected `UPDATE ... FROM`
- extending DML planning to project both the new target columns and the original target columns, the latter as hidden `__df_update_old_*` columns
- adding `TableProvider::update_from(...)` as a dedicated API for providers that need precomputed joined update rows
- implementing `update_from(...)` for `MemTable`
- matching replacement rows back to original target rows using the projected original-row image
- preserving source qualifiers for multi-table update assignments while keeping existing single-table `UPDATE` behavior
- improving DML filter extraction so only target-table predicates are forwarded to providers in `UPDATE ... FROM` and self-join cases
- resolving join partitioning before handing joined update plans to table providers
- adding helper utilities and documentation for hidden original-row column naming
Are these changes tested?
Yes.
This PR adds and updates tests across SQL planning, physical planning, provider behavior, and sqllogictest coverage. The new tests cover:
- `UPDATE ... FROM`, including projection of hidden original-row columns
- `UPDATE` qualifier stripping behavior remaining unchanged
- `UPDATE ... FROM`
- `MemTable`
Are there any user-facing changes?
Yes.
Users can now run supported `UPDATE ... FROM` statements successfully in DataFusion, including cases with aliases and joined assignment expressions.
This is a functional improvement to SQL behavior. It also introduces a new `TableProvider::update_from(...)` hook for provider implementations that want to support joined updates.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.