Add SQL and physical planner support for MERGE INTO #2

Draft
wirybeaver wants to merge 2 commits into feature/mergeinto-type from feature/mergeinto

Which issue does this PR close?

Stacked on top of apache#20763 (Add MERGE INTO types to datafusion-expr).

Rationale for this change

Complete the MERGE INTO execution path so that the type definitions added in the parent PR can be planned and dispatched.

What changes are included in this PR?

datafusion/catalog/src/table.rs — extend TableProvider with a merge_into async hook (default returns not_impl_err).

datafusion/sql/src/statement.rs — SQL planner:

  • merge_to_plan: parses Statement::Merge into LogicalPlan::Dml(WriteOp::MergeInto(...)). It resolves the target table, plans the USING source, and builds a combined schema of target and source columns for resolving ON and WHEN expressions.
  • merge_clause_to_plan: converts each WHEN MATCHED / NOT MATCHED clause into a MergeIntoClause with typed MergeIntoAction.
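The combined-schema step can be pictured with a small, dependency-free sketch. Everything here is hypothetical and for illustration only: `Column`, `combined_schema`, and `resolve` are not DataFusion APIs, and the ambiguity rule for unqualified names is an assumption about how such a schema would typically behave, not something this PR states.

```rust
// Hypothetical model: a column is identified by its table qualifier and name.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub qualifier: String,
    pub name: String,
}

/// Concatenate target columns and source columns into one schema so that
/// ON and WHEN expressions can reference either side.
pub fn combined_schema(
    target_name: &str,
    target_cols: &[&str],
    source_name: &str,
    source_cols: &[&str],
) -> Vec<Column> {
    let mut cols = Vec::new();
    for (qualifier, names) in [(target_name, target_cols), (source_name, source_cols)] {
        for name in names {
            cols.push(Column {
                qualifier: qualifier.to_string(),
                name: name.to_string(),
            });
        }
    }
    cols
}

/// Resolve a column reference against the combined schema.
/// Assumption: an unqualified name that matches both sides is an error.
pub fn resolve<'a>(
    schema: &'a [Column],
    qualifier: Option<&str>,
    name: &str,
) -> Result<&'a Column, String> {
    let matches: Vec<&Column> = schema
        .iter()
        .filter(|c| c.name == name && qualifier.map_or(true, |q| c.qualifier == q))
        .collect();
    match matches.as_slice() {
        [one] => Ok(*one),
        [] => Err(format!("column {} not found", name)),
        _ => Err(format!("column {} is ambiguous", name)),
    }
}
```

With a target `t(id, v)` and a source `s(id, v)`, a reference such as `s.v` resolves to the source column, while a bare `id` is rejected in this model because it exists on both sides.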

datafusion/expr/src/logical_plan/dml.rs — expression traversal on MergeIntoOp:

  • exprs() — stable iteration (on expr → per-clause predicate → action value exprs).
  • with_new_exprs() — reconstruct op from a transformed expr slice (used by optimizer rewrites).
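To make the ordering contract concrete, here is a minimal sketch of the flatten-and-rebuild pair on a simplified model. It assumes expressions are plain strings and that the action variants look roughly like the ones below; the real `MergeIntoOp`, `MergeIntoClause`, and `MergeIntoAction` definitions live in the parent PR and may differ in shape.

```rust
// Simplified stand-ins: expressions are strings, not datafusion Expr values.
#[derive(Debug, Clone, PartialEq)]
pub enum MergeIntoAction {
    Update(Vec<(String, String)>), // (column name, value expression)
    Insert { columns: Vec<String>, values: Vec<String> },
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergeIntoClause {
    pub predicate: Option<String>,
    pub action: MergeIntoAction,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergeIntoOp {
    pub on: String,
    pub clauses: Vec<MergeIntoClause>,
}

impl MergeIntoOp {
    /// Flatten in a stable order: ON, then for each clause its optional
    /// predicate followed by the action's value expressions.
    pub fn exprs(&self) -> Vec<&String> {
        let mut out = vec![&self.on];
        for clause in &self.clauses {
            out.extend(clause.predicate.iter());
            match &clause.action {
                MergeIntoAction::Update(assignments) => {
                    out.extend(assignments.iter().map(|(_, value)| value))
                }
                MergeIntoAction::Insert { values, .. } => out.extend(values.iter()),
                MergeIntoAction::Delete => {}
            }
        }
        out
    }

    /// Rebuild the op by consuming `exprs` in exactly the order `exprs()` emits.
    pub fn with_new_exprs(&self, exprs: Vec<String>) -> Result<Self, String> {
        let expected = self.exprs().len();
        if exprs.len() != expected {
            return Err(format!(
                "expected {} expressions, got {}",
                expected,
                exprs.len()
            ));
        }
        let mut it = exprs.into_iter();
        let mut next = || it.next().expect("length checked above");
        let on = next();
        let clauses = self
            .clauses
            .iter()
            .map(|clause| {
                let predicate = clause.predicate.as_ref().map(|_| next());
                let action = match &clause.action {
                    MergeIntoAction::Update(assignments) => MergeIntoAction::Update(
                        assignments
                            .iter()
                            .map(|(col, _)| (col.clone(), next()))
                            .collect(),
                    ),
                    MergeIntoAction::Insert { columns, values } => MergeIntoAction::Insert {
                        columns: columns.clone(),
                        values: values.iter().map(|_| next()).collect(),
                    },
                    MergeIntoAction::Delete => MergeIntoAction::Delete,
                };
                MergeIntoClause { predicate, action }
            })
            .collect();
        Ok(MergeIntoOp { on, clauses })
    }
}
```

The key property is that `op.with_new_exprs(op.exprs().cloned())` reproduces `op`, so an optimizer can rewrite the flat list and hand it back without knowing the clause structure.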

datafusion/expr/src/logical_plan/plan.rs / tree_node.rs — branch apply_expressions, map_expressions, with_new_exprs on WriteOp::MergeInto to delegate to the helpers above; other WriteOp variants are unchanged.

datafusion/core/src/physical_planner.rs — physical dispatch for WriteOp::MergeInto:

  • Recover the TableProvider via source_as_provider().
  • Extract the source ExecutionPlan from children.
  • Call TableProvider::merge_into(source_plan, on_expr, clauses).

Are these changes tested?

Unit and integration tests are in the parent PR (apache#20763). End-to-end sqllogictests covering a concrete TableProvider::merge_into implementation are planned as a follow-up once a reference implementation exists.

Are there any user-facing changes?

TableProvider gains a new method merge_into with a default not_impl_err implementation — existing implementors are unaffected.
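The reason existing implementors are unaffected is the default method body. A minimal, dependency-free sketch of that pattern follows. Note the assumptions: the real hook is async and takes DataFusion's `ExecutionPlan`, `Expr`, and `MergeIntoClause` types, whereas `Plan`, `OnExpr`, `PlanError`, the return type, and `LegacyTable` here are hypothetical stand-ins.

```rust
// Hypothetical stand-ins for the DataFusion types named in the PR.
#[derive(Debug)]
pub struct Plan(pub String); // stands in for an ExecutionPlan handle
pub struct OnExpr(pub String); // stands in for Expr
pub struct MergeIntoClause;

#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

pub trait TableProvider {
    fn name(&self) -> &str;

    // The real method is async; a plain fn keeps this sketch std-only.
    // The default body plays the role of `not_impl_err`.
    fn merge_into(
        &self,
        _source: Plan,
        _on: OnExpr,
        _clauses: Vec<MergeIntoClause>,
    ) -> Result<Plan, PlanError> {
        Err(PlanError::NotImplemented(format!(
            "MERGE INTO is not supported for table {}",
            self.name()
        )))
    }
}

// A provider written before the hook existed: it does not mention
// `merge_into` at all and still compiles, inheriting the default.
pub struct LegacyTable;

impl TableProvider for LegacyTable {
    fn name(&self) -> &str {
        "legacy"
    }
}
```

Calling `merge_into` on `LegacyTable` yields the not-implemented error at run time rather than a compile error, which is the trade-off of adding the method with a default instead of as a required item.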

Add merge_into async method to TableProvider trait for MERGE INTO
DML support. The method accepts:
- source: ExecutionPlan representing the USING clause
- on: Expr representing the ON join condition
- clauses: Vec<MergeIntoClause> for WHEN MATCHED/NOT MATCHED actions

Default implementation returns not_impl_err for tables that don't
support MERGE INTO operations.

Implement merge_to_plan and merge_clause_to_plan in SQL planner:
- Parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto
- Resolve target table and plan source (USING clause) as LogicalPlan
- Build combined schema for target + source to resolve ON and WHEN expressions
- Convert ON condition and WHEN clauses to DataFusion Expr
- Handle UPDATE, INSERT, and DELETE actions in WHEN clauses

Add physical planner dispatch for WriteOp::MergeInto:
- Use source_as_provider() to recover the TableProvider from the TableSource
- Extract source ExecutionPlan from children
- Call TableProvider::merge_into with source plan, ON condition, and clauses
- Wrap errors with MERGE INTO operation context
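The dispatch steps above can be sketched as follows, using a simplified std-only model. All names other than `TableProvider` and `merge_into` are hypothetical (`Plan`, `MergeIntoNode`, `plan_merge_into`, `Unsupported`), errors are plain strings, and the exact wording of the error context is an assumption.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a physical plan handle.
#[derive(Debug, Clone, PartialEq)]
pub struct Plan(pub String);

pub trait TableProvider {
    fn merge_into(&self, source: Plan, on: &str, clauses: &[String]) -> Result<Plan, String>;
}

/// Models the logical MERGE INTO node as seen by the physical planner.
/// `provider` is None when the table source cannot be recovered as a provider.
pub struct MergeIntoNode {
    pub table_name: String,
    pub provider: Option<Arc<dyn TableProvider>>,
    pub on: String,
    pub clauses: Vec<String>,
}

pub fn plan_merge_into(node: &MergeIntoNode, mut children: Vec<Plan>) -> Result<Plan, String> {
    // Step 1: recover the provider from the table source.
    let provider = node.provider.as_ref().ok_or_else(|| {
        format!("MERGE INTO {}: table source is not a TableProvider", node.table_name)
    })?;
    // Step 2: the USING source is expected to be the single child plan.
    if children.len() != 1 {
        return Err(format!(
            "MERGE INTO {}: expected 1 input plan, got {}",
            node.table_name,
            children.len()
        ));
    }
    let source = children.remove(0);
    // Steps 3 and 4: delegate to the provider and attach operation context to errors.
    provider
        .merge_into(source, &node.on, &node.clauses)
        .map_err(|e| format!("MERGE INTO {}: {}", node.table_name, e))
}

// A provider that rejects the operation, to show the error wrapping.
pub struct Unsupported;

impl TableProvider for Unsupported {
    fn merge_into(&self, _source: Plan, _on: &str, _clauses: &[String]) -> Result<Plan, String> {
        Err("not implemented".to_string())
    }
}
```

Wrapping at the dispatch site means a provider can return a terse error and the user still sees which statement and table it came from.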

Wire MergeInto's expressions through LogicalPlan tree-traversal so
optimizers can rewrite them: add MergeIntoOp::exprs() (stable iteration
order: on, then per-clause predicate + action value Exprs) and
MergeIntoOp::with_new_exprs() to rebuild the op from a transformed
expr vector. Branch LogicalPlan::apply_expressions, map_expressions,
and with_new_exprs on WriteOp::MergeInto to use these helpers; other
WriteOp variants continue to expose no expressions as before.