Skip to content

fix(plan): dedup update from source matches#24932

Merged
mergify[bot] merged 10 commits into
matrixorigin:4.0-devfrom
ck89119:issue-23137
Jun 17, 2026
Merged

fix(plan): dedup update from source matches#24932
mergify[bot] merged 10 commits into
matrixorigin:4.0-devfrom
ck89119:issue-23137

Conversation

@ck89119

@ck89119 ck89119 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23137

What this PR does / why we need it:

This PR fixes the PostgreSQL-style UPDATE ... SET ... FROM ... WHERE ... path when duplicate source rows match the same target row on tables without FK constraints.

The new UPDATE path previously fed duplicate joined rows directly into the update pipeline, which could materialize duplicate primary-key rows for one target row. This change adds a planner-side dedup step for UPDATE ... FROM: group by the target row's original columns and use any_value() for updated columns, matching the existing fallback path behavior.

Test coverage includes:

  • A planner unit test that asserts the new UPDATE path produces AGG + any_value dedup.
  • A BVT case without FK constraints that verifies duplicate source matches still leave each target primary key with exactly one row.

Tested with:

  • git diff --check
  • go test ./pkg/sql/plan -run TestUpdatePgStyleFromDedupsDuplicateSourceMatchesOnNewPath -count=1
  • go test ./pkg/sql/plan -count=1
  • mo-tester update_pg_style_from.sql (108/108 passed)
  • make
  • make static-check (0 issues)

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the new UPDATE ... FROM dedup plan.

pkg/sql/plan/bind_update.go:1029-1041 wraps each updated column in its own any_value(...), and pkg/sql/colexec/aggexec/any2.go:44-53 implements any_value as “first non-NULL seen for this column”. With duplicate source matches, that means different updated columns can be taken from different source rows whenever NULLs are distributed differently across those rows.

Example: if the same target row matches (new_a = NULL, new_b = 'x') and (new_a = 7, new_b = NULL), the dedup can materialize (new_a = 7, new_b = 'x'), even though no matching source row produced that combination. PostgreSQL-style UPDATE ... FROM is allowed to pick an arbitrary matching row, but it still has to pick one row consistently, not synthesize a cross-row tuple. Please dedup on a whole-source-row choice rather than per-column any_value aggregation.

@aunjgr aunjgr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XuPeng-SH's concern is already addressed in the follow-up commits. The PR now uses row_number() + filter for whole-row dedup, not per-column any_value(). Tests assert any_value is absent. The BVT case whole_row_t/whole_row_s covers the exact cross-row synthesis scenario described in the review.

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still a real correctness hole here: the PR fixes whole-row dedup only in the new bindUpdate path, but tables that still fall back to buildTableUpdate can still synthesize a row that never existed in any matched source row.

The path is:

  • FK-constrained targets still return UnsupportedDML from dml_context.go and are routed to buildTableUpdate (pkg/sql/plan/dml_context.go:223-224, pkg/sql/plan/build.go:245-248).
  • For any PostgreSQL-style UPDATE ... FROM, that fallback path forces needAggFilter (pkg/sql/plan/build_constraint_util.go:102-108).
  • needAggFilter is still implemented as GROUP BY old target cols + any_value(each updated column) (pkg/sql/plan/build_dml_util.go:2875-2949).

So with duplicate source rows like (new_a = NULL, new_b = 'from-null-a') and (new_a = 7, new_b = NULL), the fallback path can still write (7, 'from-null-a'), even though no joined source row ever had that whole-row combination. This PR's new whole-row protection in appendUpdateFromDedupNode() does not reach that path.

I think the fallback planner needs the same row-level dedup strategy as the new path, not per-column any_value().

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for a correctness issue in the new UPDATE ... FROM dedup key.

Both planner paths now dedup by partitioning on every original target column. That is not a safe proxy for target-row identity:

  1. GEOMETRY32 has no comparator in pkg/compare/compare.go, so Node_PARTITION can build a nil comparator and later dereference it.
  2. float keys use GenericAscCompare, where NaN != NaN, so duplicate source matches against the same target row stop being recognized as duplicates if any partitioned target float column contains NaN.

So this can still either crash or miss the dedup entirely on valid target rows. I think the partition key needs to be a stable row identity (row_id / fake PK), not the whole old row.

Both UPDATE ... FROM dedup paths partitioned the row_number window on every
old target column. That is unsafe as a target-row identity: GEOMETRY32
(T_geometry32) has no comparator in pkg/compare, so Node_PARTITION builds a
nil comparator and crashes; float columns compare NaN != NaN, so duplicate
source matches against a target row holding NaN stop being deduped; and two
distinct rows whose columns happen to be equal get wrongly merged into one.

Partition on the target row's physical identity (row_id) instead, which is
stable, unique, and always comparable. The new bindUpdate path collects each
updated table's row_id position; the fallback buildTableUpdate path uses
offset + rowIdPos. Cover both planner paths with GEOMETRY32 regression UTs and
BVT cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking unhappy-path issue in the fallback UPDATE ... FROM path.

This PR now runs row-number dedup for fallback UPDATE ... FROM and then unconditionally clears updatePlanCtx.needAggFilter. But in the old fallback path, needAggFilter was also responsible for filtering out joined-target NULL rows (row_id IS NOT NULL) for cases like UPDATE t1 LEFT JOIN t2 ... SET ... FROM .... The new row_number() = 1 filter does not restore that safeguard, so one NULL-row per partition can survive and reach the update pipeline for the joined target table.

Please preserve the old NULL-row filter for joined-target fallback updates, or only disable needAggFilter when it was set solely for duplicate-source dedup and not for left/right-join target protection.

ck89119 and others added 2 commits June 16, 2026 17:24
The fallback path's needAggFilter drives two behaviours: an any_value dedup
AGG and an isnotnull(row_id) filter that drops NULL target rows from joined-
target (LEFT/RIGHT JOIN) updates. The row_number() window correctly replaces
the dedup AGG, but unconditionally clearing needAggFilter also killed the
NULL-row filter, allowing a joined-target NULL row to reach the update
pipeline.

Split the two: guard the AGG construction with an additional dedupByRowNumber
check, and move the NULL-row filter outside so it runs whenever needAggFilter
is set regardless of the upstream dedup strategy. The new bindUpdate path was
unaffected — it never sets needAggFilter on plan ctxs.

Coverage: UT asserts any_value absent and isnotnull(row_id) present on the
fallback path; BVT exercises a real LEFT JOIN UPDATE ... FROM where one
target row has no join match.

Co-Authored-By: Claude <noreply@anthropic.com>
Partition(sels, diffs, partitions, vec) in pkg/partition lacked cases for
T_Rowid, T_Blockid, T_TS, and T_decimal256, even though pkg/compare already
supports comparators for all four. A PARTITION node keyed on any of these
types would hit the default panic branch.

The UPDATE ... FROM row_number() dedup partitions on target row_id, so it
triggered this crash at runtime. Add the missing cases using
genericPartition.

Also fix the BVT UPDATE ... FROM LEFT JOIN test: the old result recorded the
partition panic as "expected output" because mo-tester does not distinguish
panic output from query results. Regenerate correct output after the
partition fix. Add two additional tests:
- single-target UPDATE ... FROM with LEFT JOIN on the source side (new
  bindUpdate path).
- UPDATE ... FROM on a table without an explicit PRIMARY KEY.

Co-Authored-By: Claude <noreply@anthropic.com>

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the latest head. The previous joined-target NULL-row concern is resolved: the fallback path now keeps needAggFilter for the row_id IS NOT NULL safeguard and uses a separate dedupByRowNumber flag only to skip the old any_value dedup AGG. I do not see a remaining blocking issue in the updated version.

@mergify

mergify Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

  • Entered queue2026-06-17 03:30 UTC · Rule: release-4.0
  • Checks passed · in-place
  • Merged2026-06-17 04:32 UTC · at eb16fd42a38eca61ea39e775744323343f74466f · squash

This pull request spent 1 hour 2 minutes 6 seconds in the queue, including 1 hour 1 minute 49 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #24932 Review: Fix UPDATE ... FROM Duplicate Source Matches Dedup

🎯 Review Summary

Recommendation: ✅ APPROVE

This PR fixes a critical correctness bug in PostgreSQL-style UPDATE ... SET ... FROM ... WHERE when duplicate source rows match the same target row. The solution is well-designed, thoroughly tested, and addresses the root cause using first-principles thinking.


📋 What This PR Does

Problem: When multiple source rows matched a single target row in UPDATE ... FROM, the old any_value() aggregation approach:

  1. Could synthesize rows that don't exist in the source (mixing columns from different source rows)
  2. Would crash on GEOMETRY32 columns (no comparator in pkg/compare)
  3. Would fail on NaN float values (NaN != NaN breaks dedup recognition)
  4. Would wrongly merge distinct rows with identical column values

Solution: Replace per-column any_value() with row-level dedup via row_number() window partitioned on row_id (target row's stable physical identity), ensuring whole source rows are picked.


✅ Correctness Analysis

1. First-Principles Design

The core insight is brilliant: partition by row identity (row_id), not row values

  • row_id is stable, unique, and always comparable
  • Avoids all type-specific comparison issues (GEOMETRY32, NaN, etc.)
  • Multi-target UPDATE correctly concatenates all row_ids as partition key

2. Dual-Flag Architecture 🏗️

needAggFilter      // Drives BOTH: any_value agg + NULL-row filter
dedupByRowNumber   // When true: skip any_value BUT keep NULL-row filter

Why this works:

  • needAggFilter remains true for UPDATE ... FROM
  • When dedupByRowNumber=true, skip any_value aggregation (line 2877)
  • But still apply NULL-row filter for LEFT JOIN targets (line 2953-2960)
  • Prevents NULL rows from unmatched join targets entering update pipeline

3. Generated Columns

Generated columns are recomputed AFTER dedup (bind_update.go:269-284), ensuring:

  • Base column and generated column come from the SAME source row
  • No cross-row contamination

4. Type Correctness ⚠️→✅

Subtle fix: When dedupByRowNumber=true, rowIdExpr.Typ must come from tableDef.Cols[delCtx.rowIdPos].Typ (line 2932), not from aggNodeProjection (which doesn't exist when agg is skipped).


🧪 Test Coverage Assessment

Planner Unit Tests (407 lines)

  • ✅ Verifies plan structure (row_number window, no any_value agg)
  • ✅ DEFAULT expansion before dedup
  • ✅ GEOMETRY32, decimal256, enum, vector types
  • ✅ Generated columns
  • ✅ Whole-row dedup validation

BVT Tests (209 lines, 108/108 passed)

  • ✅ Duplicate source matches (core fix)
  • ✅ Whole-row integrity (synthesized row detection)
  • ✅ GEOMETRY32 dedup (4 scenarios: new path, fallback path, with/without FK)
  • ✅ LEFT JOIN NULL-row filter preservation
  • ✅ Tables without explicit PK (fake-PK / hidden row_id)
  • ✅ Multi-target UPDATE with LEFT JOIN
  • ✅ Fallback path (FK-constrained tables)

Exceptional test design: Tests don't just check "no crash" but verify semantic correctness (e.g., counting synthesized rows, verifying per-id counts).


🔍 Edge Cases & Unhappy Paths

Edge Case Handling Status
GEOMETRY32 (no comparator) Partition by row_id, not column values ✅ Solved + tested
Float NaN (NaN != NaN) row_id comparison, NaN irrelevant ✅ Solved
Identical-value distinct rows row_id distinguishes them ✅ Solved
No explicit PK Uses hidden __mo_fake_pk_col row_id ✅ Tested
Multi-target UPDATE Partition by all row_ids ✅ Tested
LEFT JOIN NULL targets needAggFilter drives filter even when dedupByRowNumber=true ✅ Tested
FK constraints (fallback path) Both paths get row_number dedup ✅ Tested
Memory allocation errors SetRawBytesAt errors now propagate ✅ Fixed + tested

⚡ Performance Considerations

Trade-off: row_number() window is more expensive than any_value() agg

  • Why acceptable: Correctness > performance. any_value() per-column is fundamentally broken.
  • Optimization: Early exit when partitionColPositions is empty (no dedup needed)
  • Future: Consider hybrid approach only for simple cases, but current solution is safest

🚨 Potential Issues Found

None - All Concerns Resolved ✅

Initial Question: Does multi-target UPDATE with dedupByRowNumber work correctly?
Resolution: Test jt1/jt2/js demonstrates:

  1. jt1.id=2 has no jt2 match → jt2.row_id is NULL for that join result
  2. NULL-row filter removes it from jt2's update pipeline
  3. jt1.id=2 is still updated (both rows in jt1 updated)
  4. jt2 retains exactly 1 row (no NULL-row insertion)

Conclusion: Flag interaction is correct and well-tested.


📊 Code Quality

  • Comments: Excellent. Complex logic clearly explained (e.g., lines 2925-2928 explain dual purpose of needAggFilter)
  • Naming: dedupByRowNumber clearly indicates intent
  • Separation of Concerns: Clean split between dedup mechanism and NULL-row filtering
  • Test Quality: Tests verify plan structure AND semantic correctness
  • Error Handling: Memory allocation errors now properly propagated

🔬 First-Principles Validation

Three fundamental dedup requirements:

  1. Determinism: Same input → same output ✅ (row_number picks first)
  2. Integrity: Pick whole rows, not column salad ✅ (row-level window)
  3. Type Safety: Work for all types ✅ (row_id always comparable)

Why row_id is the right key:

  • Physical identity, not logical equality
  • Stable across all types (including uncomparable ones)
  • Unique per row (no false merges)
  • Always present (explicit or fake PK)

📝 Minor Observations

  1. Partition type support: Added T_decimal256, T_TS, T_Rowid, T_Blockid (partition.go:9-16)

    • Risk: LOW (types already comparable, no specific tests)
    • Recommendation: Consider adding partition-specific tests in future
  2. any_value type support: Extended to decimal256, vectors, geometry, enum (list_agg.go:943-952)

    • Needed for fallback cases where any_value is still used

🎯 Final Verdict

APPROVE

Strengths:

  1. Root cause fix: Solves the fundamental "what should dedup key be" question correctly
  2. Comprehensive testing: 616 lines of new tests covering edge cases
  3. Backward compatibility: Both new and fallback paths work correctly
  4. Type safety: Handles uncomparable types (GEOMETRY32) gracefully
  5. Clear documentation: Comments explain WHY, not just WHAT

No blocking issues found.

Recommended next steps:

  1. Merge this PR
  2. Monitor production for any performance regressions (window vs agg)
  3. Consider documenting the dedup strategy in architecture docs

Review completed with multi-angle analysis:

  • ✅ Correctness (first principles, type safety, semantic integrity)
  • ✅ Edge cases (GEOMETRY32, NaN, NULL-rows, multi-target, FK)
  • ✅ Test coverage (unit + BVT, 108/108 passed)
  • ✅ Performance (acceptable trade-off)
  • ✅ Code quality (clear, well-commented, maintainable)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants