
fix: merge strategy is memory intensive#9054

Open
k-anshul wants to merge 2 commits into main from fix_merge_strategy

k-anshul (Member) commented Mar 13, 2026

closes https://linear.app/rilldata/issue/PLAT-437/improve-oomes-in-duckdb-when-ingesting-partitions-with-merge-strategy

  1. The underlying assumption that the incoming partition is small does not always hold. When creating a large in-memory table, DuckDB generates large tmp files (probably because the in-memory table is never compressed). When ingesting from that temporary table into the on-disk table, it generates even more tmp files (probably because no free memory is left).
  2. For one customer, the existing dataset has ~55 million rows, and they are trying to pull ~90 million rows in a single partition.
  3. The delete and insert statements executed during the merge are probably not the expensive part. I tested this by copying a big table and running a delete followed by an insert. Both executed quickly and did not generate any tmp files when preserve_insertion_order is false.
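
For reference, the delete-followed-by-insert pattern from point 3 can be sketched roughly as below. This is only an illustration, not the code in this PR: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies, and the table names (`events`, `incoming`), the key column (`id`), and the helper `merge_partition` are all hypothetical. In DuckDB, the experiment described above would additionally run `SET preserve_insertion_order = false` first, which has no SQLite equivalent.

```python
import sqlite3


def merge_partition(conn, target, staging, key):
    """Delete-then-insert merge (hypothetical helper).

    Rows in `target` whose key also appears in `staging` are deleted,
    then every row from `staging` is inserted into `target`.
    """
    with conn:  # commit both statements together, roll back on error
        conn.execute(
            f'DELETE FROM "{target}" '
            f'WHERE "{key}" IN (SELECT "{key}" FROM "{staging}")'
        )
        conn.execute(f'INSERT INTO "{target}" SELECT * FROM "{staging}"')


# Tiny illustrative dataset: the existing table plus an incoming partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value TEXT)")
conn.execute("CREATE TABLE incoming (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "old"), (2, "old"), (3, "old")],
)
conn.executemany(
    "INSERT INTO incoming VALUES (?, ?)",
    [(2, "new"), (4, "new")],
)

merge_partition(conn, "events", "incoming", "id")

rows = conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()
print(rows)
```

Here the row with `id = 2` is replaced and `id = 4` is appended, while rows that do not appear in the incoming partition are left untouched.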

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

k-anshul self-assigned this Mar 13, 2026
k-anshul requested a review from begelundmuller March 13, 2026 19:23