EXPORT PARTITION to Iceberg OOMs under small server memory limit #1979

Description

@DimensionWieldr

Describe the bug

EXPORT PARTITION fails with "memory limit exceeded" on a production-scale MergeTree table when the replica does not have enough memory (6 GB RAM). The export coordinator retries many times and eventually marks the partition export as FAILED. The OOM sometimes occurs on a relatively small part, after larger parts in the same partition have already been processed.

The same partition export succeeds after upgrading the replica to 14 GB RAM.

To Reproduce

  1. Create an Iceberg destination table (ice-rest-catalog)

  2. Set session settings:

SET allow_experimental_export_merge_tree_part = 1;
SET write_full_path_in_iceberg_metadata = 1;
SET export_merge_tree_part_allow_lossy_cast = 1;
  3. Run partition export on a large partition:
ALTER TABLE mergetree.dns_mtree_repl_v2_0
EXPORT PARTITION ID '20251017-182'
TO TABLE ice.`poc.dns_mtree_repl_v2_0_phase3`;
  4. Monitor:
SELECT partition_id, status, exception_count, last_exception_per_replica
FROM system.replicated_partition_exports
WHERE destination_table = 'poc.dns_mtree_repl_v2_0_phase3'
ORDER BY create_time DESC;

Expected behavior
EXPORT PARTITION succeeds.

Key information

  • 26.3.12.20001.altinityantalya
  • 1 shard, 1 replica

Additional context

Failing Partition: 20251017-182
Rows: 603,839,595
On-disk size: 40.6 GiB
Parts: 8 active parts

│ Part name                │ Rows        │ Size      │
│ 20251017-182_2103_2264_3 │ 155,213,824 │ 10.25 GiB │
│ 20251017-182_2265_2424_3 │ 153,255,936 │ 10.20 GiB │
│ 20251017-182_2425_2569_3 │ 138,862,592 │ 9.42 GiB  │
│ 20251017-182_2570_2710_3 │ 135,120,638 │ 9.28 GiB  │
│ 20251017-182_2717_2722_1 │ 5,742,085   │ 406 MiB   │
│ 20251017-182_2723_2728_1 │ 5,737,561   │ 403 MiB   │
│ 20251017-182_2711_2716_1 │ 5,755,860   │ 373 MiB   │
│ 20251017-182_2729_2733_1 │ 4,151,099   │ 308 MiB   │
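
For anyone who wants to compare against their own table, a part breakdown like the one above can be produced with a query along these lines. This is a minimal sketch that assumes the standard system.parts columns (name, rows, bytes_on_disk, active); the database, table, and partition ID are the ones from this report.

```sql
-- Sketch: list the active parts of the failing partition, largest first.
-- Assumes the standard system.parts schema; adjust names for other tables.
SELECT
    name AS part_name,
    rows,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = 'mergetree'
  AND table = 'dns_mtree_repl_v2_0'
  AND partition_id = '20251017-182'
  AND active
ORDER BY bytes_on_disk DESC;
```

The per-part row counts in the table add up to the 603,839,595 rows reported for the partition.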

Error:

(total) memory limit exceeded: would use 3.56 GiB (attempt to allocate chunk of 4.40 MiB),
  current RSS: 4.59 GiB, maximum: 4.59 GiB:
  (while reading column qname):
  (while reading from part /var/lib/clickhouse/store/.../20251017-182_2717_2722_1/
   in table mergetree.dns_mtree_repl_v2_0 (...) located on disk default of type local,
   from mark 0 with max_rows_to_read = 16384, offset = 0):
  While reading part 20251017-182_2717_2722_1:
  While executing MergeTreeSequentialSource
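
Because the message reports the server-wide "(total)" limit rather than a per-query one, it may help to record the effective server memory settings and current memory metrics next to the error. The queries below are a sketch, assuming system.server_settings and the OSMemoryTotal / MemoryResident asynchronous metrics are available in this build; their output is not included here.

```sql
-- Sketch: server-wide memory limit settings (values are deployment-specific).
SELECT name, value, changed
FROM system.server_settings
WHERE name IN ('max_server_memory_usage', 'max_server_memory_usage_to_ram_ratio');

-- Sketch: total RAM seen by the server and its current resident memory.
SELECT metric, formatReadableSize(value) AS readable
FROM system.asynchronous_metrics
WHERE metric IN ('OSMemoryTotal', 'MemoryResident');
```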

FYI, I also hit an OOM on a single very large part (around 700 million rows, 48 GB).
