[core][flink] Manifest cache benchmarks + expose more manifest cache options by mao-liu · Pull Request #8186 · apache/paimon

mao-liu · 2026-06-10T01:50:34Z

Purpose

In Paimon v1.3 (prior to 960dce1), the manifest cache incurred a significant heap memory spike during cold-filling. This problem was raised and discussed in #7030 and #7031, and is particularly evident for highly partitioned tables in jobs with high parallelism.

While the heap spike issue is mostly resolved via 960dce1, some additional manifest cache options are proposed here to help tune the manifest cache for highly partitioned tables in jobs with high parallelism.

When many high-parallelism writers restore at the same time, the Job Manager's manifest cache can become a memory bottleneck. The cache holds entries with soft references, so under sustained heap pressure the JVM reclaims entries that are then immediately re-read and decompressed, driving heap back up and triggering further reclamation — a cache-thrash spiral. There was previously no way to tune this behavior.
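The thrash spiral described above can be illustrated with a minimal sketch. This is not Paimon's SegmentsCache: the class name SoftValueCacheSketch, the loader function, and the simulateHeapPressure helper are all hypothetical, and reclamation is simulated by clearing the references by hand because real GC timing is not deterministic.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of a soft-value cache; not Paimon's SegmentsCache. */
public class SoftValueCacheSketch {
    private final Map<String, SoftReference<byte[]>> entries = new HashMap<>();
    private final Function<String, byte[]> loader;
    private int loads = 0;

    public SoftValueCacheSketch(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the soft reference was cleared. */
    public byte[] get(String key) {
        SoftReference<byte[]> ref = entries.get(key);
        byte[] value = ref == null ? null : ref.get();
        if (value == null) {
            // Cold read: allocates a fresh buffer, which adds heap pressure.
            value = loader.apply(key);
            loads++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Simulates the JVM clearing soft references under heap pressure. */
    public void simulateHeapPressure() {
        entries.values().forEach(SoftReference::clear);
    }

    public int loads() {
        return loads;
    }

    public static void main(String[] args) {
        SoftValueCacheSketch cache = new SoftValueCacheSketch(key -> new byte[1024]);
        for (int round = 0; round < 3; round++) {
            cache.get("manifest-1");
            cache.get("manifest-2");
            cache.simulateHeapPressure(); // every round ends with the entries reclaimed
        }
        // Each round re-reads both entries, so the cache never pays off.
        System.out.println("loads = " + cache.loads()); // prints "loads = 6"
    }
}
```

With strong references the same three rounds would cost two loads instead of six, at the price of keeping the working set resident.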

This PR exposes additional manifest-cache controls and a prefetch option to make this tunable:

- Added WriteRestoreScanBenchmark, a micro-benchmark that reproduces the manifest-cache cold-fill memory spike and reports the heap and cache footprint for the cache-disabled and cache-enabled (strong-ref) arms. On Paimon v1.3, this benchmark reveals a significant heap memory spike during cold-filling on the cache-enabled path. The spike is no longer present after 960dce1, but the benchmark remains useful for measuring performance and detecting future regressions.
- SegmentsCache now supports a configurable idle TTL (expire-after-access) and a soft-values toggle. Setting soft-values=false pins the working set with strong references so the thrash spiral cannot start; the cache then stays bounded by weight (up to its configured memory). The defaults preserve the existing behavior (soft references on).
- New catalog option:
  - cache.manifest.soft-values (default true): toggles soft/strong references for the catalog manifest cache. The catalog manifest cache continues to inherit the catalog-wide cache.expire-after-access TTL.
- New writer-coordinator options:
  - sink.writer-coordinator.cache-soft-values (default true): the same soft/strong reference toggle for the coordinator manifest cache.
  - sink.writer-coordinator.cache-expire-after-access (default disabled): optional idle TTL for coordinator cache entries; the cache stays bounded by sink.writer-coordinator.cache-memory regardless.
  - sink.writer-coordinator.prefetch-manifests (default false): eagerly reads all data manifests of the latest snapshot during refresh to warm the in-Job-Manager manifest cache once, avoiding many concurrent cold manifest reads when writers restore simultaneously.
- Docs: documented the new options and added a "Write Initialize" section in write-performance.md explaining when these settings help, the failure mechanism, and how they resolve it.
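As a rough usage sketch, the options above could be collected into plain string maps before being passed to a catalog or table. The option keys are the ones listed in this PR; the values ("1 gb", "30 min") and their formats are assumptions for illustration only, and the class name CoordinatorCacheOptionsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for example option values; the values are assumptions. */
public class CoordinatorCacheOptionsSketch {

    /** Table options for a highly partitioned, high-parallelism sink. */
    public static Map<String, String> tableOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        // Upper bound on the coordinator cache; the size shown is an assumed value.
        options.put("sink.writer-coordinator.cache-memory", "1 gb");
        // Strong references: entries are no longer reclaimed under heap pressure.
        options.put("sink.writer-coordinator.cache-soft-values", "false");
        // Optional idle TTL; the duration format shown is an assumption.
        options.put("sink.writer-coordinator.cache-expire-after-access", "30 min");
        // Warm the cache once per refresh instead of on concurrent cold reads.
        options.put("sink.writer-coordinator.prefetch-manifests", "true");
        return options;
    }

    /** Catalog options for the catalog-level manifest cache. */
    public static Map<String, String> catalogOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("cache.manifest.soft-values", "false");
        return options;
    }

    public static void main(String[] args) {
        tableOptions().forEach((k, v) -> System.out.println(k + " = " + v));
        catalogOptions().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Since strong references keep the whole working set resident, the cache-memory bound is the setting that decides how much heap the coordinator cache can hold.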

Tests

- SegmentsCacheTest: covers defaults (soft refs on, no TTL), getter pass-through, create returning null on zero memory, and that strong references stay bounded by weight-based eviction.
- CachingCatalogTest#testManifestCacheOptions: asserts the catalog manifest cache picks up soft-values and inherits the catalog idle TTL.
- TableWriteCoordinatorTest: testBuildManifestCacheOptions verifies the coordinator options map to the cache (default soft refs + no TTL, explicit TTL honored, soft-values=false switches to strong refs, zero memory disables the cache); testPrefetchManifestsWarmsCache verifies that constructing the coordinator with prefetch enabled warms the cache and that scan results remain correct.
- Regenerated config docs verified by ConfigOptionsDocsCompletenessITCase.
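To make "strong references stay bounded by weight-based eviction" concrete, here is a minimal sketch of a weight-bounded LRU cache built on LinkedHashMap. It is not the SegmentsCache implementation and it omits the idle TTL; the class name WeightBoundedCacheSketch and the use of byte-array length as the weight are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical weight-bounded LRU cache with strong references. */
public class WeightBoundedCacheSketch {
    private final long maxWeight;
    private long currentWeight = 0;
    // accessOrder=true keeps the least recently used entry first in iteration order.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    public WeightBoundedCacheSketch(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    /** Inserts a value, then evicts least recently used entries until within the bound. */
    public void put(String key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous != null) {
            currentWeight -= previous.length;
        }
        currentWeight += value.length;
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentWeight -= eldest.getValue().length;
            it.remove();
        }
    }

    /** Returns the value and marks it as most recently used, or null if absent. */
    public byte[] get(String key) {
        return entries.get(key);
    }

    public long weight() {
        return currentWeight;
    }

    public static void main(String[] args) {
        WeightBoundedCacheSketch cache = new WeightBoundedCacheSketch(300);
        cache.put("a", new byte[100]);
        cache.put("b", new byte[100]);
        cache.put("c", new byte[100]);
        cache.get("a");                 // "a" becomes most recently used
        cache.put("d", new byte[100]);  // over the bound: evicts "b", the eldest
        System.out.println(cache.get("b") == null); // prints "true"
        System.out.println(cache.weight());         // prints "300"
    }
}
```

Because entries are held strongly, the only way they leave the cache is through this eviction path, so the heap used by the cache is predictable.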

Closes #7030

JingsongLi · 2026-06-10T09:24:57Z

+            // that per-task `scan` requests use, so subsequent concurrent requests hit
+            // warm bytes instead of each performing a cold manifest read.
+            scan.withPartitionFilter(PartitionPredicate.ALWAYS_TRUE)
+                    .withBucketFilter(Filter.alwaysTrue())


This prefetch reuses the mutable coordinator scan. A normal request calls scan.withPartitionBucket(partition, bucket), which leaves specifiedBucket set in AbstractFileStoreScan/ManifestsReader; withBucketFilter(Filter.alwaysTrue()) only adds a permissive filter and does not clear that field. After the next checkpoint refresh, this plan can therefore skip manifests outside the last requested bucket range instead of warming all data manifests. Please use a fresh table.store().newScan().withSnapshot(snapshot) for the prefetch, or add an explicit way to clear the bucket state, and cover the scan-then-checkpoint case in the test.
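The failure mode in this comment is a general hazard of reusing a mutable builder. A minimal sketch, using a hypothetical Scan class rather than Paimon's AbstractFileStoreScan (the field and method names below are stand-ins, and buckets stand in for manifests):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration of sticky state on a reused, mutable scan. */
public class StickyScanStateSketch {

    /** Stand-in for a mutable scan builder; not Paimon's API. */
    static class Scan {
        private Integer specifiedBucket;               // set once, never cleared by filters
        private IntPredicate bucketFilter = b -> true;

        Scan withBucket(int bucket) {
            this.specifiedBucket = bucket;
            return this;
        }

        Scan withBucketFilter(IntPredicate filter) {
            this.bucketFilter = filter;                // replaces the filter only
            return this;
        }

        /** Returns the buckets that pass both the specified bucket and the filter. */
        List<Integer> plan(int numBuckets) {
            List<Integer> planned = new ArrayList<>();
            for (int b = 0; b < numBuckets; b++) {
                boolean matchesSpecified = specifiedBucket == null || specifiedBucket == b;
                if (matchesSpecified && bucketFilter.test(b)) {
                    planned.add(b);
                }
            }
            return planned;
        }
    }

    /** Prefetch on a scan that already served a per-bucket request. */
    public static List<Integer> reusedPlan() {
        Scan shared = new Scan();
        shared.withBucket(2).plan(4);                  // a normal per-task request
        return shared.withBucketFilter(b -> true).plan(4);
    }

    /** Prefetch on a fresh scan instance. */
    public static List<Integer> freshPlan() {
        return new Scan().withBucketFilter(b -> true).plan(4);
    }

    public static void main(String[] args) {
        System.out.println(reusedPlan()); // prints "[2]"
        System.out.println(freshPlan());  // prints "[0, 1, 2, 3]"
    }
}
```

The permissive filter does not undo the earlier withBucket call, so only a fresh instance (or an explicit reset) plans every bucket.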

mao-liu

Thanks for the review @JingsongLi, and thanks for catching the bug 🙏

I have updated the prefetch to use a fresh scan instance and added a test to cover this scenario (709c79b).

JingsongLi

LGTM. The previous prefetch issue is fixed with a fresh scan instance and covered by the new test.

Expose more manifest cache options + benchmarks

1643c02

Remove cache page size changes - not needed Tidying up

mao-liu changed the title ~~[core] [flink] Manifest cache benchmarks + expose more manifest cache options~~ [core][flink] Manifest cache benchmarks + expose more manifest cache options Jun 10, 2026

spotless

ca0b76e

JingsongLi reviewed Jun 10, 2026

fix cache prefetch with new scan instance

709c79b

mao-liu force-pushed the feat/manifest-cache-options branch from 5ddd680 to 709c79b June 11, 2026 00:39

JingsongLi approved these changes Jun 11, 2026

JingsongLi merged commit 5d26928 into apache:master Jun 11, 2026
13 checks passed

mao-liu deleted the feat/manifest-cache-options branch June 11, 2026 03:42

JingsongLi merged 3 commits into apache:master from mao-liu:feat/manifest-cache-options
