Search before asking
Description
Background
The CompletedSnapshotStoreManager in the Fluss coordinator has significant memory overhead because it repeatedly stores absolute paths in snapshot metadata, which can lead to OOM errors in production clusters with many table buckets.
Current Implementation Issues
Currently, snapshot-related paths are stored as absolute paths in multiple places:
- CompletedSnapshot.snapshotLocation: Each snapshot stores a complete absolute path
- KvFileHandle.filePath: Each KV file stores a complete absolute path
This creates severe memory redundancy:
- For snapshots belonging to the same TableBucket, their snapshotLocation values differ only in the final snapshot ID, while the base path prefix (e.g., hdfs://namenode:8020/fluss/kv/db1/table1-100/0/) is identical across all snapshots
- Each KvFileHandle within a snapshot also stores the complete absolute file path, including highly repetitive path prefixes
- In a typical scenario: with a 120-byte base path, 10 retained snapshots per bucket, and 100 files per snapshot, path prefixes alone consume over 100KB of memory per bucket
- Multiplied across thousands of table buckets in a production cluster, this adds up to hundreds of megabytes to gigabytes of coordinator memory (see Memory Impact Analysis) and can eventually cause OOM
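To make the redundancy concrete, here is a minimal sketch that compares storing every path in full against storing the shared base path once per bucket plus short relative suffixes. The class name, the helper methods, and the `snap-N/file-M.sst` naming are hypothetical and are not taken from the Fluss code base. Character counts are used as a rough proxy for memory; real JVM strings also carry object headers.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedPrefixSketch {

    // Characters stored when every entry repeats the full base path.
    static long absoluteChars(String basePath, List<String> relativePaths) {
        long total = 0;
        for (String rel : relativePaths) {
            total += basePath.length() + rel.length();
        }
        return total;
    }

    // Characters stored when the base path is kept once and entries are relative.
    static long relativeChars(String basePath, List<String> relativePaths) {
        long total = basePath.length();
        for (String rel : relativePaths) {
            total += rel.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Example base path from the issue text.
        String basePath = "hdfs://namenode:8020/fluss/kv/db1/table1-100/0/";

        // Hypothetical layout: 10 retained snapshots with 100 files each.
        List<String> relativePaths = new ArrayList<>();
        for (int snapshot = 0; snapshot < 10; snapshot++) {
            for (int file = 0; file < 100; file++) {
                relativePaths.add("snap-" + snapshot + "/file-" + file + ".sst");
            }
        }

        System.out.println("absolute: " + absoluteChars(basePath, relativePaths) + " chars");
        System.out.println("relative: " + relativeChars(basePath, relativePaths) + " chars");
    }
}
```

The difference between the two totals is the base path length multiplied by the number of entries minus one, which is the cost of the repeated prefix.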
Memory Impact Analysis
Before optimization (absolute paths):
Per snapshot:
- snapshotLocation: ~120 bytes (full path)
- 100 KvFileHandles × ~150 bytes each = ~15KB
Total per snapshot: ~15.12KB
Per bucket (10 snapshots): ~151KB
1000 buckets: ~151MB (paths only, excluding other metadata)
10000 buckets: ~1.5GB (paths only)
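The estimate above can be reproduced with a few lines of arithmetic. The sketch below takes the per-path sizes from this issue (about 120 bytes per snapshotLocation, about 150 bytes per KvFileHandle path, 100 files per snapshot, 10 retained snapshots) as inputs; the class and method names are hypothetical, and the figures are approximations rather than measured heap usage.

```java
public class PathOverheadEstimate {

    // Path bytes held by one snapshot: its location plus one path per KV file.
    static long perSnapshotBytes(int snapshotLocationBytes, int filesPerSnapshot, int filePathBytes) {
        return snapshotLocationBytes + (long) filesPerSnapshot * filePathBytes;
    }

    // Path bytes held by one bucket across all retained snapshots.
    static long perBucketBytes(long perSnapshotBytes, int retainedSnapshots) {
        return perSnapshotBytes * retainedSnapshots;
    }

    // Path bytes held by the coordinator for the given number of buckets.
    static long clusterBytes(long perBucketBytes, int buckets) {
        return perBucketBytes * buckets;
    }

    public static void main(String[] args) {
        long perSnapshot = perSnapshotBytes(120, 100, 150);
        long perBucket = perBucketBytes(perSnapshot, 10);

        System.out.println("per snapshot:  " + perSnapshot + " bytes");
        System.out.println("per bucket:    " + perBucket + " bytes");
        System.out.println("1000 buckets:  " + clusterBytes(perBucket, 1_000) + " bytes");
        System.out.println("10000 buckets: " + clusterBytes(perBucket, 10_000) + " bytes");
    }
}
```

Changing the inputs (for example, a longer base path or more retained snapshots) shows how quickly the total grows with bucket count.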
Willingness to contribute