[server] Add disk-usage write protection to TabletServer by swuferhong · Pull Request #3340 · apache/fluss

swuferhong · 2026-05-18T02:53:58Z

Purpose

Linked issue: close #3338

Introduce a periodic disk-usage monitoring mechanism that proactively rejects client writes when the TabletServer's data disk usage exceeds a configurable high-water-mark ratio, preventing ENOSPC errors and potential data corruption.

Key design decisions:

Hysteresis state machine with a fixed 10% recovery gap to avoid rapid lock/unlock oscillation (lock at limit, unlock at limit-0.10)
Max-per-disk strategy: report the highest usage across all distinct FileStores so a single full disk is never masked by other low-usage disks in multi-disk deployments
Only client-driven writes (appendLog/putKv) are rejected with a retriable DiskWriteLockedException; follower replication is not blocked to preserve replica consistency
write-limit-ratio supports runtime dynamic reconfiguration via ServerReconfigurable, with an immediate re-check on change
Setting ratio to 1.0 completely disables the protection
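
For readers following the design, the lock/unlock hysteresis described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual DiskUsageMonitor: the class name `HysteresisWriteGate`, the method `update`, and the choice of inclusive comparisons at both thresholds are assumptions made for this example.

```java
/**
 * Hypothetical sketch of a lock/unlock hysteresis gate.
 * Names and comparison operators are assumptions, not taken from the PR.
 */
final class HysteresisWriteGate {
    // Fixed recovery gap from the description: unlock at (limit - 0.10).
    private static final double RECOVERY_GAP = 0.10;

    private final double writeLimitRatio;
    private boolean locked;

    HysteresisWriteGate(double writeLimitRatio) {
        this.writeLimitRatio = writeLimitRatio;
    }

    /** Feeds one usage sample and returns whether writes are locked afterwards. */
    boolean update(double usageRatio) {
        if (writeLimitRatio >= 1.0) {
            // A ratio of 1.0 disables the protection entirely.
            locked = false;
        } else if (!locked && usageRatio >= writeLimitRatio) {
            // Usage reached the high-water mark: start rejecting client writes.
            locked = true;
        } else if (locked && usageRatio <= writeLimitRatio - RECOVERY_GAP) {
            // Usage dropped below the recovery threshold: accept writes again.
            locked = false;
        }
        // Inside the gap the previous state is kept, which prevents oscillation.
        return locked;
    }
}
```

With a limit of 0.85, a sample of 0.86 locks writes, a later sample of 0.80 keeps them locked, and only a sample at or below roughly 0.75 unlocks them.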

New configuration:

server.data-disk.write-limit-ratio (default 0.85, dynamic)
server.data-disk.check-interval (default 30s)

New metrics:

diskUsageRatio: current disk usage ratio [0.0, 1.0]
diskWriteLocked: 1 when writes are being rejected, 0 otherwise

Brief change log

Tests

API and Format

Documentation

zuston

If the disk usage ratio threshold is exceeded (or the disk is corrupted), do we need to mark this tablet server as offline or unhealthy? I think writer-side fencing is not enough; in many cases, excessive disk usage will not recover automatically.

swuferhong · 2026-05-18T05:15:50Z

If the disk usage ratio threshold is exceeded (or the disk is corrupted), do we need to mark this tablet server as offline or unhealthy? I think writer-side fencing is not enough; in many cases, excessive disk usage will not recover automatically.

Hi, @zuston. Writer-side fencing is the minimum-sufficient response for a capacity event; promoting it to node-level offline turns a localized capacity problem into a cluster-wide availability incident and triggers cascading failover. Disk corruption is a separate fault domain (IOException-driven Log Directory Failure) and should be addressed in a dedicated PR.

Happy to add a follow-up issue tracking the Log Directory Failure work if that helps.

wuchong

Thanks @swuferhong, I only left some minor comments.

wuchong · 2026-05-23T12:00:55Z

                            KV_SHARED_RATE_LIMITER_BYTES_PER_SEC.key(),
-                            KV_SNAPSHOT_INTERVAL.key()));
+                            KV_SNAPSHOT_INTERVAL.key(),
+                            SERVER_DATA_DISK_WRITE_LIMIT_RATIO.key()));


This key now passes the coordinator allowlist, but the range check still exists only in LocalDiskManager.validate(), which is registered on TabletServer, not CoordinatorServer. Values like 0.0 or 1.5 can therefore be persisted through AlterConfigs and only fail later when tablet servers try to apply them. The coordinator path should reject invalid server.data-disk.write-limit-ratio updates up front.

I think we should also validate this on the Coordinator via org.apache.fluss.server.DynamicConfigManager#registerValidator by extending a ConfigValidator. We should also add an IT case for setting valid and invalid server.data-disk.write-limit-ratio (maybe near FlussAdminITCase#testDynamicConfigs()).
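
To make the suggestion concrete, here is a rough sketch of the range check the coordinator could apply before persisting the value. It is deliberately standalone: `WriteLimitRatioValidator` and its `validate(String)` method are hypothetical and do not reflect the real ConfigValidator signature in Fluss. The accepted range (0.0, 1.0] is also an assumption, inferred from 0.0 and 1.5 being called invalid here while 1.0 is documented as disabling the protection.

```java
/**
 * Hypothetical, standalone range check for server.data-disk.write-limit-ratio.
 * Not the Fluss ConfigValidator API; the accepted range (0.0, 1.0] is assumed.
 */
final class WriteLimitRatioValidator {
    static final String KEY = "server.data-disk.write-limit-ratio";

    /** Parses and validates the raw value, returning the ratio if it is acceptable. */
    static double validate(String rawValue) {
        final double ratio;
        try {
            ratio = Double.parseDouble(rawValue.trim());
        } catch (NumberFormatException | NullPointerException e) {
            throw new IllegalArgumentException(
                    "Invalid value for " + KEY + ": '" + rawValue + "' is not a number", e);
        }
        // The negated form also rejects NaN, which fails every comparison.
        if (!(ratio > 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException(
                    "Invalid value for " + KEY + ": " + ratio + " is outside (0.0, 1.0]");
        }
        return ratio;
    }
}
```

Rejecting the value at this point means an AlterConfigs request with 0.0 or 1.5 fails immediately instead of being persisted and failing later on each tablet server.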

wuchong · 2026-05-23T12:00:55Z

+                if (total <= 0L) {
+                    continue;
+                }
+                double ratio = (double) (total - fs.getUsableSpace()) / total;


collect() only treats Files.getFileStore() failures as skippable. If FileStore#getTotalSpace() or getUsableSpace() throws for one data directory, the whole sample aborts instead of skipping just that directory, so DiskUsageMonitor can keep a stale lock state even when the other data dirs are still healthy and measurable. This should handle per-filesystem stat failures the same way as lookup failures.
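
One possible shape for this, as a sketch only: wrap the lookup and both stat calls in the same try block so that any failing directory is skipped and the remaining ones still contribute to the sample. `DiskUsageSampler` and `maxUsageRatio` are illustrative names, not the PR's code; the JDK calls (`Files.getFileStore`, `FileStore#getTotalSpace`, `FileStore#getUsableSpace`) are the ones already used in the diff.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.OptionalDouble;
import java.util.Set;

/** Illustrative sampler; class and method names are assumptions, not the PR's code. */
final class DiskUsageSampler {

    /** Returns the highest usage ratio across distinct file stores, or empty if none is measurable. */
    static OptionalDouble maxUsageRatio(List<Path> dataDirs) {
        Set<FileStore> seen = new HashSet<>();
        double max = -1.0;
        for (Path dir : dataDirs) {
            try {
                FileStore fs = Files.getFileStore(dir);
                if (!seen.add(fs)) {
                    continue; // this file store was already sampled via another directory
                }
                long total = fs.getTotalSpace();
                if (total <= 0L) {
                    continue; // no meaningful capacity reported
                }
                double ratio = (double) (total - fs.getUsableSpace()) / total;
                max = Math.max(max, ratio);
            } catch (IOException | SecurityException e) {
                // Lookup and stat failures are handled the same way: skip only this
                // directory so the other data dirs still produce a fresh sample.
            }
        }
        return max < 0.0 ? OptionalDouble.empty() : OptionalDouble.of(max);
    }
}
```

Returning an empty result when nothing is measurable lets the caller decide explicitly whether to keep or reset the previous lock state, rather than inheriting it from an aborted sample.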

wuchong · 2026-05-23T12:02:09Z

+        diskUsageMonitor.runOnce();
+        scheduler.schedule(
+                "disk-usage-monitor",
+                diskUsageMonitor::runOnce,
+                diskCheckIntervalMs,
+                diskCheckIntervalMs);


nit: Setting delayMs to 0 can trigger immediate collection, rather than relying on an explicit invocation. This ensures the disk I/O operation executes asynchronously within the scheduler thread, preventing it from blocking the startup process.
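
As an illustration of the idea, here is the same pattern with the JDK's ScheduledExecutorService standing in for the Fluss scheduler (whose exact `schedule` semantics are not shown in this thread). `MonitorScheduling` and `start` are hypothetical names for this sketch.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch using the JDK scheduler as a stand-in; names here are hypothetical. */
final class MonitorScheduling {

    /**
     * Schedules the check with an initial delay of 0 so the first sample runs
     * right away on the scheduler thread instead of on the calling thread.
     */
    static ScheduledFuture<?> start(
            ScheduledExecutorService scheduler, Runnable runOnce, long intervalMs) {
        return scheduler.scheduleWithFixedDelay(runOnce, 0L, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

The trade-off is that the first sample is no longer guaranteed to have completed when startup returns, so any code that reads the lock state immediately afterwards would see the initial (unlocked) state until the first run finishes.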

swuferhong force-pushed the disk-usage-protect branch 2 times, most recently from 5949356 to ac209de on May 18, 2026 03:16

zuston reviewed May 18, 2026


swuferhong force-pushed the disk-usage-protect branch from ac209de to feb7dc6 on May 18, 2026 03:36

swuferhong added 2 commits May 20, 2026 14:59

[server] Add disk-usage write protection to TabletServer

5cc2dcb

add debug log for recovery

44c6c93

swuferhong force-pushed the disk-usage-protect branch from feb7dc6 to 44c6c93 on May 20, 2026 09:25

wuchong reviewed May 23, 2026


swuferhong wants to merge 2 commits into apache:main from swuferhong:disk-usage-protect


3 participants
