Defer ChainMonitor updates and persistence to flush() by joostjager · Pull Request #4351 · lightningdevkit/rust-lightning

joostjager · 2026-01-27T11:34:09Z

Summary

Modify ChainMonitor internally to queue watch_channel and update_channel operations, returning InProgress until flush() is called. This enables persistence of monitor updates after ChannelManager persistence, ensuring correct ordering where the ChannelManager state is never ahead of the monitor state on restart. The new behavior is opt-in via a deferred switch.

Key changes:

ChainMonitor gains a deferred switch to enable the new queuing behavior
When enabled, monitor operations are queued internally and return InProgress
Calling flush() applies pending operations and persists monitors
Background processor updated to capture pending count before ChannelManager persistence, then flush after persistence completes
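
The queue-then-flush behavior listed above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyChainMonitor`, `PendingOp`, `UpdateStatus`, `submit` and `persisted_count` are hypothetical names invented for illustration, channel IDs are plain integers, and "persisting" just appends to an in-memory list. It is not the actual ChainMonitor code from this PR.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Simplified stand-in for LDK's `ChannelMonitorUpdateStatus` (illustration only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum UpdateStatus {
    Completed,
    InProgress,
}

/// Hypothetical queued operation; the PR's own enum carries real monitor data.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum PendingOp {
    NewMonitor { channel_id: u32 },
    Update { channel_id: u32, update_id: u64 },
}

/// Toy monitor that records which operations were actually "persisted".
pub struct ToyChainMonitor {
    deferred: bool,
    pending_ops: Mutex<VecDeque<PendingOp>>,
    persisted: Mutex<Vec<PendingOp>>,
}

impl ToyChainMonitor {
    pub fn new(deferred: bool) -> Self {
        Self {
            deferred,
            pending_ops: Mutex::new(VecDeque::new()),
            persisted: Mutex::new(Vec::new()),
        }
    }

    /// Applies and "persists" an operation immediately.
    fn apply(&self, op: PendingOp) -> UpdateStatus {
        self.persisted.lock().unwrap().push(op);
        UpdateStatus::Completed
    }

    /// In deferred mode, queue the operation and report `InProgress`;
    /// otherwise apply it right away.
    pub fn submit(&self, op: PendingOp) -> UpdateStatus {
        if self.deferred {
            self.pending_ops.lock().unwrap().push_back(op);
            UpdateStatus::InProgress
        } else {
            self.apply(op)
        }
    }

    /// Number of operations currently waiting in the queue.
    pub fn pending_operation_count(&self) -> usize {
        self.pending_ops.lock().unwrap().len()
    }

    /// Applies up to `count` queued operations in FIFO order and
    /// returns how many were applied.
    pub fn flush(&self, count: usize) -> usize {
        let mut applied = 0;
        for _ in 0..count {
            // The lock guard is released at the end of this statement.
            let op = self.pending_ops.lock().unwrap().pop_front();
            match op {
                Some(op) => {
                    self.apply(op);
                    applied += 1;
                },
                None => break,
            }
        }
        applied
    }

    /// Number of operations that have been "persisted" so far.
    pub fn persisted_count(&self) -> usize {
        self.persisted.lock().unwrap().len()
    }
}
```

In this model nothing reaches the persisted list until `flush` is called, which is the property the background processor relies on when it orders monitor writes after ChannelManager persistence.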

Performance Impact

Multi-channel, multi-node load testing (using ldk-server chaos branch) shows no measurable throughput difference between deferred and direct persistence modes.

This is likely because forwarding and payment processing are already effectively single-threaded: the background processor batches all forwards for the entire node in a single pass, so the deferral overhead doesn't add any meaningful bottleneck to an already serialized path.

For high-latency storage (e.g., remote databases), there is also currently no significant impact because channel manager persistence already blocks event handling in the background processor loop (test). If the loop were parallelized to process events concurrently with persistence, deferred writing would become comparatively slower since it moves the channel manager round trip into the critical path. However, deferred writing would also benefit from loop parallelization, and could be further optimized by batching the monitor and manager writes into a single round trip.

Alternative Designs Considered

Several approaches were explored to solve the monitor/manager persistence ordering problem:

1. Queue at KVStore level (#4310)

Introduces a QueuedKVStoreSync wrapper that queues all writes in memory, committing them in a single batch at chokepoints where data leaves the system (get_and_clear_pending_msg_events, get_and_clear_pending_events). This approach aims for true atomic multi-key writes but requires KVStore backends that support transactions (e.g., SQLite); filesystem backends cannot achieve full atomicity.

Trade-offs: Most general solution but requires changes to persistence boundaries and cannot fully close the desync gap with filesystem storage.
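
As a rough illustration of the KVStore-level idea, the sketch below buffers writes in memory and applies them in one batch. It assumes a hypothetical `QueuedStore` type with an in-memory map standing in for the backend; it is not the `QueuedKVStoreSync` from #4310 and does not implement LDK's KVStore traits.

```rust
use std::collections::BTreeMap;

/// Hypothetical write-queuing store (illustration only).
pub struct QueuedStore {
    /// Stand-in for the durable backend.
    committed: BTreeMap<String, Vec<u8>>,
    /// Writes accepted but not yet committed, in arrival order.
    queued: Vec<(String, Vec<u8>)>,
}

impl QueuedStore {
    pub fn new() -> Self {
        Self { committed: BTreeMap::new(), queued: Vec::new() }
    }

    /// Queues a write without touching the backend.
    pub fn write(&mut self, key: &str, value: &[u8]) {
        self.queued.push((key.to_string(), value.to_vec()));
    }

    /// Reads only what has been committed so far.
    pub fn read_committed(&self, key: &str) -> Option<&[u8]> {
        self.committed.get(key).map(|v| v.as_slice())
    }

    /// Applies all queued writes as one batch and returns how many were applied.
    /// A transactional backend could make this step atomic; this toy map
    /// simply applies the writes in order.
    pub fn commit(&mut self) -> usize {
        let count = self.queued.len();
        for (key, value) in self.queued.drain(..) {
            self.committed.insert(key, value);
        }
        count
    }
}
```

The batch in `commit` is where atomicity would have to come from, which is why the approach depends on what the backend can guarantee.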

2. Queue at Persister level (#4317)

Updates MonitorUpdatingPersister to queue persist operations in memory, with actual writes happening on flush(). Adds flush() to the Persist trait and ChainMonitor.

Trade-offs: Only fixes the issue for MonitorUpdatingPersister; custom Persist implementations remain vulnerable to the race condition.

3. Queue at ChainMonitor wrapper level (#4345)

Introduces DeferredChainMonitor, a wrapper around ChainMonitor that implements the queue in a separate wrapper layer. All ChainMonitor traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement.

Trade-offs: Requires re-implementing all trait pass-throughs on the wrapper. Keeps the core ChainMonitor unchanged but adds an external layer of indirection.

ldk-reviews-bot · 2026-01-27T11:34:12Z

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

joostjager · 2026-01-27T11:39:11Z

Closing this PR as #4345 seems to be the easiest way to go

joostjager · 2026-02-09T14:45:30Z

The single commit was split into three: extracting internal methods, adding a deferred toggle, and implementing the deferral and flushing logic. flush() now delegates to the extracted internal methods rather than reimplementing persist/insert logic inline. Deferred mode is opt-in via a deferred bool rather than always-on. Test infrastructure was expanded with deferred-mode helpers and dedicated unit tests.

codecov · 2026-02-11T11:23:19Z

Codecov Report

❌ Patch coverage is 92.94404% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.97%. Comparing base (62c7575) to head (43bfb1b).
⚠️ Report is 45 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/chain/chainmonitor.rs	90.22%	22 Missing and 4 partials ⚠️
lightning/src/util/test_utils.rs	96.77%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4351      +/-   ##
==========================================
+ Coverage   85.87%   85.97%   +0.09%     
==========================================
  Files         157      159       +2     
  Lines      103769   104868    +1099     
  Branches   103769   104868    +1099     
==========================================
+ Hits        89115    90162    +1047     
- Misses      12158    12198      +40     
- Partials     2496     2508      +12

Flag	Coverage Δ
tests	`85.97% <92.94%> (+0.09%)`	⬆️

joostjager · 2026-02-12T10:56:15Z

This PR is now ready for review. LDK-node counterpart: lightningdevkit/ldk-node#782

joostjager · 2026-02-17T13:24:32Z

I think there was still a small race remaining. Pushed fix.

ldk-reviews-bot · 2026-02-18T13:40:07Z

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

TheBlueMatt

feel free to squash

lightning-background-processor/src/lib.rs

lightning/src/chain/chainmonitor.rs

ldk-reviews-bot · 2026-02-19T16:08:50Z

✅ Added second reviewer: @valentinewallace

joostjager · 2026-02-20T08:59:44Z

Rebased without changes

lightning-background-processor/src/lib.rs

lightning/src/chain/chainmonitor.rs

valentinewallace · 2026-02-24T19:27:46Z

lightning/src/chain/chainmonitor.rs

+			match status {
+				ChannelMonitorUpdateStatus::Completed => {
+					let logger = WithContext::from(logger, None, Some(channel_id), None);
+					if let Err(e) = self.channel_monitor_updated(channel_id, update_id) {


Relatedly, the Persist docs state that channel_monitor_updated should be called once a full channelmonitor has been persisted, which I think is inaccurate (and later docs seem to contradict that).

Updated the doc to clarify this. Each pending update ID must be individually marked complete via channel_monitor_updated, since the implementation uses retain to remove only the exact matching ID.
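
A small sketch of why each ID has to be completed individually: removing with `retain` on an exact match leaves every other pending ID in place. The function name and the plain `Vec<u64>` are assumptions made for illustration, not the ChainMonitor internals.

```rust
/// Removes only the exactly matching update ID from the pending list.
/// Hypothetical helper; the real bookkeeping lives inside ChainMonitor.
pub fn mark_update_complete(pending: &mut Vec<u64>, completed_id: u64) {
    pending.retain(|id| *id != completed_id);
}
```

With pending IDs 1, 2 and 3, completing ID 3 leaves 1 and 2 pending, so marking only the latest update complete would not resolve the earlier ones.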

lightning/src/chain/chainmonitor.rs

valentinewallace · 2026-02-24T20:28:39Z

lightning/src/chain/chainmonitor.rs

+			};
+
+			match status {
+				ChannelMonitorUpdateStatus::Completed => {


I seem to recall we started disallowing flipping between returning Completed and InProgress at some point. I'm not sure if that's still correct, but it may be worth looking into.

The flipping concern doesn't apply here. The InProgress is returned by the deferred layer in ChainMonitor, not by the Persist implementation itself. The Persist impl can consistently return Completed; the deferred layer just wraps it with an initial InProgress to delay processing until after the ChannelManager is persisted. When flush() runs and the persister returns Completed, we call channel_monitor_updated to resolve the InProgress that the deferred layer returned earlier.
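
The layering described in this reply can be sketched as follows. All names here (`ToyPersist`, `SyncPersister`, `DeferredLayer`, `Status`) are hypothetical and heavily simplified; in particular `ToyPersist` is not LDK's `Persist` trait. The point is only that the persister consistently reports `Completed`, while the `InProgress` seen by the caller comes from the deferring layer.

```rust
/// Simplified stand-in for LDK's `ChannelMonitorUpdateStatus` (illustration only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    Completed,
    InProgress,
}

/// Hypothetical, pared-down persister interface.
pub trait ToyPersist {
    fn persist_update(&self, update_id: u64) -> Status;
}

/// A persister that always finishes synchronously.
pub struct SyncPersister;

impl ToyPersist for SyncPersister {
    fn persist_update(&self, _update_id: u64) -> Status {
        Status::Completed
    }
}

/// Layer that defers persistence and tracks which updates are still unresolved.
pub struct DeferredLayer<P: ToyPersist> {
    persister: P,
    queued: Vec<u64>,
    in_flight: Vec<u64>,
}

impl<P: ToyPersist> DeferredLayer<P> {
    pub fn new(persister: P) -> Self {
        Self { persister, queued: Vec::new(), in_flight: Vec::new() }
    }

    /// Always answers `InProgress`; the persister is not consulted yet.
    pub fn update_channel(&mut self, update_id: u64) -> Status {
        self.queued.push(update_id);
        self.in_flight.push(update_id);
        Status::InProgress
    }

    /// Resolves exactly one earlier `InProgress`.
    fn channel_monitor_updated(&mut self, update_id: u64) {
        self.in_flight.retain(|id| *id != update_id);
    }

    /// Runs the deferred persists; each `Completed` resolves its own update ID.
    pub fn flush(&mut self) {
        let queued = std::mem::take(&mut self.queued);
        for update_id in queued {
            if self.persister.persist_update(update_id) == Status::Completed {
                self.channel_monitor_updated(update_id);
            }
        }
    }

    /// Update IDs that were reported `InProgress` and are not yet resolved.
    pub fn in_flight(&self) -> &[u64] {
        &self.in_flight
    }
}
```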

joostjager · 2026-02-25T08:30:21Z

Review comments addressed: https://github.com/lightningdevkit/rust-lightning/compare/47070bd..c6b30c93e31d462e628070267ec02f20b72452f2

valentinewallace

LGTM!

valentinewallace · 2026-02-25T20:35:36Z

lightning/src/util/test_utils.rs

 use crate::chain::WatchedOutput;
 #[cfg(any(test, feature = "_externalize_tests"))]
 use crate::ln::chan_utils::CommitmentTransaction;
+#[cfg(test)]


I think we can get rid of all these cfg(test) flags?

I gated them because HolderCommitmentTransaction::dummy (used in dummy_monitor) is also cfg(test). But now changed that one too, so that these flags can be dropped.

lightning/src/util/test_utils.rs

lightning/src/chain/chainmonitor.rs

lightning/src/ln/functional_test_utils.rs

valentinewallace

Good to land on my end and save minor feedback for follow-up

valentinewallace · 2026-02-25T21:33:55Z

I guess we should rename #4286 now. This fix is pretty ridiculously simple given the safety it adds. Have we talked to any users about adopting this?

The previous wording implied that persisting a full ChannelMonitor would automatically resolve all pending updates. Reword to make clear that each update ID still needs to be individually marked complete via channel_monitor_updated, even after a full monitor persistence.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract the ChannelMonitor construction boilerplate from channelmonitor test functions into a reusable dummy_monitor helper in test_utils.rs, generic over the signer type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pure refactor: move the bodies of Watch::watch_channel and Watch::update_channel into methods on ChainMonitor, and have the Watch trait methods delegate to them. This prepares for adding deferred mode where the Watch methods will conditionally queue operations instead of executing them immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a `deferred` parameter to `ChainMonitor::new` and `ChainMonitor::new_async_beta`. When set to true, the Watch trait methods (watch_channel and update_channel) will unimplemented!() for now. All existing callers pass false to preserve current behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the unimplemented!() stubs with a full deferred write implementation. When ChainMonitor has deferred=true, Watch trait operations queue PendingMonitorOp entries instead of executing immediately. A new flush() method drains the queue and forwards operations to the internal watch/update methods, calling channel_monitor_updated on Completed status. The BackgroundProcessor is updated to capture pending_operation_count before persisting the ChannelManager, then flush that many writes afterward - ensuring monitor writes happen in the correct order relative to manager persistence.

Key changes:

- Add PendingMonitorOp enum and pending_ops queue to ChainMonitor
- Implement flush() and pending_operation_count() public methods
- Integrate flush calls in BackgroundProcessor (both sync and async)
- Add TestChainMonitor::new_deferred, flush helpers, and auto-flush in release_pending_monitor_events for test compatibility
- Add create_node_cfgs_deferred for deferred-mode test networks
- Add unit tests for queue/flush mechanics and full payment flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
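
The ordering described in the last commit message (capture the pending count, persist the manager, then flush that many operations) might look roughly like the sketch below. `ToyNode` and its methods are hypothetical, and the `arrives_during_persist` parameter is an assumption added purely to show what happens to an operation queued after the count was captured.

```rust
use std::collections::VecDeque;

/// Hypothetical node state: a queue of pending monitor operations and a log of
/// writes in the order they would hit storage (illustration only).
pub struct ToyNode {
    pending_monitor_ops: VecDeque<&'static str>,
    write_log: Vec<String>,
}

impl ToyNode {
    pub fn new(ops: &[&'static str]) -> Self {
        Self { pending_monitor_ops: ops.iter().copied().collect(), write_log: Vec::new() }
    }

    fn persist_manager(&mut self) {
        self.write_log.push("manager".to_string());
    }

    /// Writes at most `count` queued monitor operations, oldest first.
    fn flush(&mut self, count: usize) {
        for _ in 0..count {
            if let Some(op) = self.pending_monitor_ops.pop_front() {
                self.write_log.push(format!("monitor:{}", op));
            }
        }
    }

    /// One background-processor style iteration.
    pub fn background_iteration(&mut self, arrives_during_persist: Option<&'static str>) {
        // 1. Capture how many operations are pending before the manager write.
        let count = self.pending_monitor_ops.len();
        // 2. Persist the manager.
        self.persist_manager();
        // An operation queued in the meantime is not covered by `count`.
        if let Some(op) = arrives_during_persist {
            self.pending_monitor_ops.push_back(op);
        }
        // 3. Flush only the operations counted in step 1.
        self.flush(count);
    }

    pub fn write_log(&self) -> &[String] {
        &self.write_log
    }

    pub fn pending_len(&self) -> usize {
        self.pending_monitor_ops.len()
    }
}
```

In this model the operation that arrives during the manager write stays queued until the next iteration, so it is only written after a later manager persistence.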

joostjager · 2026-02-26T08:16:15Z

> I guess we should rename #4286 now. This fix is pretty ridiculously simple given the safety it adds.

Yes, we could reframe #4286 as a performance optimization now.

> Have we talked to any users about adopting this?

The plan is to have deferred writes always-on in ldk-node. In the tests that I did, there is no measurable performance downside currently because LDK isn't optimized enough to notice the difference. So from the ldk-node user perspective, the message is simply "force-closures caused by inconsistent persistence are fixed".

TheBlueMatt · 2026-02-26T13:02:02Z

lightning/src/util/test_utils.rs

+///
+/// The `wrap_signer` closure converts the raw `InMemorySigner` into the desired signer type
+/// (e.g. wrapping it in `TestChannelSigner` or passing it through unchanged).
+pub fn dummy_monitor<S: sign::ecdsa::EcdsaChannelSigner + 'static>(


nit: I don't love these kinds of generic "dummy builders" in test_utils.rs. They're pretty narrowly useful for only testing a fake ChannelMonitor in some setups because they may or may not be in a valid state at all. Can we leave this method in channelmonitor.rs (or chain)?

I can move it, but then it needs to live outside channelmonitor's test mod. Otherwise the chainmonitor tests can't access it I believe?

TheBlueMatt · 2026-02-26T13:02:56Z

lightning/src/chain/chainmonitor.rs

 	///
 	/// Note that async monitor updating is considered beta, and bugs may be triggered by its use.
 	///
+	/// When `deferred` is `true`, [`chain::Watch::watch_channel`] and


nit: Looks like the doc additions were squashed into the wrong commit.

It is a product of the somewhat artificial commit split. The commit that adds deferred was intended to purely mechanically get those changes out of the way, and clearly show that deferred mode isn't enabled anywhere yet.

Giving it an implementation and explaining what it does seemed to belong in the same commit.

TheBlueMatt · 2026-02-26T13:06:07Z

lightning/src/chain/chainmonitor.rs

+	/// A new monitor to insert and persist.
+	NewMonitor { channel_id: ChannelId, monitor: ChannelMonitor<ChannelSigner> },
+	/// An update to apply and persist.
+	Update { channel_id: ChannelId, update: ChannelMonitorUpdate },


Doesn't this introduce the bug you described in #4431 even in the runtime case? In this world, we don't apply an update in-line, and thus can have a pending update when a channel closes. We need to test the crash case for that but also need to look into the async-application case here.

Why would this introduce that bug? The conclusion of #4431 was that it resolves once the close confirms. In the current PR, the async persistence concept remains unchanged, so I'd think it cannot introduce a bug?

There wasn't a crash case btw. Just that the payment remained pending for what I initially thought would be indefinitely, but turned out to be until confirmation.

joostjager mentioned this pull request Jan 27, 2026

Defer ChainMonitor updates and persistence to flush (wrapper approach) #4345

Closed

joostjager closed this Jan 27, 2026

joostjager reopened this Feb 9, 2026

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 1f5cef4 to 30d05ca Compare February 9, 2026 14:45

joostjager force-pushed the chain-mon-internal-deferred-writes branch 9 times, most recently from 2815bf9 to 3eb5644 Compare February 11, 2026 09:37

joostjager force-pushed the chain-mon-internal-deferred-writes branch 10 times, most recently from f964466 to b140bf9 Compare February 12, 2026 08:22

joostjager marked this pull request as ready for review February 12, 2026 10:56

ldk-reviews-bot requested a review from tankyleo February 12, 2026 10:57

joostjager requested a review from TheBlueMatt February 16, 2026 13:39

TheBlueMatt reviewed Feb 19, 2026

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 7ce7cf9 to 824e976 Compare February 19, 2026 14:52

joostjager requested a review from TheBlueMatt February 19, 2026 14:53

ldk-reviews-bot requested a review from valentinewallace February 19, 2026 16:08

TheBlueMatt removed their request for review February 19, 2026 16:08

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 824e976 to 47070bd Compare February 20, 2026 08:58

valentinewallace reviewed Feb 24, 2026

joostjager force-pushed the chain-mon-internal-deferred-writes branch 2 times, most recently from 1ecd046 to c6b30c9 Compare February 25, 2026 08:29

joostjager requested a review from valentinewallace February 25, 2026 10:29

valentinewallace reviewed Feb 25, 2026

valentinewallace previously approved these changes Feb 25, 2026

joostjager and others added 5 commits February 26, 2026 09:03

Extract shared dummy_monitor helper to test_utils

e3c6a93

joostjager dismissed valentinewallace’s stale review via 43bfb1b February 26, 2026 08:08

joostjager force-pushed the chain-mon-internal-deferred-writes branch from c6b30c9 to 43bfb1b Compare February 26, 2026 08:08

joostjager requested a review from TheBlueMatt February 26, 2026 10:29

TheBlueMatt reviewed Feb 26, 2026

joostjager requested a review from TheBlueMatt February 26, 2026 13:42
