Skip to content

Don't persist inbound committed onions in prod#4599

Open
valentinewallace wants to merge 1 commit intolightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet
Open

Don't persist inbound committed onions in prod#4599
valentinewallace wants to merge 1 commit intolightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet

Conversation

@valentinewallace
Copy link
Copy Markdown
Contributor

A few PRs ago, we started persisting inbound committed HTLC onions in Channels. These onions were persisted to lay groundwork for reconstructing the ChannelManager's pending HTLC maps from them in a future version. However, we've since discovered a different direction where we can instead reconstruct these same maps using persistent monitor events, which may mean that we don't need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment that we're not fully confident in yet, switch it off for now until we either confirm the monitor events direction and can delete all this onion persisting code OR realize that we definitely do need it.

@valentinewallace valentinewallace added this to the 0.3 milestone May 5, 2026
@valentinewallace valentinewallace self-assigned this May 5, 2026
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label May 5, 2026
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 5, 2026

👋 I see @wpaulino was un-assigned.
If you'd like another reviewer assignment, please click here.

Comment on lines +15594 to +15597
&InboundHTLCState::Committed { update_add_htlc: ref _update_add } => {
3u8.write(writer)?;
inbound_committed_update_adds.push(update_add_htlc);
#[cfg(test)]
inbound_committed_update_adds.push(_update_add);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The write side is now gated by #[cfg(test)], so in production inbound_committed_update_adds remains empty and TLV field 75 is never written. However, on the read side (line 16554), inbound_committed_update_adds_opt is initialized to Some(Vec::new()) by the optional_vec macro (when the TLV is absent, it stays as Some(empty_vec)). The deserialization code at line 16554-16564 then enters the if let Some(update_adds) block unconditionally and calls iter.next().ok_or(DecodeError::InvalidValue)? for each committed HTLC — which immediately fails because the iterator is empty.

Impact: Any production node that restarts while it has committed inbound HTLCs will fail to deserialize its channel state with DecodeError::InvalidValue. This is a critical availability bug.

Fix: The deserialization code at line 16554 needs a corresponding guard. For example:

if let Some(update_adds) = inbound_committed_update_adds_opt {
    if !update_adds.is_empty() {
        let mut iter = update_adds.into_iter();
        // ... existing code ...
    }
}

Or gate the entire read-side block with #[cfg(test)] to match the write side.

@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

ldk-claude-review-bot commented May 5, 2026

After thorough re-examination, I now realize my prior critical bug report was incorrect. Here's why:

The variable inbound_committed_update_adds_opt is initialized to None at line 16372 in the manual ReadableArgs impl for FundedChannel. This is different from the auto-generated macro path (_init_tlv_field_var) which would initialize it to Some(Vec::new()). When TLV 75 is absent (production case after this PR), the variable stays None, and the if let Some(update_adds) guard at line 16554 correctly skips the block. Committed inbound HTLCs keep their default InboundUpdateAdd::Legacy state, which is the intended behavior.

Review Summary

Correction to prior review: The critical deserialization bug I flagged previously is a false positive. The FundedChannel ReadableArgs impl manually initializes inbound_committed_update_adds_opt to None (line 16372), not Some(Vec::new()) as the auto-generated optional_vec macro would. When TLV 75 is absent, the if let Some(...) guard at line 16554 is correctly not entered.

No new issues found. The PR correctly:

  1. Gates the write of TLV 75 data behind #[cfg(test)] so production nodes stop persisting onion data
  2. Uses #[cfg_attr(not(test), allow(unused_mut))] to suppress the unused_mut warning in production
  3. Uses _-prefixed variable name to suppress unused variable warnings in production
  4. Maintains test coverage by still writing the data in test mode
  5. Has consistent read-side behavior: absent TLV 75 means None means all committed HTLCs stay Legacy
  6. Is safe for the reconstruct_manager_from_monitors path since that's gated on version >= 2 in production

@ldk-reviews-bot ldk-reviews-bot requested a review from wpaulino May 5, 2026 22:08
A few PRs ago, we started persisting inbound committed HTLC onions in Channels.
These onions were persisted to lay groundwork for reconstructing the
ChannelManager's pending HTLC maps from them in a future version. However,
we've since discovered a different direction where we can instead reconstruct
these same maps using persistent monitor events, which may mean that we don't
need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment
that we're not fully confident in yet, switch it off for now until we either
confirm the monitor events direction and can delete all this onion persisting
code OR realize that we definitely do need it.
@valentinewallace valentinewallace force-pushed the 2026-05-no-persist-onions-yet branch from c50518a to d68dbf8 Compare May 5, 2026 22:11
@valentinewallace valentinewallace removed the request for review from wpaulino May 5, 2026 22:16
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.28%. Comparing base (b64efcd) to head (d68dbf8).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4599      +/-   ##
==========================================
+ Coverage   86.22%   86.28%   +0.05%     
==========================================
  Files         159      159              
  Lines      109170   109269      +99     
  Branches   109170   109269      +99     
==========================================
+ Hits        94136    94280     +144     
+ Misses      12424    12379      -45     
  Partials     2610     2610              
Flag Coverage Δ
tests 86.28% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

weekly goal Someone wants to land this this week

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants