
Roll back composite sub-handlers when one rejects peer_connected #4595

Open
tnull wants to merge 1 commit into lightningdevkit:main from tnull:2026-05-composite-handler-peer-connected-rollback

Conversation

@tnull
Contributor

@tnull tnull commented May 5, 2026

`composite_custom_message_handler!` expanded `peer_connected` to call every sub-handler and remember the last error, but never undo the already-succeeded ones. The `CustomMessageHandler::peer_connected` contract is that `PeerManager` will *not* invoke `peer_disconnected` when `peer_connected` returns `Err` — so any per-peer state allocated by an earlier sub-handler that returned `Ok` was leaked permanently once a later sub-handler returned `Err`.

A peer who can elicit `Err` from any sub-handler in the composite (feature-bit gate, banlist, etc.) could repeatedly reconnect to grow that leaked state without bound (slow resource DoS), and "currently connected" predicates in the leaking sub-handler would lie about peers that were actually rejected.

Mirror the rollback pattern `PeerManager` already uses for the four built-in handlers (`peer_handler.rs:2149-2188`): record each sub-handler's `peer_connected` result, and if any returned `Err`, call `peer_disconnected` on the ones that succeeded before propagating the failure.

Co-Authored-By: HAL 9000
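
For illustration, the resulting control flow has roughly the following shape. This is a minimal self-contained sketch: the `PeerLifecycle` trait and the `u64` peer id are stand-ins for illustration, not LDK's actual `CustomMessageHandler` API.

```rust
// Minimal sketch of the rollback shape described above. The trait and
// the u64 peer id are illustrative stand-ins, not LDK's API.
trait PeerLifecycle {
	fn peer_connected(&self, peer_id: u64) -> Result<(), ()>;
	fn peer_disconnected(&self, peer_id: u64);
}

// A composite of two sub-handlers: call both, record both results, and
// if either failed, undo the ones that succeeded before propagating Err.
fn composite_peer_connected(
	a: &dyn PeerLifecycle, b: &dyn PeerLifecycle, peer_id: u64,
) -> Result<(), ()> {
	let res_a = a.peer_connected(peer_id);
	let res_b = b.peer_connected(peer_id);
	if res_a.is_err() || res_b.is_err() {
		// A sub-handler that returned Err must have cleaned up after
		// itself, so only roll back the sub-handlers that returned Ok.
		if res_a.is_ok() {
			a.peer_disconnected(peer_id);
		}
		if res_b.is_ok() {
			b.peer_disconnected(peer_id);
		}
		return Err(());
	}
	Ok(())
}
```

The key property is that an `Err` from any sub-handler leaves every sub-handler in its disconnected state, matching what `PeerManager` assumes when it sees the failure.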

@tnull tnull requested a review from jkczyz May 5, 2026 19:16
@ldk-reviews-bot

ldk-reviews-bot commented May 5, 2026

👋 I see @jkczyz was un-assigned.
If you'd like another reviewer assignment, please click here.

@ldk-claude-review-bot
Collaborator

I've completed a thorough review of the entire diff. The macro change and the test are both correct.

Analysis of the core logic:

  • The macro binds each sub-handler's `peer_connected` result to a local variable (`let $field = ...`), then checks whether any returned `Err`, and calls `peer_disconnected` only on the ones that returned `Ok`. This correctly prevents leaking per-peer state.
  • The `peer_disconnected` implementation (unchanged, lines 308-312) calls all sub-handlers unconditionally, which is correct for the normal path where `PeerManager` calls it after a successful `peer_connected`.
  • The rollback correctly skips `peer_disconnected` for handlers that returned `Err`, matching the trait contract that those handlers are responsible for their own cleanup.
  • Macro hygiene is fine: `$field` (call-site context) and `any_err` (macro context) are in different hygiene scopes in `macro_rules!`, so field names can't collide with the macro-introduced variable (see the toy sketch after this list).
  • The original code also called all sub-handlers (no short-circuit), so the new code is consistent in that regard — it just adds the rollback that was missing.
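
To make the hygiene point concrete, here is a toy `macro_rules!` sketch in the same spirit (hypothetical, not the PR's actual macro). The `any_err` binding is introduced inside the macro body, so it lives in the macro's own hygiene scope; a caller-supplied field literally named `any_err` cannot shadow it.

```rust
// Toy illustration of the rollback-in-a-macro pattern; hypothetical,
// not the PR's actual macro. `any_err` is hygienic to the macro body,
// so it cannot collide with any caller-supplied $field name.
macro_rules! connect_all {
	($this:expr, $peer:expr; $($field:ident),+) => {{
		// Bind each sub-handler's peer_connected result to a local
		// named after the field.
		$( let $field = $this.$field.peer_connected($peer); )+
		let any_err = false $( || $field.is_err() )+;
		if any_err {
			// Roll back only the sub-handlers that succeeded.
			$( if $field.is_ok() { $this.$field.peer_disconnected($peer); } )+
			Err(())
		} else {
			Ok(())
		}
	}};
}

// Minimal stand-in sub-handler and composite to exercise the macro.
struct Handler { accept: bool }
impl Handler {
	fn peer_connected(&self, _peer: u64) -> Result<(), ()> {
		if self.accept { Ok(()) } else { Err(()) }
	}
	fn peer_disconnected(&self, _peer: u64) {}
}
struct Composite { first: Handler, second: Handler }

fn connect(c: &Composite) -> Result<(), ()> {
	connect_all!(c, 42; first, second)
}
```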

Test analysis:

  • The test correctly verifies that a `CountingHandler` (always succeeds) has its state rolled back when composed with an `ErroringHandler` (always fails).
  • The atomic counter correctly tracks the connect/disconnect balance.
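
A condensed sketch of that test shape, with the composite's call-all-then-roll-back behavior inlined. The handler types here are illustrative stand-ins, not the PR's exact test code.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Stand-ins for the test's handlers: one tracks a connect/disconnect
// balance and always succeeds, the other always rejects the peer.
struct CountingHandler { balance: AtomicIsize }
struct ErroringHandler;

impl CountingHandler {
	fn peer_connected(&self) -> Result<(), ()> {
		self.balance.fetch_add(1, Ordering::SeqCst);
		Ok(())
	}
	fn peer_disconnected(&self) {
		self.balance.fetch_sub(1, Ordering::SeqCst);
	}
}

impl ErroringHandler {
	fn peer_connected(&self) -> Result<(), ()> { Err(()) }
}

#[test]
fn rollback_keeps_connect_disconnect_balanced() {
	let counting = CountingHandler { balance: AtomicIsize::new(0) };
	let erroring = ErroringHandler;

	// Composite behavior, inlined: call every sub-handler, then roll
	// back the ones that succeeded because one of them failed.
	let res_a = counting.peer_connected();
	let res_b = erroring.peer_connected();
	if res_a.is_err() || res_b.is_err() {
		if res_a.is_ok() { counting.peer_disconnected(); }
	}

	assert!(res_b.is_err());
	// Without the rollback, the balance would be stuck at 1 forever.
	assert_eq!(counting.balance.load(Ordering::SeqCst), 0);
}
```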

No issues found.

```rust
	fn get_and_clear_pending_msg(&self) -> Vec<(PublicKey, Bar)> {
		vec![]
	}
	fn peer_disconnected(&self, _: PublicKey) {}
```
Collaborator


`assert!(false)` here.
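
If the intent matches the `debug_assert` the approving review asks for below, the quoted `peer_disconnected` stub might become something like this (a sketch, not the PR's final code):

```rust
	fn peer_disconnected(&self, _: PublicKey) {
		// This handler always rejects peers in peer_connected, so the
		// composite's rollback must never call back into it here.
		debug_assert!(false, "peer_disconnected called on a handler that rejected the peer");
	}
```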

@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 38.55422% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.15%. Comparing base (1a26867) to head (7de6891).
⚠️ Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning-custom-message/src/lib.rs | 38.55% | 49 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4595      +/-   ##
==========================================
- Coverage   86.84%   86.15%   -0.69%     
==========================================
  Files         161      160       -1     
  Lines      109260   109304      +44     
  Branches   109260   109304      +44     
==========================================
- Hits        94882    94174     -708     
- Misses      11797    12517     +720     
- Partials     2581     2613      +32     
| Flag | Coverage Δ |
| --- | --- |
| fuzzing-fake-hashes | ? |
| fuzzing-real-hashes | ? |
| tests | 86.15% <38.55%> (-0.07%) ⬇️ |

Flags with carried forward coverage won't be shown.


Contributor

@jkczyz jkczyz left a comment


LGTM aside from needing to add the `debug_assert`.
