
Add APIs to prune completed state for LSPS1/LSPS2/LSPS5 #4526

Open
f3r10 wants to merge 5 commits into lightningdevkit:main from
f3r10:feat/add_apis_to_prune_completed_state

Conversation

@f3r10 f3r10 commented Mar 31, 2026

Closes #4444.

Add explicit pruning APIs to LSPS1ServiceHandler, LSPS2ServiceHandler, and LSPS5ServiceHandler so LSP operators can reclaim memory from completed historical state.

ldk-reviews-bot commented Mar 31, 2026

👋 I see @valentinewallace was un-assigned.
If you'd like another reviewer assignment, please click here.

///
/// Returns an [`APIError::APIMisuseError`] if the counterparty has no state, the order is
/// unknown, or the order is in a non-terminal state.
pub async fn prune_order(
Collaborator

Hmm, is the right API to require downstream code to list the specific orders to prune, or should we have some kind of "prune all failed orders older than X" API? @tnull wdyt?

Author

You are right @TheBlueMatt, something like this would be much better:
handler.prune_orders(client_id, Duration::from_secs(30 * 24 * 3600)).await?;

I am going to update that part. Thanks for the early review.

Contributor

Yeah, I agree, allowing pruning of entries older than a given interval is great. At some point we might still want to expose the LSPS{1,2} service state via the API, at which point allowing specific orders to be dropped would make sense, but that might be better done in a dedicated follow-up.

Author

I think it would be better to do that in a follow-up, once service state listing exists.

Author

The new interval-based method is ready, @tnull @TheBlueMatt.

Add `prune_orders(counterparty_node_id, max_age: Duration)` to both
`LSPS1ServiceHandler` and `LSPS1ServiceHandlerSync`. It removes all
terminal orders (`CompletedAndChannelOpened` / `FailedAndRefunded`) for
a given peer that are at least `max_age` old, persists the updated state,
and returns the number of entries removed.

Passing `Duration::ZERO` prunes all terminal orders immediately regardless
of age, which is the recommended approach to unblock a client that has
hit the per-peer request limit due to accumulated failed orders.

On the `PeerState` layer, `prune_terminal_orders(now, max_age)` uses
`retain` for a single-pass removal and sets `needs_persist` only when
at least one entry is removed.
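
A minimal, self-contained sketch of the retain-based pruning described above. The types here (`OrderStatus`, `StoredOrder`, a string key, and timestamps modelled as plain `Duration`s since the epoch) are simplified stand-ins for illustration, not the PR's actual structs.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Simplified stand-ins for the real per-peer LSPS1 state.
#[allow(dead_code)]
enum OrderStatus {
    Pending,
    CompletedAndChannelOpened,
    FailedAndRefunded,
}

struct StoredOrder {
    status: OrderStatus,
    created_at: Duration, // seconds since the Unix epoch, for illustration
}

struct PeerState {
    orders: HashMap<String, StoredOrder>,
    needs_persist: bool,
}

impl PeerState {
    /// Removes terminal orders that are at least `max_age` old in a single
    /// `retain` pass and reports how many were removed. `now` is the current
    /// wall-clock time expressed as a duration since the epoch.
    fn prune_terminal_orders(&mut self, now: Duration, max_age: Duration) -> usize {
        let before = self.orders.len();
        self.orders.retain(|_, order| {
            let terminal = matches!(
                order.status,
                OrderStatus::CompletedAndChannelOpened | OrderStatus::FailedAndRefunded
            );
            // Saturating age so an entry "created in the future" never looks old.
            let age = now.saturating_sub(order.created_at);
            // Keep anything non-terminal or not yet old enough.
            !(terminal && (max_age == Duration::ZERO || age >= max_age))
        });
        let removed = before - self.orders.len();
        if removed > 0 {
            // Only schedule a persist when at least one entry was dropped.
            self.needs_persist = true;
        }
        removed
    }
}
```
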
@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from 25a876b to ebaa7ed on April 8, 2026 17:40
Author

f3r10 commented Apr 8, 2026

While trying to add the API for LSPS2 using the same strategy as LSPS1 (interval-based pruning), I realized it is not quite possible because OutboundJITChannel has no created_at timestamp.
It would only be possible to create a function like prune_channel(counterparty_node_id, intercept_scid), but then the caller would need to list all the SCIDs.

The solution would be to add created_at: LSPSDateTime to OutboundJITChannel (new TLV field — backwards-compatible since missing fields default on read).

My question is: should this change be done in this same PR or in a follow-up? @tnull @TheBlueMatt wdyt?

@f3r10 f3r10 requested review from TheBlueMatt and tnull April 8, 2026 17:49
@TheBlueMatt
Collaborator

Yea, makes sense to add a created-at timestamp in LSPS2 requests imo.

Add a `created_at: LSPSDateTime` field to `OutboundJITChannel` to
record when each JIT channel was created (i.e., when the buy request
was accepted by the LSP). This timestamp is needed to implement
time-based bulk pruning of completed channel state.

The field is persisted as TLV type 10 with a `default_value` of
Unix epoch, ensuring old serialized data (without TLV 10) is read
back successfully with the epoch sentinel rather than failing
deserialization.
@f3r10 f3r10 marked this pull request as ready for review April 16, 2026 16:00
(4, opening_fee_params, required),
(6, payment_size_msat, option),
(8, trust_model, required),
(10, created_at, (default_value, LSPSDateTime::new_from_duration_since_epoch(Duration::ZERO))),
Collaborator

Bug: The default_value of epoch (Duration::ZERO = 1970-01-01T00:00:00Z) means that any pre-existing OutboundJITChannel deserialized from old persisted state (before this field existed) will have created_at = epoch. When prune_channels is called, now.duration_since(epoch) will return ~54 years, so any max_age shorter than that will prune all pre-upgrade terminal channels. This silently breaks the age-filter semantics for operators who upgrade and then call prune_channels expecting only old-enough channels to be pruned.

Consider using a sentinel value (e.g., Option<LSPSDateTime>) and skipping the age check for channels without a created_at, or document this prominently so operators know the first post-upgrade prune with max_age > 0 will still prune all legacy terminal channels.

Author

I agree with the suggestion. I am going to convert that field to Option<LSPSDateTime>.
In this context, None would mean "unknown age" (pre-upgrade data). In prune_terminal_channels (a sketch follows the list below):

  • Duration::ZERO → prunes all terminal channels including legacy ones
  • Any positive max_age → skips channels with created_at = None, only prunes those with a known created_at old enough
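
To illustrate those two bullets, here is a tiny self-contained sketch of the age check, using a plain `Duration` since the epoch in place of `LSPSDateTime`; the function name and shape are illustrative only.

```rust
use std::time::Duration;

/// Decides whether a terminal channel may be pruned. `created_at` is `None`
/// for pre-upgrade state whose age is unknown.
fn should_prune(created_at: Option<Duration>, now: Duration, max_age: Duration) -> bool {
    if max_age == Duration::ZERO {
        // Explicit "prune everything now", including legacy entries of unknown age.
        return true;
    }
    match created_at {
        // Unknown age: never considered old enough for a positive max_age.
        None => false,
        // Known age: saturating so clock skew can't make a fresh entry look old.
        Some(created) => now.saturating_sub(created) >= max_age,
    }
}

fn main() {
    let now = Duration::from_secs(1_000_000);
    assert!(should_prune(None, now, Duration::ZERO)); // legacy entry, prune-all
    assert!(!should_prune(None, now, Duration::from_secs(3600))); // legacy entry, age-filtered
    assert!(should_prune(Some(Duration::ZERO), now, Duration::from_secs(3600)));
    assert!(!should_prune(Some(now), now, Duration::from_secs(3600)));
}
```
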

Comment on lines +684 to +685
let should_prune = max_age == Duration::ZERO
|| now.duration_since(&channel.created_at) >= max_age;
Collaborator

Bug (pre-existing, but used by new code): LSPSDateTime::duration_since uses abs_diff, so it returns the absolute time difference. If now < created_at (e.g., due to clock skew or time adjustments), this returns a positive duration equal to the gap, which could exceed max_age and cause a channel to be incorrectly pruned even though it was just created.

For age-based pruning, a saturating subtraction (return Duration::ZERO when now < created_at) would be correct — an entry created "in the future" relative to now should never be considered old enough to prune. Same issue applies to the LSPS1 usage in prune_terminal_orders.

let mut outer_state_lock = self.per_peer_state.write().unwrap();
match outer_state_lock.get_mut(&counterparty_node_id) {
Some(peer_state) => {
if !peer_state.remove_webhook(app_name) {
Collaborator

Nit: remove_webhook unconditionally sets needs_persist |= true (line 822 of peer state) even when no webhook is found. When prune_webhook is called with a non-existent app_name, remove_webhook returns false and this method correctly returns an error — but the needs_persist flag has already been set, causing a needless persistence cycle. Consider moving the needs_persist assignment inside the retain's removal branch in remove_webhook.
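
A small sketch of the suggested change, with a simplified map of webhooks standing in for the real LSPS5 peer state; the field and method names here are illustrative, not the PR's code.

```rust
use std::collections::HashMap;

// Simplified stand-in for the LSPS5 per-peer state.
struct PeerState {
    webhooks: HashMap<String, String>, // app name -> webhook URL
    needs_persist: bool,
}

impl PeerState {
    /// Removes the webhook registered under `app_name`, returning whether one
    /// was found. The dirty flag is set only when something was actually
    /// removed, so a failed prune does not trigger a needless persist cycle.
    fn remove_webhook(&mut self, app_name: &str) -> bool {
        let mut removed = false;
        self.webhooks.retain(|name, _| {
            if name == app_name {
                removed = true;
                false // drop this entry
            } else {
                true
            }
        });
        if removed {
            self.needs_persist = true;
        }
        removed
    }
}
```
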

Collaborator

ldk-claude-review-bot commented Apr 16, 2026

Review Summary

All three issues from my prior review have been addressed:

  1. created_at is now Option<LSPSDateTime> with TLV option (was default_value epoch) — pre-upgrade channels correctly skip age-based filtering
  2. LSPSDateTime::duration_since now uses saturating_sub instead of abs_diff — clock skew can no longer cause spurious pruning
  3. remove_webhook only sets needs_persist when a webhook was actually removed

New inline comment posted

  • lightning-liquidity/src/lsps2/service.rs:3126-3136 — Unit test manually reimplements pruning logic instead of calling prune_terminal_channels, meaning it won't catch bugs in the actual method

Cross-cutting observations

  • Persistence inconsistency across handlers: LSPS1 prune_orders and LSPS2 prune_channels both call persist_peer_state() immediately after pruning. LSPS5 prune_webhook does not — it defers to the next LiquidityManager::persist call. This is documented but means LSPS5 state could be lost on crash between pruning and the next persist cycle.
  • O(N*M) auxiliary map cleanup in LSPS2 prune_terminal_channels: For each pruned channel, the auxiliary maps intercept_scid_by_channel_id and intercept_scid_by_user_channel_id are fully scanned via retain. Since we know the channel_id and user_channel_id for each pruned channel, direct remove calls would be O(1) per entry instead of O(N). Not a correctness issue, but could matter at scale (see the sketch below).
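
A sketch of the O(1)-per-entry cleanup suggested in the second bullet, with plain maps standing in for the real LSPS2 per-peer state; the type aliases and method name are illustrative assumptions.

```rust
use std::collections::HashMap;

type Scid = u64;
type ChannelId = [u8; 32];

// Simplified stand-in for the LSPS2 per-peer auxiliary maps.
struct PeerState {
    intercept_scid_by_channel_id: HashMap<ChannelId, Scid>,
    intercept_scid_by_user_channel_id: HashMap<u128, Scid>,
}

impl PeerState {
    /// Cleans up the auxiliary maps for a single pruned channel. Because the
    /// pruned channel's `channel_id` and `user_channel_id` are known, two
    /// direct `remove` calls suffice, rather than scanning both maps with
    /// `retain` once per pruned channel.
    fn forget_channel(&mut self, channel_id: &ChannelId, user_channel_id: u128) {
        self.intercept_scid_by_channel_id.remove(channel_id);
        self.intercept_scid_by_user_channel_id.remove(&user_channel_id);
    }
}
```
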


codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 85.51913% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.83%. Comparing base (47122e8) to head (572648e).
⚠️ Report is 128 commits behind head on main.

Files with missing lines Patch % Lines
lightning-liquidity/src/lsps1/service.rs 0.00% 39 Missing ⚠️
lightning-liquidity/src/lsps2/service.rs 93.02% 8 Missing and 4 partials ⚠️
lightning-liquidity/src/lsps1/peer_state.rs 99.14% 0 Missing and 1 partial ⚠️
lightning-liquidity/src/manager.rs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4526      +/-   ##
==========================================
+ Coverage   86.18%   86.83%   +0.64%     
==========================================
  Files         160      161       +1     
  Lines      108410   109617    +1207     
  Branches   108410   109617    +1207     
==========================================
+ Hits        93433    95185    +1752     
+ Misses      12344    11845     -499     
+ Partials     2633     2587      -46     
Flag Coverage Δ
fuzzing-fake-hashes 26.81% <0.00%> (?)
fuzzing-real-hashes 20.61% <0.00%> (?)
tests 86.21% <85.51%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@ldk-reviews-bot

🔔 1st Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 2nd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 3rd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@TheBlueMatt TheBlueMatt removed the request for review from valentinewallace April 24, 2026 16:19
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Please respond to all the AI review feedback (either fix it or leave a reply explaining why it doesn't matter / can't be fixed) and fix CI.

Author

f3r10 commented Apr 29, 2026

Please respond to all the AI review feedback (either fix it or leave a reply explaining why it doesn't matter / can't be fixed) and fix CI.

Yes, I am already fixing the issues reported by Claude.

f3r10 added 3 commits May 4, 2026 14:33
Add `LSPS2ServiceHandler::prune_channels` that lets the LSP operator
remove all channels in the `PaymentForwarded` terminal state whose
`created_at` timestamp is at least `max_age` old. Passing
`Duration::ZERO` prunes all terminal channels regardless of age. All
associated state is cleaned up atomically:

- per-peer `intercept_scid_by_channel_id` and
  `intercept_scid_by_user_channel_id`
- handler-level `peer_by_intercept_scid` and `peer_by_channel_id`

A new `PeerState::prune_terminal_channels` helper handles the intra-peer
map cleanup and returns the removed `(scid, channel_id)` pairs for the handler to clean up the outer maps.

Integration tests cover: non-terminal channels not pruned, unknown
counterparty errors, age filtering, and successful bulk prune.
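
A condensed sketch of the two-level cleanup this commit describes: the per-peer helper removes its own entries and returns the removed `(scid, channel_id)` pairs, and the handler then drops the matching entries from its outer maps. The structs below are simplified stand-ins, and the age/state checks are elided (the helper just takes a precomputed list of prunable SCIDs).

```rust
use std::collections::HashMap;

type Scid = u64;
type ChannelId = [u8; 32];
type NodeId = [u8; 33];

// Simplified per-peer state: intra-peer maps cleaned up by the helper.
struct PeerState {
    outbound_channels_by_intercept_scid: HashMap<Scid, ChannelId>,
    intercept_scid_by_channel_id: HashMap<ChannelId, Scid>,
}

impl PeerState {
    /// Removes the given prunable channels from the intra-peer maps and
    /// returns the `(scid, channel_id)` pairs that were removed.
    fn prune_terminal_channels(&mut self, prunable: &[Scid]) -> Vec<(Scid, ChannelId)> {
        let mut removed = Vec::new();
        for scid in prunable {
            if let Some(channel_id) = self.outbound_channels_by_intercept_scid.remove(scid) {
                self.intercept_scid_by_channel_id.remove(&channel_id);
                removed.push((*scid, channel_id));
            }
        }
        removed
    }
}

// Simplified handler-level lookup maps.
struct ServiceHandler {
    peer_by_intercept_scid: HashMap<Scid, NodeId>,
    peer_by_channel_id: HashMap<ChannelId, NodeId>,
}

impl ServiceHandler {
    /// Asks the peer state to prune, then mirrors every removal in the
    /// handler-level maps so no dangling entries are left behind.
    fn prune_channels(&mut self, peer_state: &mut PeerState, prunable: &[Scid]) {
        for (scid, channel_id) in peer_state.prune_terminal_channels(prunable) {
            self.peer_by_intercept_scid.remove(&scid);
            self.peer_by_channel_id.remove(&channel_id);
        }
    }
}
```
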
Add `LSPS5ServiceHandler::prune_webhook` that removes a single webhook
entry identified by `counterparty_node_id` and `LSPS5AppName`. The
method is synchronous, consistent with all other public `notify_*`
methods on this handler: it marks the peer state as dirty and relies on
the normal `LiquidityManager::persist` loop to flush the change to
the KVStore.

The method reuses the existing private `PeerState::remove_webhook`
helper and returns an `APIError::APIMisuseError` if the counterparty
has no registered state or the given `app_name` is not found.
…skew

Replace abs_diff with a saturating subtraction so that duration_since
returns Duration::ZERO when the reference timestamp is in the apparent
future (e.g. due to clock skew or monotonicity violations). Callers
using the result as an age — prune_terminal_orders (LSPS1),
prune_terminal_channels (LSPS2), and the stale-webhook and cooldown
checks (LSPS5) — no longer risk spuriously large values triggering
premature pruning or incorrect cooldown expiry.

Two unit tests are added to document the contract.
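
A standalone sketch of the behaviour this commit describes, with a bare newtype over a duration-since-epoch standing in for `LSPSDateTime`; only the saturating semantics are the point here.

```rust
use std::time::Duration;

/// Simplified stand-in for `LSPSDateTime`: a wall-clock instant stored as the
/// duration since the Unix epoch.
#[derive(Clone, Copy)]
struct DateTime(Duration);

impl DateTime {
    /// How much later `self` is than `earlier`. Saturates to zero when
    /// `earlier` is apparently in the future (clock skew), instead of
    /// returning the absolute difference as `abs_diff` would.
    fn duration_since(&self, earlier: &DateTime) -> Duration {
        self.0.saturating_sub(earlier.0)
    }
}

fn main() {
    let created = DateTime(Duration::from_secs(2_000));
    let now = DateTime(Duration::from_secs(1_000)); // clock went backwards
    // With abs_diff this "age" would be 1000s and could exceed a small
    // max_age; saturating subtraction keeps it at zero, so a just-created
    // entry is never prunable.
    assert_eq!(now.duration_since(&created), Duration::ZERO);
    // Normal case: a genuinely older timestamp yields the real age.
    assert_eq!(created.duration_since(&now), Duration::from_secs(1_000));
}
```
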
@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from e56052d to 572648e on May 4, 2026 19:37
Comment on lines +3126 to +3136
// Simulate what prune_channel does internally.
peer_state.outbound_channels_by_intercept_scid.remove(&intercept_scid);
peer_state.intercept_scid_by_channel_id.retain(|_, iscid| *iscid != intercept_scid);
peer_state.intercept_scid_by_user_channel_id.retain(|_, iscid| *iscid != intercept_scid);
peer_state.needs_persist = true;

// All maps must be empty after pruning.
assert!(peer_state.outbound_channels_by_intercept_scid.is_empty());
assert!(peer_state.intercept_scid_by_channel_id.is_empty());
assert!(peer_state.intercept_scid_by_user_channel_id.is_empty());
assert!(peer_state.needs_persist);
Collaborator

Nit: This test manually reimplements the pruning logic (remove + retain + set flag) instead of calling prune_terminal_channels. If the actual method has a bug — e.g., forgetting to clean up one of the auxiliary maps — this test will still pass because it tests a re-implementation, not the real method. Consider calling peer_state.prune_terminal_channels(&now, Duration::ZERO) and then asserting the maps are empty. The same applies to test_peer_state_prune_channel_non_terminal_rejected below which checks the guard condition manually rather than calling the method.
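
Roughly the shape the test could take, exercising the real method instead of re-implementing it. This assumes `prune_terminal_channels(&now, max_age)` returns the removed pairs as the commit message describes, so the exact call and return type may differ.

```rust
// Call the actual pruning method instead of hand-rolling remove/retain, so a
// forgotten auxiliary-map cleanup inside prune_terminal_channels fails here.
let removed = peer_state.prune_terminal_channels(&now, Duration::ZERO);
assert_eq!(removed.len(), 1);

// All maps must be empty after pruning, and the peer must be marked dirty.
assert!(peer_state.outbound_channels_by_intercept_scid.is_empty());
assert!(peer_state.intercept_scid_by_channel_id.is_empty());
assert!(peer_state.intercept_scid_by_user_channel_id.is_empty());
assert!(peer_state.needs_persist);
```
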
