Add APIs to prune completed state for LSPS1/LSPS2/LSPS5#4526
Add APIs to prune completed state for LSPS1/LSPS2/LSPS5#4526f3r10 wants to merge 5 commits intolightningdevkit:mainfrom
Conversation
|
👋 I see @valentinewallace was un-assigned. |
| /// | ||
| /// Returns an [`APIError::APIMisuseError`] if the counterparty has no state, the order is | ||
| /// unknown, or the order is in a non-terminal state. | ||
| pub async fn prune_order( |
There was a problem hiding this comment.
Hmm, is the right API to require downstream code list the specific orders to prune or should we have some kind of "prune all failed orders older than X" API? @tnull wdyt?
There was a problem hiding this comment.
You are right @TheBlueMatt, it would be much better something like this:
handler.prune_orders(client_id, Duration::from_secs(30 * 24 * 3600)).await?;
I am going to update that part. Thanks for the early review
There was a problem hiding this comment.
Yeah, I agree, allowing to prune older entries in an interval is great. At some point we still might want to expose the LSPS{1,2} service state via the API, at which point allowing to drop specific orders would make sense, but not sure if that's not better done in a dedicated follow-up.
There was a problem hiding this comment.
I think it would be better in a follow-up when service state listing exists
There was a problem hiding this comment.
Ready the new method using an interval @tnull @TheBlueMatt
Add `prune_orders(counterparty_node_id, max_age: Duration)` to both `LSPS1ServiceHandler` and `LSPS1ServiceHandlerSync`. It removes all terminal orders (`CompletedAndChannelOpened` / `FailedAndRefunded`) for a given peer that are at least `max_age` old, persists the updated state, and returns the number of entries removed. Passing `Duration::ZERO` prunes all terminal orders immediately regardless of age, which is the recommended approach to unblock a client that has hit the per-peer request limit due to accumulated failed orders. On the `PeerState` layer, `prune_terminal_orders(now, max_age)` uses `retain` for a single-pass removal and sets `needs_persist` only when at least one entry is removed.
25a876b to
ebaa7ed
Compare
|
Trying to add the API for LSPS2, using the same strategy as LSPS1 (using intervals), I realize it is not quite possible because The solution would be to add My question is, should this change be done in this same PR or in a follow-up @tnull @TheBlueMatt wdyt? |
|
Yea, makes sense to add a created-at timestamp in LSPS2 requests imo. |
Add a `created_at: LSPSDateTime` field to `OutboundJITChannel` to record when each JIT channel was created (i.e., when the buy request was accepted by the LSP). This timestamp is needed to implement time-based bulk pruning of completed channel state. The field is persisted as TLV type 10 with a `default_value` of Unix epoch, ensuring old serialized data (without TLV 10) is read back successfully with the epoch sentinel rather than failing deserialization.
| (4, opening_fee_params, required), | ||
| (6, payment_size_msat, option), | ||
| (8, trust_model, required), | ||
| (10, created_at, (default_value, LSPSDateTime::new_from_duration_since_epoch(Duration::ZERO))), |
There was a problem hiding this comment.
Bug: The default_value of epoch (Duration::ZERO = 1970-01-01T00:00:00Z) means that any pre-existing OutboundJITChannel deserialized from old persisted state (before this field existed) will have created_at = epoch. When prune_channels is called, now.duration_since(epoch) will return ~54 years, so any max_age shorter than that will prune all pre-upgrade terminal channels. This silently breaks the age-filter semantics for operators who upgrade and then call prune_channels expecting only old-enough channels to be pruned.
Consider using a sentinel value (e.g., Option<LSPSDateTime>) and skipping the age check for channels without a created_at, or document this prominently so operators know the first post-upgrade prune with max_age > 0 will still prune all legacy terminal channels.
There was a problem hiding this comment.
I agree with the suggestion. I am going to convert that field to Option<LSPSDateTime>
In this context, None would mean "unknown age" (pre-upgrade data). In prune_terminal_channels:
- Duration::ZERO → prunes all terminal channels including legacy ones
- Any positive max_age → skips channels with created_at = None, only prunes those with a known created_at old enough
| let should_prune = max_age == Duration::ZERO | ||
| || now.duration_since(&channel.created_at) >= max_age; |
There was a problem hiding this comment.
Bug (pre-existing, but used by new code): LSPSDateTime::duration_since uses abs_diff, so it returns the absolute time difference. If now < created_at (e.g., due to clock skew or time adjustments), this returns a positive duration equal to the gap, which could exceed max_age and cause a channel to be incorrectly pruned even though it was just created.
For age-based pruning, a saturating subtraction (return Duration::ZERO when now < created_at) would be correct — an entry created "in the future" relative to now should never be considered old enough to prune. Same issue applies to the LSPS1 usage in prune_terminal_orders.
| let mut outer_state_lock = self.per_peer_state.write().unwrap(); | ||
| match outer_state_lock.get_mut(&counterparty_node_id) { | ||
| Some(peer_state) => { | ||
| if !peer_state.remove_webhook(app_name) { |
There was a problem hiding this comment.
Nit: remove_webhook unconditionally sets needs_persist |= true (line 822 of peer state) even when no webhook is found. When prune_webhook is called with a non-existent app_name, remove_webhook returns false and this method correctly returns an error — but the needs_persist flag has already been set, causing a needless persistence cycle. Consider moving the needs_persist assignment inside the retain's removal branch in remove_webhook.
Review SummaryAll three issues from my prior review have been addressed:
New inline comment posted
Cross-cutting observations
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4526 +/- ##
==========================================
+ Coverage 86.18% 86.83% +0.64%
==========================================
Files 160 161 +1
Lines 108410 109617 +1207
Branches 108410 109617 +1207
==========================================
+ Hits 93433 95185 +1752
+ Misses 12344 11845 -499
+ Partials 2633 2587 -46
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
🔔 1st Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
🔔 3rd Reminder Hey @valentinewallace! This PR has been waiting for your review. |
TheBlueMatt
left a comment
There was a problem hiding this comment.
Please respond (either fix or leave a reply why it doesn't matter/can't be fixed) all the AI review feedback and fix CI.
Yes, I am already fixing the issues reported by Claude |
Add `LSPS2ServiceHandler::prune_channels` that lets the LSP operator remove all channels in the `PaymentForwarded` terminal state whose `created_at` timestamp is at least `max_age` old. Passing `Duration::ZERO` prunes all terminal channels regardless of age. All associated state is cleaned up atomically: - per-peer `intercept_scid_by_channel_id` and `intercept_scid_by_user_channel_id` - handler-level `peer_by_intercept_scid` and `peer_by_channel_id` A new `PeerState::prune_terminal_channels` helper handles the intra-peer map cleanup and returns the removed `(scid, channel_id)` pairs for the handler to clean up the outer maps. Integration tests cover: non-terminal channels not pruned, unknown counterparty errors, age filtering, and successful bulk prune.
Add `LSPS5ServiceHandler::prune_webhook` that removes a single webhook entry identified by `counterparty_node_id` and `LSPS5AppName`. The method is synchronous, consistent with all other public `notify_*` methods on this handler: it marks the peer state as dirty and relies on the normal `LiquidityManager::persist` loop to flush the change to the KVStore. The method reuses the existing private `PeerState::remove_webhook` helper and returns an `APIError::APIMisuseError` if the counterparty has no registered state or the given `app_name` is not found.
…skew Replace abs_diff with a saturating subtraction so that duration_since returns Duration::ZERO when the reference timestamp is in the apparent future (e.g. due to clock skew or monotonicity violations). Callers using the result as an age — prune_terminal_orders (LSPS1), prune_terminal_channels (LSPS2), and the stale-webhook and cooldown checks (LSPS5) — no longer risk spuriously large values triggering premature pruning or incorrect cooldown expiry. Two unit tests are added to document the contract.
e56052d to
572648e
Compare
| // Simulate what prune_channel does internally. | ||
| peer_state.outbound_channels_by_intercept_scid.remove(&intercept_scid); | ||
| peer_state.intercept_scid_by_channel_id.retain(|_, iscid| *iscid != intercept_scid); | ||
| peer_state.intercept_scid_by_user_channel_id.retain(|_, iscid| *iscid != intercept_scid); | ||
| peer_state.needs_persist = true; | ||
|
|
||
| // All maps must be empty after pruning. | ||
| assert!(peer_state.outbound_channels_by_intercept_scid.is_empty()); | ||
| assert!(peer_state.intercept_scid_by_channel_id.is_empty()); | ||
| assert!(peer_state.intercept_scid_by_user_channel_id.is_empty()); | ||
| assert!(peer_state.needs_persist); |
There was a problem hiding this comment.
Nit: This test manually reimplements the pruning logic (remove + retain + set flag) instead of calling prune_terminal_channels. If the actual method has a bug — e.g., forgetting to clean up one of the auxiliary maps — this test will still pass because it tests a re-implementation, not the real method. Consider calling peer_state.prune_terminal_channels(&now, Duration::ZERO) and then asserting the maps are empty. The same applies to test_peer_state_prune_channel_non_terminal_rejected below which checks the guard condition manually rather than calling the method.
Closes #4444.
Add explicit pruning APIs to
LSPS1ServiceHandler,LSPS2ServiceHandler, andLSPS5ServiceHandlerso LSP operators can reclaim memory from completed historical state.