sweep: fix utxo contention and spurious fee ratchet#10816
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the reliability of the UTXO sweeper by addressing two critical issues: race conditions during UTXO selection and incorrect fee rate escalation. By implementing a leasing system for selected UTXOs, the sweeper now ensures that inputs are reserved during the coin selection process. Additionally, the logic for handling sweep failures has been refined to distinguish between fee-related errors and resource-related errors, ensuring that fee rates are only ratcheted when appropriate. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to lease wallet UTXOs selected as fee inputs during the sweeping process, preventing collisions between concurrent sweep attempts. It also refines the fee-bumping logic to avoid ratcheting up fee rates for non-fee-related failures, such as insufficient budget or missing wallet inputs. A suggestion was made to ensure that error reporting in handleReplacementTxError consistently reflects the underlying cause when a fee calculation fails.
| if ferr != nil { | ||
| // If there's an error with the fee calculation, we need to | ||
| // abort the sweep. | ||
| event = TxFatal | ||
| } |
There was a problem hiding this comment.
For consistency with handleInitialTxError (lines 1124-1128), when calculateRetryFeeRate returns an error, the err variable should be updated to the calculation error. This ensures that the BumpResult and the subsequent error log (line 1909) reflect the reason why the sweep was aborted with a TxFatal event.
| if ferr != nil { | |
| // If there's an error with the fee calculation, we need to | |
| // abort the sweep. | |
| event = TxFatal | |
| } | |
| if ferr != nil { | |
| // If there's an error with the fee calculation, we need to | |
| // abort the sweep. | |
| event = TxFatal | |
| err = ferr | |
| } |
🔴 PR Severity: CRITICAL
🔴 Critical (5 files)
⬜ Test/Excluded (4 files — not counted)
AnalysisAll changed files reside in the The PR modifies 5 non-test production files with 144 net lines changed, which is under the bump thresholds (>20 files or >500 lines). No severity bump was applied. The changes appear to extend fee-bumping behavior and the tx input set logic, warranting careful review by an engineer familiar with the sweep package and on-chain fee estimation. To override, add a |
568f4ce to
36a7dab
Compare
When the sweeper processes multiple pending inputs in a single coin-select-lock cycle, each one independently asks the wallet for a UTXO to cover fees. Nothing in that loop communicates "this UTXO is already claimed", so two concurrent sweeps can pick the same one; when the winner publishes and burns it, the loser is left referencing a spent input and can loop on retries without finding a free UTXO if the wallet view is slow to reflect the spend. Fix: lease each chosen wallet UTXO on the wallet under a sweeper- specific lock ID at selection time. Subsequent selections in the same cycle and across the retry window before the sweep tx confirms see a strictly smaller pool. The lease expires after the default duration if the sweep is abandoned, and is naturally consumed when the sweep confirms.
When a sweep fails to publish, the sweeper computes a higher retry fee rate and stores it as the input's new starting rate. That's the right move for fee-related failures, where the broadcast was rejected for not paying enough. But the same machinery also fires on resource failures: no wallet UTXO available, the input itself can't cover the desired fee, and so on. Those failures carry no fee-rate signal, and repeatedly ratcheting the rate on them can push it past the input's intrinsic budget, after which the aggregator silently skips the input and it's stranded. The bug was previously masked by a separate UTXO-collision quirk that produced UnknownSpend rather than a resource error on the loser of a collision; with that fixed (in the preceding commit), this ratchet bug surfaces. Fix: distinguish fee-rate-bearing failures from resource ones at the point where the bump result is produced, and skip the rate ratchet for the latter. The input's existing starting fee rate is preserved across non-fee failures, so the next retry attempts at the same rate the original sweep request specified.
Makes two fixes to the UTXO sweeper. The first eliminates contention for UTXOs by leasing them during selection; the second ensures that we don't ratchet fee rates on sweep failure, if fee rates aren't the cause of the failure.
Interestingly, the first fix "unlocks" the need for the second one.
Consider two sweeps. Previously, the loser of a UTXO collision (in which both sweeps try to use the same UTXO) saw UnknownSpend, which would not trigger the fee ratchet. The first fix eliminates the concurrent UTXO contention; now, if there are no other UTXOs available for the second sweep after the first has leased one, the sweeper will fail with ErrNotEnoughInputs, which triggers the fee rate ratchet, even though fees aren't actually the issue (the issue is that there aren't any UTXOs to cover the fees at all).
Ratcheting the fee rate on a sweep for which fees are not the problem will not solve the problem, so the second fix makes sure we don't spuriously ratchet the fee rate to the point that the sweep becomes stranded.
Partially resolves lightninglabs/taproot-assets#1976. Resolves #7397.