plugins: defer spenderp awaiting-channel recovery #9272

Draft
bittylicious wants to merge 1 commit into ElementsProject:master from bittylicious:opencode/spenderp-defer-awaiting-recovery

@bittylicious

Summary

spenderp currently scans CHANNELD_AWAITING_LOCKIN / DUALOPEND_AWAITING_LOCKIN channels during plugin init and queues signpsbt recovery attempts for saved funding PSBTs.

On a large mainnet node, one such signpsbt call took about 130 seconds before failing with:

Failed signpsbt for waiting channel ...: {"code":-32602,"message":"Could not add keypaths to PSBT?"}

Because that recovery work ran in the plugin init path, unrelated important builtin plugins were still waiting to finish startup and hit their init timeout, which caused lightningd to shut down and restart. The plugin named in the fatal log varied depending on which builtins were enabled (autoclean, chanbackup, commando, funder, cln-bwatch, topology).

This PR defers the awaiting-channel PSBT recovery scan to a zero-delay plugin timer. The recovery behaviour is preserved, but it no longer blocks spenderp init or the global plugin startup handshake.
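
To illustrate why deferring helps, here is a minimal, self-contained Python simulation. It is not the C change in plugins/spender; it only models the ordering. The names `slow_recovery_scan`, `init_blocking`, and `init_deferred` are hypothetical, an asyncio task stands in for the zero-delay plugin timer, and the 0.2 s sleep stands in for the roughly 130 s signpsbt.

```python
import asyncio
import time

RECOVERY_SECONDS = 0.2  # stand-in for the ~130 s signpsbt seen on the node


async def slow_recovery_scan(log):
    # Hypothetical stand-in for scanning awaiting-lockin channels and
    # retrying signpsbt on their saved funding PSBTs.
    await asyncio.sleep(RECOVERY_SECONDS)
    log.append("recovery finished")


async def init_blocking(log):
    # Old shape: recovery runs inside init, so init cannot return first.
    await slow_recovery_scan(log)
    log.append("init returned")


async def init_deferred(log, pending):
    # New shape: schedule recovery as a separate task (the analogue of a
    # zero-delay timer) and return from init immediately.
    pending.append(asyncio.create_task(slow_recovery_scan(log)))
    log.append("init returned")


async def measure(init, *args):
    # Time how long the init function itself takes to return.
    start = time.monotonic()
    await init(*args)
    return time.monotonic() - start


async def demo():
    blocking_log, deferred_log, pending = [], [], []
    blocking_time = await measure(init_blocking, blocking_log)
    deferred_time = await measure(init_deferred, deferred_log, pending)
    await asyncio.gather(*pending)  # let the deferred recovery complete
    return blocking_time, deferred_time, blocking_log, deferred_log


blocking_time, deferred_time, blocking_log, deferred_log = asyncio.run(demo())
```

In the deferred variant, "init returned" is logged before "recovery finished", which is the ordering this PR aims for: the recovery still runs, but nothing waiting on init has to wait for it.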

Reproduction context

Observed on CLN v26.06 and v26.06.2 with a SQLite wallet DB that was initially around 9.1 GiB, later vacuumed to around 4.7 GiB. Vacuum/integrity checks and upgrade to v26.06.2 did not change the symptom.

The triggering channel was in CHANNELD_AWAITING_LOCKIN, with:

  • funding.withheld=false
  • a top-level funding_txid
  • a saved funding.psbt
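
The conditions above can be written as a small predicate. This is a sketch only: it assumes channel records are available as Python dicts with a `state` key alongside the `funding_txid`, `funding.withheld`, and `funding.psbt` fields named above. The helper name `matches_trigger` and the sample records are made up for illustration.

```python
AWAITING_STATES = {"CHANNELD_AWAITING_LOCKIN", "DUALOPEND_AWAITING_LOCKIN"}


def matches_trigger(channel):
    """Return True if a channel dict has the shape described above."""
    funding = channel.get("funding", {})
    return (
        channel.get("state") in AWAITING_STATES
        and funding.get("withheld") is False
        and bool(channel.get("funding_txid"))
        and bool(funding.get("psbt"))
    )


# Made-up sample records; real values would come from the node's RPC output.
channels = [
    {"state": "CHANNELD_AWAITING_LOCKIN", "funding_txid": "aa" * 32,
     "funding": {"withheld": False, "psbt": "<base64 psbt>"}},
    {"state": "CHANNELD_NORMAL", "funding_txid": "bb" * 32,
     "funding": {"withheld": False}},
    {"state": "DUALOPEND_AWAITING_LOCKIN", "funding_txid": "cc" * 32,
     "funding": {"withheld": True, "psbt": "<base64 psbt>"}},
]
triggering = [c["funding_txid"] for c in channels if matches_trigger(c)]
```

Only the first sample record matches: the second is not awaiting lock-in and has no saved PSBT, and the third has `withheld` set.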

Disabling spenderp stopped the startup crash loop. Re-enabling only the funding-related builtin group reproduced the slow signpsbt and later spenderp failure log.

Notes

This does not attempt to decide whether that saved PSBT should be retried at all. It only prevents slow recovery signing from being startup-fatal.

A follow-up may still be useful to narrow the retry criteria for saved funding PSBTs that already correspond to a broadcast funding transaction.

Testing

Not yet compiled locally; local checkout lacked build dependencies (lowdown, and likely other CLN build deps). Marking this PR as draft while we validate the approach and add/update tests.

@bittylicious bittylicious force-pushed the opencode/spenderp-defer-awaiting-recovery branch 2 times, most recently from 1f6ab60 to dcdc4b7 Compare July 2, 2026 18:13
@bittylicious bittylicious force-pushed the opencode/spenderp-defer-awaiting-recovery branch from dcdc4b7 to d7b9ce6 Compare July 2, 2026 18:31
@bittylicious

Author

Live validation on the reproducing node:

  • Built the PR branch from commit d7b9ce61 with docker build --no-cache using the existing CLN Dockerfile.
  • Build completed successfully, including recompilation/linking of plugins/spender/openchannel.c and plugins/spenderp.
  • Published a production-overlay test image preserving the existing runtime/plugin environment: bittylicious/clightning:spenderp-defer-awaiting-recovery@sha256:000a8baf90f0c22ef1bb2bfe33dfea6e58918f771b97a4fbd609ed48aaef12e9.
  • Deployed that image on the affected mainnet node with spenderp enabled again.
  • Startup reached Server started cleanly with the normal plugin set; no disable-plugin=spenderp was present.
  • After the 180s delay, spenderp retried the awaiting-lockin channel recovery and still hit the original wallet/PSBT problem:
    plugin-spenderp: Failed signpsbt for waiting channel ...: {"code":-32602,"message":"Could not add keypaths to PSBT?"}
  • Crucially, this no longer happened during plugin init, and the daemon did not exit or restart. After the retry/failure, docker inspect still showed RestartCount=0, RPC was responsive, and getinfo returned normally.

So this does not fix the underlying signpsbt/keypath failure, but it does validate the intended behavior of this PR on the live reproducer: slow/failing awaiting-channel recovery no longer blocks other important plugins from completing init or causes lightningd to crash-loop.
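
For anyone repeating the restart check, the `RestartCount` field can be read from the JSON that `docker inspect <container>` prints. The following is a minimal sketch; `restart_count` is a hypothetical helper and the sample string is a trimmed, made-up stand-in for real output, which has many more fields.

```python
import json


def restart_count(inspect_output):
    """Extract RestartCount from `docker inspect <container>` JSON output."""
    # docker inspect prints a JSON array with one object per container.
    containers = json.loads(inspect_output)
    return containers[0]["RestartCount"]


# Trimmed, made-up sample; real output has many more fields.
sample = '[{"Id": "example", "RestartCount": 0, "State": {"Running": true}}]'
count = restart_count(sample)
```

A value of 0 after the deferred retry has failed matches the observation above that the daemon did not exit or restart.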
