Skip to content

auctioneer: fix initial per-account subscribe failure#520

Merged
Roasbeef merged 3 commits into
lightninglabs:masterfrom
djkazic:fix-stale-recovery-map-entries
May 22, 2026
Merged

auctioneer: fix initial per-account subscribe failure#520
Roasbeef merged 3 commits into
lightninglabs:masterfrom
djkazic:fix-stale-recovery-map-entries

Conversation

@djkazic
Copy link
Copy Markdown
Contributor

@djkazic djkazic commented May 21, 2026

Problem

auctioneer.Client.connectAndAuthenticate registers a new acctSubscription in c.subscribedAccts before running the 3-way auth handshake, and only removes it again via closeStream (called on full client shutdown or server-disconnect reconnect).

Every per-account error path returns without deleting the entry, including the legitimate ACCOUNT_DOES_NOT_EXIST response that RecoverAccounts triggers when probing keys that aren't known to the auctioneer yet.

The next call to StartAccountSubscription for that account key, typically from handleStateOpen after the on-chain funding tx confirms, hits the "already subscribed" early-return guard, returns nil without sending a fresh Commit, and the per-account handshake never runs. The client thinks the account is subscribed; the auctioneer has never authenticated it. The trader is silently filtered as offline trader at matching time and stays that way until the process is restarted (which clears the in-memory map).

Symptoms

The bug fires whenever RecoverAccounts probes a key the auctioneer hasn't seen yet. This happens naturally when:
- A user opens an account and runs recovery before the funding tx confirms.
- A user restores from seed and has accounts in flight.
- A user has had failed account creation attempts (the recovery loop walks past "holes" in the key sequence; every probed-but-not-found key seeds a stale entry).

Litd / lightning-terminal users disproportionately affected. The terminal UI invokes RecoverAccounts as part of its account-list refresh flow, so any restart of litd with a not-yet-confirmed account is enough to set the trap.

Fix

Add a success sentinel plus a defer in connectAndAuthenticate that removes the entry from c.subscribedAccts unless the handshake reached the GetSuccess branch. The defer is set up after the "already-subscribed" early-return guard, so it does not delete a pre-existing entry on that path.

srvMsg.GetSuccess() != nil is the only branch that sets success = true; every other exit (authenticate error, unrecognized response, recovery-mode INCOMPLETE_ACCOUNT_RESERVATION, recovery-mode ACCOUNT_DOES_NOT_EXIST, unknown error, unknown message) cleans up. The cleanup is keyed on acctPubKey, captured at the top of the function and never re-derived, so the deferred delete cannot remove an entry for a different account.

Tests

Adds TestConnectAndAuthenticateCleansUpOnAccountDoesNotExist in auctioneer/client_test.go. It drives a full connectAndAuthenticate call in recovery mode against a scripted fake stream: capture the Commit, echo its commitHash back in a Challenge, drain the Subscribe, then respond with SubscribeError_ACCOUNT_DOES_NOT_EXIST. Asserts connectAndAuthenticate returns (sub, false, nil) and c.subscribedAccts is empty afterwards.

Verified the test catches the bug: against the unmodified connectAndAuthenticate, it fails with subscribedAccts entry was not cleaned up after ACCOUNT_DOES_NOT_EXIST; later subscribes for the same account would silently no-op.

Roasbeef added 2 commits May 21, 2026 17:33
connectAndAuthenticate's deferred cleanup runs unconditionally on any
non-success exit. On the ErrServerErrored branch we hand off to
HandleServerShutdown, which itself clears subscribedAccts and re-runs
StartAccountSubscription for every previously-live account — installing
a fresh entry at acctPubKey. Without suppressing the defer, the outer
return would then turn around and delete that fresh entry, leaving the
just-resubscribed account unreachable from sendToSubscription.

Set success = true before invoking HandleServerShutdown so the inner
resubscribe's map entry survives. The defer's invariant — "only delete
what this call originally inserted" — still holds because the inner
StartAccountSubscription has already replaced our insertion with its own
before HandleServerShutdown returns.
The existing TestConnectAndAuthenticateCleansUpOnAccountDoesNotExist
only exercised one of the per-account error paths whose cleanup is now
guarded by the deferred delete. The INCOMPLETE_ACCOUNT_RESERVATION case
is the most error-prone of the bunch: it returns a non-nil sub *and* a
typed AcctResNotCompletedError, which makes the "sub got returned, so
the entry must still be in the map" mental model easy to fall into.

Refactor the test into a table-driven TestConnectAndAuthenticateCleansUpOnError
with a runCleanupCase helper that drives the full handshake and asserts
the post-condition, and add the INCOMPLETE_ACCOUNT_RESERVATION case
alongside the original ACCOUNT_DOES_NOT_EXIST case.
Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🥬

@Roasbeef Roasbeef merged commit ec13f27 into lightninglabs:master May 22, 2026
6 checks passed
@djkazic djkazic deleted the fix-stale-recovery-map-entries branch May 22, 2026 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants