auctioneer: resubscribe account after stream loss#521
Merged
Roasbeef merged 1 commit intoMay 22, 2026
Merged
Conversation
44ba2ee to
5b7fe11
Compare
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
HandleServerShutdownrebuilds every account's subscription after the auctioneer stream goes down. Today the loop bails on the first per-account error, leaving every remaining account silently un-subscribed until the process restarts. Switch the loop to best-effort: log per-account failures, keep going, and aggregate errors so the caller still gets a signal.auctioneer/client.go:1304-1309(current code):acctKeysis drained fromc.subscribedAcctsimmediately before the loop, so every entry that doesn't get re-subscribed is gone from the in-memory map as well. IfStartAccountSubscriptionfails for one account, for example, the server returns any error response in non-recovery mode (client.go:694-697), or the new stream's per-account handshake errors, the function returns and every account after it in iteration order is silently dropped.From the trader's perspective the accounts look healthy in the CLI and account manager state; from the server's perspective they're not subscribed, and their orders are filtered as "offline trader" at every matching attempt until the trader process restarts.
This is the same severity class as the EOF-no-reconnect bug (#518) and the recovery-leaves-stale-entry bug (#520): a single transient or account-specific failure silently degrades the trader's participation in auctions, with no in-band signal to the user.
The bug is also conditional on Go's randomized map iteration order — the "survivors" are whichever accounts happen to be drained before the failing one. That ordering can change between runs, which makes the symptom intermittent and hard to attribute.
Fix
auctioneer/client.go:HandleServerShutdown: collect per-account errors instead of returning on the first one. Log each failure (with the trader key) so the operator has actionable per-account context in the trader log, then returnerrors.Joinof every collected error. The caller inpool/rpcserver.go:315 still receives — and logs — the aggregated error, so callers that previously inspected the return value continue to see a non-nil error when anything went wrong; they just no longer lose visibility into the survivors.Tests
Adds TestHandleServerShutdownPartialResubscribeFailure in
auctioneer/client_test.go. It pre-populatessubscribedAcctswith three accounts, drivesHandleServerShutdownend-to-end against a scripted fake stream, and asserts every account's handshake is attempted even when one of them returnsACCOUNT_DOES_NOT_EXIST.The orchestrator fails the first observed handshake rather than a specific account key. This keeps the assertion deterministic under Go's randomized map iteration order: with the buggy loop, exactly one handshake ever completes (the failing one); with the fix, all three do.
Verified the test catches the bug: against the unmodified
HandleServerShutdown, it fails with expected re-subscribe to attempt all 3 accounts, only attempted 1 — the loop bails on first error and leaves later accounts silently un-subscribed.