fix: bound pending recovered sig queue to prevent remote OOM by PastaPastaPasta · Pull Request #7402 · dashpay/dash

PastaPastaPasta · 2026-07-02T15:37:51Z

Issue

Incoming QSIGREC (recovered signature) P2P messages are deserialized and appended to the per-node queue CSigningManager::pendingRecoveredSigs after only cheap gating checks (quorum exists/active, dedup by full hash). Actual verification is expensive (BLS) and is deferred to a single worker thread that drains only ~32 unique sign-hash sessions per cycle, whereas ingestion is crypto-free and happens on the network threads in parallel across connections.

Two problems compound:

No size bound. pendingRecoveredSigs had no per-node or global cap, and the dedup check keys on the full message hash (which includes the signature), so it does not constrain a peer sending many distinct messages. Ingestion structurally outruns the drain.
No cleanup on ban/disconnect. RemoveBannedNodeStates only cleans the separate sig-shares subsystem (CSigSharesManager::nodeStates); nothing purged pendingRecoveredSigs for banned or disconnected peers.

Net effect: a peer (or several) can grow the queue faster than it drains, increasing RSS until the node is OOM-killed — a remote, unauthenticated memory-exhaustion DoS. Highest impact against masternodes, which accept many inbound connections and run the LLMQ signing behind ChainLocks / InstantSend.

Fix

Bound the queue in VerifyAndProcessRecoveredSig with a per-node cap (MAX_PENDING_RECSIGS_PER_NODE = 1000) and a global cap (MAX_PENDING_RECSIGS_TOTAL = 10000). Over-cap messages are dropped silently, without a misbehaviour score — verification is single-threaded, so an honest peer can legitimately outrun the drain during a burst, and this is queue-depth backpressure rather than a protocol violation (contrast the per-message structural caps for sig-shares, which do ban).
Purge banned peers' queues via a new CSigningManager::RemoveNodesIf(predicate) (mirroring CSigSharesManager::RemoveNodesIf), invoked from WorkThreadSigning's periodic cleanup using PeerIsBanned.
Prune drained (empty) node entries during collection, so the global-cap scan stays cheap and disconnected nodes' queues are reclaimed once drained without waiting for a ban.

With these caps the attacker-controllable pending queue is bounded to roughly 8 MB worst case (10,000 entries × ~750 bytes resident per entry; the in-memory CRecoveredSig is 696 bytes vs. 193 on the wire, dominated by CBLSLazySignature). Previously it was unbounded.

Testing

make check targets build clean; full dashd builds and links.
feature_llmq_signing.py passes — normal recovered-sig relay across masternodes is unaffected by the caps.
lint-whitespace, lint-includes, lint-circular-dependencies clean.

🤖 Generated with Claude Code

github-actions · 2026-07-02T15:38:28Z

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

thepastaclaw · 2026-07-02T15:39:05Z

✅ Review complete (commit 95e160d)

coderabbitai · 2026-07-02T15:44:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3f344d30-0858-4a12-86f9-c57f98797ac7

📥 Commits

Reviewing files that changed from the base of the PR and between b75b568 and 95e160d.

📒 Files selected for processing (3)

src/llmq/net_signing.cpp
src/llmq/signing.cpp
src/llmq/signing.h

🚧 Files skipped from review as they are similar to previous changes (3)

src/llmq/net_signing.cpp
src/llmq/signing.h
src/llmq/signing.cpp

Walkthrough

This PR adds backpressure controls to the pending recovered-signature queue in the signing manager. Two new caps, a global total and a per-node limit, are enforced when enqueuing recovered sigs, dropping over-limit entries with a log message. A new RemoveNodesIf method removes queued entries for nodes matching a predicate, and empty per-node queue entries are pruned during pending-sig collection. The worker thread's periodic cleanup now also removes pending recovered sigs belonging to banned peers.

Estimated code review effort: 2 (Simple) | ~15 minutes

Sequence Diagram(s)

sequenceDiagram
  participant WorkThreadSigning
  participant PeerManager
  participant CSigningManager
  participant pendingRecoveredSigs

  loop periodic cleanup
    WorkThreadSigning->>CSigningManager: Cleanup()
    WorkThreadSigning->>PeerManager: PeerIsBanned(node_id)?
    WorkThreadSigning->>CSigningManager: RemoveNodesIf(predicate)
    CSigningManager->>pendingRecoveredSigs: erase entries for banned nodes
  end

  Note over CSigningManager: VerifyAndProcessRecoveredSig
  CSigningManager->>pendingRecoveredSigs: compute total & per-node counts
  alt caps exceeded
    CSigningManager-->>CSigningManager: log and drop recovered sig
  else within caps
    CSigningManager->>pendingRecoveredSigs: enqueue recovered sig
  end

Suggested reviewers: UdjinM6

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: bounding the pending recovered-sig queue to prevent remote OOM.
Description check	✅ Passed	The description is directly related to the queue-bounding and cleanup changes in the patch.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/llmq/signing.cpp (2)

462-472: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Same std::erase_if opportunity here.

Mirrors the loop in RemoveNodesIf; can be simplified the same way.

♻️ Proposed refactor

-        for (auto it = pendingRecoveredSigs.begin(); it != pendingRecoveredSigs.end();) {
-            if (it->second.empty()) {
-                it = pendingRecoveredSigs.erase(it);
-            } else {
-                ++it;
-            }
-        }
+        std::erase_if(pendingRecoveredSigs, [](const auto& entry) { return entry.second.empty(); });

As per coding guidelines, "Dash Core implementation must be written in C++20, requiring at least Clang 16 or GCC 11.1" (`src/**/*.{cpp,h,hpp,cc}`).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/signing.cpp` around lines 462 - 472, The manual erase loop over
pendingRecoveredSigs should be refactored to use std::erase_if, matching the
simpler pattern used in RemoveNodesIf. Update the cleanup logic in signing.cpp
so the pruning of empty entries is expressed with std::erase_if on
pendingRecoveredSigs, keeping the same behavior while reducing boilerplate and
aligning with the C++20 style guidelines.

Source: Coding guidelines

419-429: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use std::erase_if for this cleanup. Replace the manual iterator loop with std::erase_if(pendingRecoveredSigs, ...) to keep the code shorter and less error-prone.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/signing.cpp` around lines 419 - 429, Replace the manual iterator
cleanup in CSigningManager::RemoveNodesIf with std::erase_if on
pendingRecoveredSigs, using the existing predicate against the map key. Keep the
cs_pending lock in place and preserve the same removal behavior while
simplifying the implementation and avoiding the explicit erase/advance loop.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmq/dkgsessionhandler.h`:
- Around line 80-89: In CDKGPendingMessages, the duplicate check in the message
handling flow should happen before consuming the per-node quota. Update the
logic around messagesPerNode[from] and seenMessages.emplace(hash) so duplicate
hashes are dropped first, and only increment messagesPerNode[from] after a
message is confirmed new and will be queued. Keep the existing behavior in the
same method, but reorder the checks to avoid counting retransmits against the
peer.

In `@src/llmq/net_dkg.cpp`:
- Around line 108-113: The deserialization path in the DKG message handling
currently accepts payloads even when unread trailing bytes remain after vRecv >>
*msg, which can cause duplicate suppression to miss semantically identical
messages. Update the receive/parse logic in the DKG message handler that calls
vRecv >> *msg and then ValidateDKGMessageStructure so it verifies the stream was
fully consumed after deserialization, and reject the message by returning
nullptr if any bytes remain or parsing leaves extra data.

---

Nitpick comments:
In `@src/llmq/signing.cpp`:
- Around line 462-472: The manual erase loop over pendingRecoveredSigs should be
refactored to use std::erase_if, matching the simpler pattern used in
RemoveNodesIf. Update the cleanup logic in signing.cpp so the pruning of empty
entries is expressed with std::erase_if on pendingRecoveredSigs, keeping the
same behavior while reducing boilerplate and aligning with the C++20 style
guidelines.
- Around line 419-429: Replace the manual iterator cleanup in
CSigningManager::RemoveNodesIf with std::erase_if on pendingRecoveredSigs, using
the existing predicate against the map key. Keep the cs_pending lock in place
and preserve the same removal behavior while simplifying the implementation and
avoiding the explicit erase/advance loop.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 191c2890-110d-4b9b-bee7-8d22efff1e0f

📥 Commits

Reviewing files that changed from the base of the PR and between 59d36a9 and b75b568.

📒 Files selected for processing (6)

src/llmq/dkgsessionhandler.cpp
src/llmq/dkgsessionhandler.h
src/llmq/net_dkg.cpp
src/llmq/net_signing.cpp
src/llmq/signing.cpp
src/llmq/signing.h

💤 Files with no reviewable changes (1)

src/llmq/dkgsessionhandler.cpp

coderabbitai · 2026-07-02T15:45:00Z

+        if (messagesPerNode[from] >= maxMessagesPerNode) {
+            // TODO ban?
+            LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
+            return;
        }
+        messagesPerNode[from]++;

-        std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret;
-        ret.reserve(binaryMessages.size());
-        for (const auto& bm : binaryMessages) {
-            auto msg = std::make_shared<Message>();
-            try {
-                *bm.second >> *msg;
-            } catch (...) {
-                msg = nullptr;
-            }
-            ret.emplace_back(std::make_pair(bm.first, std::move(msg)));
+        if (!seenMessages.emplace(hash).second) {
+            LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);
+            return;


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Check duplicates before consuming the per-node quota.

Line 85 increments messagesPerNode[from] before Line 87 drops duplicate hashes, so duplicate retransmits permanently consume that peer’s quota until Clear() even though nothing was queued.

Proposed fix

- if (messagesPerNode[from] >= maxMessagesPerNode) { - // TODO ban? - LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from); - return; - } - messagesPerNode[from]++; - if (!seenMessages.emplace(hash).second) { LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from); return; } + + if (messagesPerNode[from] >= maxMessagesPerNode) { + // TODO ban? + seenMessages.erase(hash); + LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from); + return; + } + messagesPerNode[from]++;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (messagesPerNode[from] >= maxMessagesPerNode) {

// TODO ban?

LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);

return;

}

messagesPerNode[from]++;

std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret;

ret.reserve(binaryMessages.size());

for (const auto& bm : binaryMessages) {

auto msg = std::make_shared<Message>();

try {

*bm.second >> *msg;

} catch (...) {

msg = nullptr;

}

ret.emplace_back(std::make_pair(bm.first, std::move(msg)));

if (!seenMessages.emplace(hash).second) {

LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);

return;

if (!seenMessages.emplace(hash).second) {

LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);

return;

}

if (messagesPerNode[from] >= maxMessagesPerNode) {

// TODO ban?

seenMessages.erase(hash);

LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);

return;

}

messagesPerNode[from]++;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/dkgsessionhandler.h` around lines 80 - 89, In CDKGPendingMessages, the duplicate check in the message handling flow should happen before consuming the per-node quota. Update the logic around messagesPerNode[from] and seenMessages.emplace(hash) so duplicate hashes are dropped first, and only increment messagesPerNode[from] after a message is confirmed new and will be queued. Keep the existing behavior in the same method, but reorder the checks to avoid counting retransmits against the peer.

coderabbitai · 2026-07-02T15:45:01Z

    try {
-        CDataStream s(vRecv); // copy; deserialization does not advance the caller's stream
-        if (msg_type == NetMsgType::QCONTRIB) {
-            CDKGContribution qc;
-            s >> qc;
-            return qc.vvec != nullptr && qc.vvec->size() == threshold &&
-                   qc.contributions != nullptr && qc.contributions->blobs.size() <= size;
-        } else if (msg_type == NetMsgType::QCOMPLAINT) {
-            CDKGComplaint qc;
-            s >> qc;
-            return qc.badMembers.size() == qc.complainForMembers.size() &&
-                   qc.badMembers.size() <= size;
-        } else if (msg_type == NetMsgType::QJUSTIFICATION) {
-            CDKGJustification qj;
-            s >> qj;
-            return qj.contributions.size() <= size;
-        } else if (msg_type == NetMsgType::QPCOMMITMENT) {
-            CDKGPrematureCommitment qc;
-            s >> qc;
-            return qc.validMembers.size() <= size;
-        }
-        return false;
+        vRecv >> *msg;
    } catch (const std::exception&) {
-        return false;
+        return nullptr;
    }
+    if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Reject trailing bytes after deserialization.

vRecv >> *msg can leave unread bytes that are still included in the wire hash. Accepting those payloads lets semantically identical messages bypass duplicate suppression with different trailing garbage.

Proposed fix

try { vRecv >> *msg; } catch (const std::exception&) { return nullptr; } + if (!vRecv.empty()) return nullptr; if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try {

CDataStream s(vRecv); // copy; deserialization does not advance the caller's stream

if (msg_type == NetMsgType::QCONTRIB) {

CDKGContribution qc;

s >> qc;

return qc.vvec != nullptr && qc.vvec->size() == threshold &&

qc.contributions != nullptr && qc.contributions->blobs.size() <= size;

} else if (msg_type == NetMsgType::QCOMPLAINT) {

CDKGComplaint qc;

s >> qc;

return qc.badMembers.size() == qc.complainForMembers.size() &&

qc.badMembers.size() <= size;

} else if (msg_type == NetMsgType::QJUSTIFICATION) {

CDKGJustification qj;

s >> qj;

return qj.contributions.size() <= size;

} else if (msg_type == NetMsgType::QPCOMMITMENT) {

CDKGPrematureCommitment qc;

s >> qc;

return qc.validMembers.size() <= size;

}

return false;

vRecv >> *msg;

} catch (const std::exception&) {

return false;

return nullptr;

}

if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;

try {

vRecv >> *msg;

} catch (const std::exception&) {

return nullptr;

}

if (!vRecv.empty()) return nullptr;

if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 108 - 113, The deserialization path in the DKG message handling currently accepts payloads even when unread trailing bytes remain after vRecv >> *msg, which can cause duplicate suppression to miss semantically identical messages. Update the receive/parse logic in the DKG message handler that calls vRecv >> *msg and then ValidateDKGMessageStructure so it verifies the stream was fully consumed after deserialization, and reject the message by returning nullptr if any bytes remain or parsing leaves extra data.

QSIGREC messages were deserialized and appended to CSigningManager::pendingRecoveredSigs after only cheap gating checks (quorum exists/active, dedup by hash). Verification is deferred to a single worker thread draining ~32 unique sign-hash sessions per cycle, while ingestion is unbounded and crypto-free on the network threads. The queue had no size cap and was never purged for banned/disconnected peers (RemoveBannedNodeStates only cleans the separate sig-shares subsystem), so a peer could enqueue faster than the queue drains and grow memory until the node is OOM-killed — a remote, unauthenticated DoS, worst against masternodes. Bound the queue with per-node (1000) and global (10000) caps, dropping over-cap messages silently (an honest peer can legitimately outrun the single-threaded drain during a burst, so no misbehaviour is scored). Add CSigningManager::RemoveNodesIf to purge banned peers' queues, invoked from WorkThreadSigning's periodic cleanup. Prune drained (empty) node entries during collection so the global-cap scan stays cheap and disconnected nodes' queues are reclaimed once drained. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95e160d20f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T15:51:19Z

+            // Drop pending recovered sigs queued by banned peers so a flood's backlog does not
+            // persist after the peer is banned (RemoveBannedNodeStates only cleans the sig-shares
+            // subsystem, not m_sig_manager's pending recovered sigs).
+            m_sig_manager.RemoveNodesIf([this](NodeId node_id) { return m_peer_manager->PeerIsBanned(node_id); });


Purge disconnected peer queues before enforcing the cap

When a flooding peer disconnects before this 5s cleanup runs, this predicate never matches because PeerIsBanned delegates to PeerManagerImpl::IsBanned, which returns false once GetPeerRef(node_id) is null after FinalizeNode removes the peer. Those disconnected peers' pendingRecoveredSigs entries still count toward MAX_PENDING_RECSIGS_TOTAL, so an attacker can fill the new global cap with stale node IDs and cause subsequent honest QSIGRECs to be dropped until the expensive verifier drains them.

Useful? React with 👍 / 👎.

thepastaclaw

Code Review

Two well-scoped LLMQ networking changes: a DoS fix bounding CSigningManager::pendingRecoveredSigs with per-node/global caps plus pruning of drained entries, and a refactor templating CDKGPendingMessages on message type so DKG payloads deserialize exactly once on the p2p thread while preserving inventory-hash bytes and misbehaviour semantics. Both changes look correct. One convergent perf suggestion carried forward.

Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history, failed), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents

These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `src/llmq/signing.cpp`:
- [SUGGESTION] src/llmq/signing.cpp:400-408: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC
  VerifyAndProcessRecoveredSig sums list.size() over every entry in pendingRecoveredSigs under cs_pending on each admission (lines 400-403). The pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471) keeps the map bounded to nodes with in-flight sigs, so this is O(#active peers with pending) rather than O(everyone-ever) — small in normal operation. However, this path runs hottest exactly during the burst/flood scenario the PR is defending against, when many peers can be near their per-node cap and every new message forces another scan under the shared lock also needed by the drain thread. A running counter incremented in the emplace_back on line 416 and decremented at the erase points (drain in CollectPendingRecoveredSigsToVerify line 458, prune loop lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Not a correctness issue — take it or leave it.

Inline submission via scripts/review_poster.py hit GitHub HTTP 422, so I posted the same verified review as a body-only COMMENT.

thepastaclaw

Code Review

Single carried-forward suggestion from the prior review: the global pending-queue cap in VerifyAndProcessRecoveredSig is computed by full scan on every incoming QSIGREC. Still valid at head 95e160d — code unchanged from b75b568 at src/llmq/signing.cpp:400-408. The latest delta only dropped an unrelated DKG refactor commit; no new signing-side findings. No blockers.

Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents

These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `src/llmq/signing.cpp`:
- [SUGGESTION] src/llmq/signing.cpp:400-408: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC
  VerifyAndProcessRecoveredSig sums list.size() across every entry in pendingRecoveredSigs under cs_pending on each admission. Thanks to the empty-entry pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471), the map is bounded to nodes with in-flight sigs, so in practice this is O(#peers-with-pending) rather than O(everyone-ever) — small in normal operation and not a correctness issue. However, this scan runs hottest exactly during the burst/flood scenario this PR defends against, when many peers can be near their per-node cap and every new QSIGREC forces another O(N) walk under a lock the drain thread and other network threads also need. Maintaining a running counter incremented at the emplace_back on line 416 and decremented at each erase point (drain in CollectPendingRecoveredSigsToVerify line 458, the prune loop on lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Take it or leave it.

thepastaclaw · 2026-07-02T16:02:21Z

+    size_t total_pending{0};
+    for (const auto& node_entry : pendingRecoveredSigs) {
+        total_pending += node_entry.second.size();
+    }
+    if (total_pending >= MAX_PENDING_RECSIGS_TOTAL) {
+        LogPrint(BCLog::LLMQ, "CSigningManager::%s -- global pending recovered sigs cap reached (%d), dropping sig from node=%d\n",
+                 __func__, MAX_PENDING_RECSIGS_TOTAL, from);
+        return;
+    }


🟡 Suggestion: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC

VerifyAndProcessRecoveredSig sums list.size() across every entry in pendingRecoveredSigs under cs_pending on each admission. Thanks to the empty-entry pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471), the map is bounded to nodes with in-flight sigs, so in practice this is O(#peers-with-pending) rather than O(everyone-ever) — small in normal operation and not a correctness issue. However, this scan runs hottest exactly during the burst/flood scenario this PR defends against, when many peers can be near their per-node cap and every new QSIGREC forces another O(N) walk under a lock the drain thread and other network threads also need. Maintaining a running counter incremented at the emplace_back on line 416 and decremented at each erase point (drain in CollectPendingRecoveredSigsToVerify line 458, the prune loop on lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Take it or leave it.

_{source: ['claude', 'codex']}

UdjinM6

utACK 95e160d

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jul 2, 2026

perf: deserialize DKG messages once #7401

Draft

5 tasks

PastaPastaPasta force-pushed the fix/qsigrec-oom branch from b75b568 to 95e160d Compare July 2, 2026 15:47

chatgpt-codex-connector Bot reviewed Jul 2, 2026

View reviewed changes

thepastaclaw reviewed Jul 2, 2026

View reviewed changes

thepastaclaw approved these changes Jul 2, 2026

View reviewed changes

UdjinM6 approved these changes Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: bound pending recovered sig queue to prevent remote OOM#7402

fix: bound pending recovered sig queue to prevent remote OOM#7402
PastaPastaPasta wants to merge 1 commit into
dashpay:developfrom
PastaPastaPasta:fix/qsigrec-oom

PastaPastaPasta commented Jul 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

thepastaclaw commented Jul 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Uh oh!

coderabbitai Bot Jul 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Uh oh!

thepastaclaw left a comment

Uh oh!

thepastaclaw left a comment

Uh oh!

thepastaclaw Jul 2, 2026

Uh oh!

UdjinM6 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

PastaPastaPasta commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Fix

Testing

Uh oh!

github-actions Bot commented Jul 2, 2026

✅ No Merge Conflicts Detected

Uh oh!

thepastaclaw commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

thepastaclaw left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

thepastaclaw left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

thepastaclaw Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

UdjinM6 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

PastaPastaPasta commented Jul 2, 2026 •

edited

Loading

thepastaclaw commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading