Skip to content

fix: bound pending recovered sig queue to prevent remote OOM#7402

Open
PastaPastaPasta wants to merge 1 commit into
dashpay:developfrom
PastaPastaPasta:fix/qsigrec-oom
Open

fix: bound pending recovered sig queue to prevent remote OOM#7402
PastaPastaPasta wants to merge 1 commit into
dashpay:developfrom
PastaPastaPasta:fix/qsigrec-oom

Conversation

@PastaPastaPasta

@PastaPastaPasta PastaPastaPasta commented Jul 2, 2026

Copy link
Copy Markdown
Member

Issue

Incoming QSIGREC (recovered signature) P2P messages are deserialized and appended to the per-node queue CSigningManager::pendingRecoveredSigs after only cheap gating checks (quorum exists/active, dedup by full hash). Actual verification is expensive (BLS) and is deferred to a single worker thread that drains only ~32 unique sign-hash sessions per cycle, whereas ingestion is crypto-free and happens on the network threads in parallel across connections.

Two problems compound:

  • No size bound. pendingRecoveredSigs had no per-node or global cap, and the dedup check keys on the full message hash (which includes the signature), so it does not constrain a peer sending many distinct messages. Ingestion structurally outruns the drain.
  • No cleanup on ban/disconnect. RemoveBannedNodeStates only cleans the separate sig-shares subsystem (CSigSharesManager::nodeStates); nothing purged pendingRecoveredSigs for banned or disconnected peers.

Net effect: a peer (or several) can grow the queue faster than it drains, increasing RSS until the node is OOM-killed — a remote, unauthenticated memory-exhaustion DoS. Highest impact against masternodes, which accept many inbound connections and run the LLMQ signing behind ChainLocks / InstantSend.

Fix

  • Bound the queue in VerifyAndProcessRecoveredSig with a per-node cap (MAX_PENDING_RECSIGS_PER_NODE = 1000) and a global cap (MAX_PENDING_RECSIGS_TOTAL = 10000). Over-cap messages are dropped silently, without a misbehaviour score — verification is single-threaded, so an honest peer can legitimately outrun the drain during a burst, and this is queue-depth backpressure rather than a protocol violation (contrast the per-message structural caps for sig-shares, which do ban).
  • Purge banned peers' queues via a new CSigningManager::RemoveNodesIf(predicate) (mirroring CSigSharesManager::RemoveNodesIf), invoked from WorkThreadSigning's periodic cleanup using PeerIsBanned.
  • Prune drained (empty) node entries during collection, so the global-cap scan stays cheap and disconnected nodes' queues are reclaimed once drained without waiting for a ban.

With these caps the attacker-controllable pending queue is bounded to roughly 8 MB worst case (10,000 entries × ~750 bytes resident per entry; the in-memory CRecoveredSig is 696 bytes vs. 193 on the wire, dominated by CBLSLazySignature). Previously it was unbounded.

Testing

  • make check targets build clean; full dashd builds and links.
  • feature_llmq_signing.py passes — normal recovered-sig relay across masternodes is unaffected by the caps.
  • lint-whitespace, lint-includes, lint-circular-dependencies clean.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@thepastaclaw

thepastaclaw commented Jul 2, 2026

Copy link
Copy Markdown

✅ Review complete (commit 95e160d)

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3f344d30-0858-4a12-86f9-c57f98797ac7

📥 Commits

Reviewing files that changed from the base of the PR and between b75b568 and 95e160d.

📒 Files selected for processing (3)
  • src/llmq/net_signing.cpp
  • src/llmq/signing.cpp
  • src/llmq/signing.h
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/llmq/net_signing.cpp
  • src/llmq/signing.h
  • src/llmq/signing.cpp

Walkthrough

This PR adds backpressure controls to the pending recovered-signature queue in the signing manager. Two new caps, a global total and a per-node limit, are enforced when enqueuing recovered sigs, dropping over-limit entries with a log message. A new RemoveNodesIf method removes queued entries for nodes matching a predicate, and empty per-node queue entries are pruned during pending-sig collection. The worker thread's periodic cleanup now also removes pending recovered sigs belonging to banned peers.

Estimated code review effort: 2 (Simple) | ~15 minutes

Sequence Diagram(s)

sequenceDiagram
  participant WorkThreadSigning
  participant PeerManager
  participant CSigningManager
  participant pendingRecoveredSigs

  loop periodic cleanup
    WorkThreadSigning->>CSigningManager: Cleanup()
    WorkThreadSigning->>PeerManager: PeerIsBanned(node_id)?
    WorkThreadSigning->>CSigningManager: RemoveNodesIf(predicate)
    CSigningManager->>pendingRecoveredSigs: erase entries for banned nodes
  end

  Note over CSigningManager: VerifyAndProcessRecoveredSig
  CSigningManager->>pendingRecoveredSigs: compute total & per-node counts
  alt caps exceeded
    CSigningManager-->>CSigningManager: log and drop recovered sig
  else within caps
    CSigningManager->>pendingRecoveredSigs: enqueue recovered sig
  end
Loading

Suggested reviewers: UdjinM6

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: bounding the pending recovered-sig queue to prevent remote OOM.
Description check ✅ Passed The description is directly related to the queue-bounding and cleanup changes in the patch.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/llmq/signing.cpp (2)

462-472: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Same std::erase_if opportunity here.

Mirrors the loop in RemoveNodesIf; can be simplified the same way.

♻️ Proposed refactor
-        for (auto it = pendingRecoveredSigs.begin(); it != pendingRecoveredSigs.end();) {
-            if (it->second.empty()) {
-                it = pendingRecoveredSigs.erase(it);
-            } else {
-                ++it;
-            }
-        }
+        std::erase_if(pendingRecoveredSigs, [](const auto& entry) { return entry.second.empty(); });
As per coding guidelines, "Dash Core implementation must be written in C++20, requiring at least Clang 16 or GCC 11.1" (`src/**/*.{cpp,h,hpp,cc}`).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/signing.cpp` around lines 462 - 472, The manual erase loop over
pendingRecoveredSigs should be refactored to use std::erase_if, matching the
simpler pattern used in RemoveNodesIf. Update the cleanup logic in signing.cpp
so the pruning of empty entries is expressed with std::erase_if on
pendingRecoveredSigs, keeping the same behavior while reducing boilerplate and
aligning with the C++20 style guidelines.

Source: Coding guidelines


419-429: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use std::erase_if for this cleanup. Replace the manual iterator loop with std::erase_if(pendingRecoveredSigs, ...) to keep the code shorter and less error-prone.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/signing.cpp` around lines 419 - 429, Replace the manual iterator
cleanup in CSigningManager::RemoveNodesIf with std::erase_if on
pendingRecoveredSigs, using the existing predicate against the map key. Keep the
cs_pending lock in place and preserve the same removal behavior while
simplifying the implementation and avoiding the explicit erase/advance loop.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmq/dkgsessionhandler.h`:
- Around line 80-89: In CDKGPendingMessages, the duplicate check in the message
handling flow should happen before consuming the per-node quota. Update the
logic around messagesPerNode[from] and seenMessages.emplace(hash) so duplicate
hashes are dropped first, and only increment messagesPerNode[from] after a
message is confirmed new and will be queued. Keep the existing behavior in the
same method, but reorder the checks to avoid counting retransmits against the
peer.

In `@src/llmq/net_dkg.cpp`:
- Around line 108-113: The deserialization path in the DKG message handling
currently accepts payloads even when unread trailing bytes remain after vRecv >>
*msg, which can cause duplicate suppression to miss semantically identical
messages. Update the receive/parse logic in the DKG message handler that calls
vRecv >> *msg and then ValidateDKGMessageStructure so it verifies the stream was
fully consumed after deserialization, and reject the message by returning
nullptr if any bytes remain or parsing leaves extra data.

---

Nitpick comments:
In `@src/llmq/signing.cpp`:
- Around line 462-472: The manual erase loop over pendingRecoveredSigs should be
refactored to use std::erase_if, matching the simpler pattern used in
RemoveNodesIf. Update the cleanup logic in signing.cpp so the pruning of empty
entries is expressed with std::erase_if on pendingRecoveredSigs, keeping the
same behavior while reducing boilerplate and aligning with the C++20 style
guidelines.
- Around line 419-429: Replace the manual iterator cleanup in
CSigningManager::RemoveNodesIf with std::erase_if on pendingRecoveredSigs, using
the existing predicate against the map key. Keep the cs_pending lock in place
and preserve the same removal behavior while simplifying the implementation and
avoiding the explicit erase/advance loop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 191c2890-110d-4b9b-bee7-8d22efff1e0f

📥 Commits

Reviewing files that changed from the base of the PR and between 59d36a9 and b75b568.

📒 Files selected for processing (6)
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_signing.cpp
  • src/llmq/signing.cpp
  • src/llmq/signing.h
💤 Files with no reviewable changes (1)
  • src/llmq/dkgsessionhandler.cpp

Comment thread src/llmq/dkgsessionhandler.h Outdated
Comment on lines +80 to +89
if (messagesPerNode[from] >= maxMessagesPerNode) {
// TODO ban?
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
return;
}
messagesPerNode[from]++;

std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret;
ret.reserve(binaryMessages.size());
for (const auto& bm : binaryMessages) {
auto msg = std::make_shared<Message>();
try {
*bm.second >> *msg;
} catch (...) {
msg = nullptr;
}
ret.emplace_back(std::make_pair(bm.first, std::move(msg)));
if (!seenMessages.emplace(hash).second) {
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);
return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Check duplicates before consuming the per-node quota.

Line 85 increments messagesPerNode[from] before Line 87 drops duplicate hashes, so duplicate retransmits permanently consume that peer’s quota until Clear() even though nothing was queued.

Proposed fix
-        if (messagesPerNode[from] >= maxMessagesPerNode) {
-            // TODO ban?
-            LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
-            return;
-        }
-        messagesPerNode[from]++;
-
         if (!seenMessages.emplace(hash).second) {
             LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);
             return;
         }
+
+        if (messagesPerNode[from] >= maxMessagesPerNode) {
+            // TODO ban?
+            seenMessages.erase(hash);
+            LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
+            return;
+        }
+        messagesPerNode[from]++;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (messagesPerNode[from] >= maxMessagesPerNode) {
// TODO ban?
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
return;
}
messagesPerNode[from]++;
std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret;
ret.reserve(binaryMessages.size());
for (const auto& bm : binaryMessages) {
auto msg = std::make_shared<Message>();
try {
*bm.second >> *msg;
} catch (...) {
msg = nullptr;
}
ret.emplace_back(std::make_pair(bm.first, std::move(msg)));
if (!seenMessages.emplace(hash).second) {
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);
return;
if (!seenMessages.emplace(hash).second) {
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, hash.ToString(), from);
return;
}
if (messagesPerNode[from] >= maxMessagesPerNode) {
// TODO ban?
seenMessages.erase(hash);
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
return;
}
messagesPerNode[from]++;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/dkgsessionhandler.h` around lines 80 - 89, In CDKGPendingMessages,
the duplicate check in the message handling flow should happen before consuming
the per-node quota. Update the logic around messagesPerNode[from] and
seenMessages.emplace(hash) so duplicate hashes are dropped first, and only
increment messagesPerNode[from] after a message is confirmed new and will be
queued. Keep the existing behavior in the same method, but reorder the checks to
avoid counting retransmits against the peer.

Comment thread src/llmq/net_dkg.cpp Outdated
Comment on lines +108 to +113
try {
CDataStream s(vRecv); // copy; deserialization does not advance the caller's stream
if (msg_type == NetMsgType::QCONTRIB) {
CDKGContribution qc;
s >> qc;
return qc.vvec != nullptr && qc.vvec->size() == threshold &&
qc.contributions != nullptr && qc.contributions->blobs.size() <= size;
} else if (msg_type == NetMsgType::QCOMPLAINT) {
CDKGComplaint qc;
s >> qc;
return qc.badMembers.size() == qc.complainForMembers.size() &&
qc.badMembers.size() <= size;
} else if (msg_type == NetMsgType::QJUSTIFICATION) {
CDKGJustification qj;
s >> qj;
return qj.contributions.size() <= size;
} else if (msg_type == NetMsgType::QPCOMMITMENT) {
CDKGPrematureCommitment qc;
s >> qc;
return qc.validMembers.size() <= size;
}
return false;
vRecv >> *msg;
} catch (const std::exception&) {
return false;
return nullptr;
}
if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Reject trailing bytes after deserialization.

vRecv >> *msg can leave unread bytes that are still included in the wire hash. Accepting those payloads lets semantically identical messages bypass duplicate suppression with different trailing garbage.

Proposed fix
     try {
         vRecv >> *msg;
     } catch (const std::exception&) {
         return nullptr;
     }
+    if (!vRecv.empty()) return nullptr;
     if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
CDataStream s(vRecv); // copy; deserialization does not advance the caller's stream
if (msg_type == NetMsgType::QCONTRIB) {
CDKGContribution qc;
s >> qc;
return qc.vvec != nullptr && qc.vvec->size() == threshold &&
qc.contributions != nullptr && qc.contributions->blobs.size() <= size;
} else if (msg_type == NetMsgType::QCOMPLAINT) {
CDKGComplaint qc;
s >> qc;
return qc.badMembers.size() == qc.complainForMembers.size() &&
qc.badMembers.size() <= size;
} else if (msg_type == NetMsgType::QJUSTIFICATION) {
CDKGJustification qj;
s >> qj;
return qj.contributions.size() <= size;
} else if (msg_type == NetMsgType::QPCOMMITMENT) {
CDKGPrematureCommitment qc;
s >> qc;
return qc.validMembers.size() <= size;
}
return false;
vRecv >> *msg;
} catch (const std::exception&) {
return false;
return nullptr;
}
if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;
try {
vRecv >> *msg;
} catch (const std::exception&) {
return nullptr;
}
if (!vRecv.empty()) return nullptr;
if (!ValidateDKGMessageStructure(*msg, size, threshold)) return nullptr;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 108 - 113, The deserialization path in the
DKG message handling currently accepts payloads even when unread trailing bytes
remain after vRecv >> *msg, which can cause duplicate suppression to miss
semantically identical messages. Update the receive/parse logic in the DKG
message handler that calls vRecv >> *msg and then ValidateDKGMessageStructure so
it verifies the stream was fully consumed after deserialization, and reject the
message by returning nullptr if any bytes remain or parsing leaves extra data.

QSIGREC messages were deserialized and appended to CSigningManager::pendingRecoveredSigs after only cheap gating checks (quorum exists/active, dedup by hash). Verification is deferred to a single worker thread draining ~32 unique sign-hash sessions per cycle, while ingestion is unbounded and crypto-free on the network threads. The queue had no size cap and was never purged for banned/disconnected peers (RemoveBannedNodeStates only cleans the separate sig-shares subsystem), so a peer could enqueue faster than the queue drains and grow memory until the node is OOM-killed — a remote, unauthenticated DoS, worst against masternodes.

Bound the queue with per-node (1000) and global (10000) caps, dropping over-cap messages silently (an honest peer can legitimately outrun the single-threaded drain during a burst, so no misbehaviour is scored). Add CSigningManager::RemoveNodesIf to purge banned peers' queues, invoked from WorkThreadSigning's periodic cleanup. Prune drained (empty) node entries during collection so the global-cap scan stays cheap and disconnected nodes' queues are reclaimed once drained.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95e160d20f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/llmq/net_signing.cpp
// Drop pending recovered sigs queued by banned peers so a flood's backlog does not
// persist after the peer is banned (RemoveBannedNodeStates only cleans the sig-shares
// subsystem, not m_sig_manager's pending recovered sigs).
m_sig_manager.RemoveNodesIf([this](NodeId node_id) { return m_peer_manager->PeerIsBanned(node_id); });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Purge disconnected peer queues before enforcing the cap

When a flooding peer disconnects before this 5s cleanup runs, this predicate never matches because PeerIsBanned delegates to PeerManagerImpl::IsBanned, which returns false once GetPeerRef(node_id) is null after FinalizeNode removes the peer. Those disconnected peers' pendingRecoveredSigs entries still count toward MAX_PENDING_RECSIGS_TOTAL, so an attacker can fill the new global cap with stale node IDs and cause subsequent honest QSIGRECs to be dropped until the expensive verifier drains them.

Useful? React with 👍 / 👎.

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Two well-scoped LLMQ networking changes: a DoS fix bounding CSigningManager::pendingRecoveredSigs with per-node/global caps plus pruning of drained entries, and a refactor templating CDKGPendingMessages on message type so DKG payloads deserialize exactly once on the p2p thread while preserving inventory-hash bytes and misbehaviour semantics. Both changes look correct. One convergent perf suggestion carried forward.

Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history, failed), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `src/llmq/signing.cpp`:
- [SUGGESTION] src/llmq/signing.cpp:400-408: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC
  VerifyAndProcessRecoveredSig sums list.size() over every entry in pendingRecoveredSigs under cs_pending on each admission (lines 400-403). The pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471) keeps the map bounded to nodes with in-flight sigs, so this is O(#active peers with pending) rather than O(everyone-ever) — small in normal operation. However, this path runs hottest exactly during the burst/flood scenario the PR is defending against, when many peers can be near their per-node cap and every new message forces another scan under the shared lock also needed by the drain thread. A running counter incremented in the emplace_back on line 416 and decremented at the erase points (drain in CollectPendingRecoveredSigsToVerify line 458, prune loop lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Not a correctness issue — take it or leave it.

Inline submission via scripts/review_poster.py hit GitHub HTTP 422, so I posted the same verified review as a body-only COMMENT.

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Single carried-forward suggestion from the prior review: the global pending-queue cap in VerifyAndProcessRecoveredSig is computed by full scan on every incoming QSIGREC. Still valid at head 95e160d — code unchanged from b75b568 at src/llmq/signing.cpp:400-408. The latest delta only dropped an unrelated DKG refactor commit; no new signing-side findings. No blockers.

Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `src/llmq/signing.cpp`:
- [SUGGESTION] src/llmq/signing.cpp:400-408: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC
  VerifyAndProcessRecoveredSig sums list.size() across every entry in pendingRecoveredSigs under cs_pending on each admission. Thanks to the empty-entry pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471), the map is bounded to nodes with in-flight sigs, so in practice this is O(#peers-with-pending) rather than O(everyone-ever) — small in normal operation and not a correctness issue. However, this scan runs hottest exactly during the burst/flood scenario this PR defends against, when many peers can be near their per-node cap and every new QSIGREC forces another O(N) walk under a lock the drain thread and other network threads also need. Maintaining a running counter incremented at the emplace_back on line 416 and decremented at each erase point (drain in CollectPendingRecoveredSigsToVerify line 458, the prune loop on lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Take it or leave it.

Comment thread src/llmq/signing.cpp
Comment on lines +400 to +408
size_t total_pending{0};
for (const auto& node_entry : pendingRecoveredSigs) {
total_pending += node_entry.second.size();
}
if (total_pending >= MAX_PENDING_RECSIGS_TOTAL) {
LogPrint(BCLog::LLMQ, "CSigningManager::%s -- global pending recovered sigs cap reached (%d), dropping sig from node=%d\n",
__func__, MAX_PENDING_RECSIGS_TOTAL, from);
return;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Global pending-queue cap is recomputed by full scan on every incoming QSIGREC

VerifyAndProcessRecoveredSig sums list.size() across every entry in pendingRecoveredSigs under cs_pending on each admission. Thanks to the empty-entry pruning added in CollectPendingRecoveredSigsToVerify (lines 465-471), the map is bounded to nodes with in-flight sigs, so in practice this is O(#peers-with-pending) rather than O(everyone-ever) — small in normal operation and not a correctness issue. However, this scan runs hottest exactly during the burst/flood scenario this PR defends against, when many peers can be near their per-node cap and every new QSIGREC forces another O(N) walk under a lock the drain thread and other network threads also need. Maintaining a running counter incremented at the emplace_back on line 416 and decremented at each erase point (drain in CollectPendingRecoveredSigsToVerify line 458, the prune loop on lines 465-471, and RemoveNodesIf lines 422-428) would make the global check O(1) with no behavioral change. Take it or leave it.

source: ['claude', 'codex']

@UdjinM6 UdjinM6 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 95e160d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants