perf: deserialize DKG messages once#7401
Conversation
Adds a focused bench that constructs representative QCONTRIB and QPCOMMITMENT wire payloads and compares the current double-deserialize intake pattern (structural pre-check on a copy at intake + typed deserialize on the DKG worker) against a single-deserialize pattern (intake deserializes once and hands the typed object to the worker). Local numbers on Apple Silicon (release build, -min-time default): DKG_QCONTRIB_DoubleDeserialize ~8.5 ms/op DKG_QCONTRIB_SingleDeserialize ~4.75 ms/op (~1.8x) DKG_QPCOMMITMENT_DoubleDeserialize ~910 us/op DKG_QPCOMMITMENT_SingleDeserialize ~423 us/op (~2.1x) Costs are dominated by BLS G1/G2 point decompression in the vvec and signatures, so a single decode per payload avoids meaningful work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CheckDKGMessageStructure previously deserialized a copy of the pushed DKG payload on the network thread to enforce param-derived structural bounds, then the original bytes were retained on the pending queue and deserialized again on the DKG worker thread. For QCONTRIB the BLS point decompression is the dominant cost, so paying it twice per message is meaningful (see bench: roughly 2x wall-clock for QCONTRIB and QPCOMMITMENT before/after this change). Intake now deserializes the payload once, runs the same param-derived bound check on the typed object, and enqueues an already-typed std::shared_ptr<Message>. The DKG worker no longer deserializes on pop. CDKGPendingMessages is turned into a class template parameterised on the message type so each queue stores its typed message directly. Preserved: - Oversize rejection with peer misbehavior score 100 before any deserialization or retention. - Malformed-payload rejection with score 100 (a deserialize throw or a bound violation triggers the same ban path as before). - Inventory hash: still computed over the raw wire bytes at intake, matching what peers compute for the same payload; AlreadyHave / HasSeen / PeerEraseObjectRequest semantics unchanged. - EnqueueOwn still serializes with SER_NETWORK/PROTOCOL_VERSION and hashes those bytes, matching the old serialize-then-hash form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
✅ Review complete (commit 506a5e4) |
WalkthroughThis PR refactors DKG message handling in Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Peer
participant NetDKG
participant CheckDKGMessageStructure
participant CDKGPendingMessages
participant CDKGSessionHandler
Peer->>NetDKG: send DKG message (vRecv)
NetDKG->>NetDKG: compute hash over raw vRecv
NetDKG->>NetDKG: deserialize once into typed message
NetDKG->>CheckDKGMessageStructure: validate structural bounds
alt malformed
CheckDKGMessageStructure-->>NetDKG: throws
NetDKG->>Peer: ban peer
else valid
CheckDKGMessageStructure-->>NetDKG: ok
NetDKG->>CDKGPendingMessages: PushPendingMessage(typed msg, hash)
CDKGSessionHandler->>CDKGPendingMessages: PopPendingMessages()
CDKGPendingMessages-->>CDKGSessionHandler: typed messages
end
Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/bench/dkg_deserialize.cpp (1)
115-120: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMirror the worker allocation in the double-deserialize benchmark.
The single-deserialize path includes
std::make_shared, but the simulated old worker deserialize uses stack objects. If the removed worker path materialized shared typed messages, usemake_sharedhere too so the comparison measures only the intended deserialize-count delta.Proposed adjustment
- llmq::CDKGContribution qc; - s >> qc; - ankerl::nanobench::doNotOptimizeAway(qc.contributions->blobs.size()); + auto qc = std::make_shared<llmq::CDKGContribution>(); + s >> *qc; + ankerl::nanobench::doNotOptimizeAway(qc->contributions->blobs.size());- llmq::CDKGPrematureCommitment qc; - s >> qc; - ankerl::nanobench::doNotOptimizeAway(qc.validMembers.size()); + auto qc = std::make_shared<llmq::CDKGPrematureCommitment>(); + s >> *qc; + ankerl::nanobench::doNotOptimizeAway(qc->validMembers.size());Also applies to: 151-155
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bench/dkg_deserialize.cpp` around lines 115 - 120, The old worker deserialize benchmark path in the double-deserialize setup is not mirroring the shared-allocation behavior used by the single-deserialize path. Update the worker-side deserialize blocks in the benchmark function(s) that use CDataStream and llmq::CDKGContribution so they materialize the typed message with std::make_shared, matching the removed worker path’s allocation pattern. Keep the deserialize-count comparison focused on the intended cost difference by aligning the allocation behavior between the single and double-deserialize cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llmq/dkgsessionhandler.h`:
- Around line 84-94: In CDKGPendingMessages, the duplicate-hash check currently
happens after incrementing messagesPerNode[from], so repeated duplicates can
consume the peer quota without adding work. Move the seenMessages.emplace(hash)
check ahead of the quota accounting in the same flow, then only increment
messagesPerNode[from] and continue enqueueing when the hash is new; keep the
existing logging and early returns in CDKGPendingMessages::AddMessage (or the
corresponding message-handling method).
---
Nitpick comments:
In `@src/bench/dkg_deserialize.cpp`:
- Around line 115-120: The old worker deserialize benchmark path in the
double-deserialize setup is not mirroring the shared-allocation behavior used by
the single-deserialize path. Update the worker-side deserialize blocks in the
benchmark function(s) that use CDataStream and llmq::CDKGContribution so they
materialize the typed message with std::make_shared, matching the removed worker
path’s allocation pattern. Keep the deserialize-count comparison focused on the
intended cost difference by aligning the allocation behavior between the single
and double-deserialize cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7c05047c-80ac-49ea-b4c5-0035e6ec8d3f
📒 Files selected for processing (5)
src/Makefile.bench.includesrc/bench/dkg_deserialize.cppsrc/llmq/dkgsessionhandler.cppsrc/llmq/dkgsessionhandler.hsrc/llmq/net_dkg.cpp
💤 Files with no reviewable changes (1)
- src/llmq/dkgsessionhandler.cpp
| if (messagesPerNode[from] >= maxMessagesPerNode) { | ||
| // TODO ban? | ||
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from); | ||
| return; | ||
| } | ||
| messagesPerNode[from]++; | ||
|
|
||
| std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret; | ||
| ret.reserve(binaryMessages.size()); | ||
| for (const auto& bm : binaryMessages) { | ||
| auto msg = std::make_shared<Message>(); | ||
| try { | ||
| *bm.second >> *msg; | ||
| } catch (...) { | ||
| msg = nullptr; | ||
| } | ||
| ret.emplace_back(std::make_pair(bm.first, std::move(msg))); | ||
| if (!seenMessages.emplace(hash).second) { | ||
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, | ||
| hash.ToString(), from); | ||
| return; |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Check duplicates before charging the peer quota.
Line 89 increments messagesPerNode[from] before Line 91 rejects duplicate hashes, so repeated duplicates can exhaust that peer’s quota without adding pending work. Move the seenMessages.emplace(hash) check before the quota increment/enqueue accounting.
Proposed fix
- if (messagesPerNode[from] >= maxMessagesPerNode) {
- // TODO ban?
- LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
- return;
- }
- messagesPerNode[from]++;
-
if (!seenMessages.emplace(hash).second) {
LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__,
hash.ToString(), from);
return;
}
+
+ if (messagesPerNode[from] >= maxMessagesPerNode) {
+ // TODO ban?
+ LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from);
+ return;
+ }
+ messagesPerNode[from]++;
pendingMessages.emplace_back(from, std::move(msg));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (messagesPerNode[from] >= maxMessagesPerNode) { | |
| // TODO ban? | |
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from); | |
| return; | |
| } | |
| messagesPerNode[from]++; | |
| std::vector<std::pair<NodeId, std::shared_ptr<Message>>> ret; | |
| ret.reserve(binaryMessages.size()); | |
| for (const auto& bm : binaryMessages) { | |
| auto msg = std::make_shared<Message>(); | |
| try { | |
| *bm.second >> *msg; | |
| } catch (...) { | |
| msg = nullptr; | |
| } | |
| ret.emplace_back(std::make_pair(bm.first, std::move(msg))); | |
| if (!seenMessages.emplace(hash).second) { | |
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, | |
| hash.ToString(), from); | |
| return; | |
| if (!seenMessages.emplace(hash).second) { | |
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- already seen %s, peer=%d\n", __func__, | |
| hash.ToString(), from); | |
| return; | |
| } | |
| if (messagesPerNode[from] >= maxMessagesPerNode) { | |
| // TODO ban? | |
| LogPrint(BCLog::LLMQ_DKG, "CDKGPendingMessages::%s -- too many messages, peer=%d\n", __func__, from); | |
| return; | |
| } | |
| messagesPerNode[from]++; | |
| pendingMessages.emplace_back(from, std::move(msg)); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llmq/dkgsessionhandler.h` around lines 84 - 94, In CDKGPendingMessages,
the duplicate-hash check currently happens after incrementing
messagesPerNode[from], so repeated duplicates can consume the peer quota without
adding work. Move the seenMessages.emplace(hash) check ahead of the quota
accounting in the same flow, then only increment messagesPerNode[from] and
continue enqueueing when the hash is new; keep the existing logging and early
returns in CDKGPendingMessages::AddMessage (or the corresponding
message-handling method).
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Clean, well-scoped perf refactor eliminating a redundant DKG deserialize. All intake invariants (oversize ban, malformed ban, hash-over-wire-bytes, per-node/duplicate suppression, own-message enqueue form) are preserved and independently verified by multiple reviewers. Only remaining suggestion is a Dash-specific lint tracking update for the new benchmark file.
Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history), gpt-5.5[high] (dash-core-commit-history); verifier: opus; specialists: dash-core-commit-history
🟡 1 suggestion
Findings not posted inline (1)
These findings could not be anchored to the current diff, but they are still part of this review.
- [SUGGESTION]
test/util/data/non-backported.txt:4: Add the new Dash-specific benchmark to non-backported tracking — This PR addssrc/bench/dkg_deserialize.cpp, a Dash-specific benchmark that is not matched by the existingsrc/bench/bls*.cppglob.test/lint/lint-cppcheck-dash.py(line 69-73) builds its Dash-specific file list exclusively from patterns in this file viagit ls-files, so the new benchmark will be silently skipped by that extra lint coverage until it is added. Add a matching entry to keep Dash-specific bench files under cppcheck.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `test/util/data/non-backported.txt`:
- [SUGGESTION] test/util/data/non-backported.txt:4: Add the new Dash-specific benchmark to non-backported tracking
This PR adds `src/bench/dkg_deserialize.cpp`, a Dash-specific benchmark that is not matched by the existing `src/bench/bls*.cpp` glob. `test/lint/lint-cppcheck-dash.py` (line 69-73) builds its Dash-specific file list exclusively from patterns in this file via `git ls-files`, so the new benchmark will be silently skipped by that extra lint coverage until it is added. Add a matching entry to keep Dash-specific bench files under cppcheck.
Duplicate hashes previously incremented messagesPerNode before the seen-set check, so a peer that resent the same message could exhaust its quota with duplicates and starve legitimate new messages. Check the seen-set first and only charge the quota after we know we will actually enqueue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ialize path The old DKG worker (PopAndDeserializeMessages) allocated the typed object via std::make_shared<Message>() before deserializing into it. Mirror that shape in the double-deserialize benchmark's worker block so the before/after comparison isolates the deserialize cost rather than also picking up a stack-vs-heap allocation delta. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
Two suggestions remain against head 029fcc2. New finding in the latest delta: the dedupe-before-quota reorder correctly fixes duplicate resend quota charging, but a quota-exceeded drop now records the hash in seenMessages, so the same message can be poisoned for later legitimate delivery paths in the round. Carried-forward prior finding: src/bench/dkg_deserialize.cpp is still missing from test/util/data/non-backported.txt, so the extra cppcheck-dash lint pass will silently skip the new Dash-specific benchmark. No blocking issues; the benchmark allocation-shape fix is otherwise correct and well-scoped.
Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history
🟡 2 suggestion(s)
Findings not posted inline (1)
These findings could not be anchored to the current diff, but they are still part of this review.
- [SUGGESTION]
test/util/data/non-backported.txt:4: Track new Dash-specific benchmark in non-backported.txt — This PR addssrc/bench/dkg_deserialize.cpp, a Dash-specific benchmark that is not matched by the existingsrc/bench/bls*.cppglob on line 4.test/lint/lint-cppcheck-dash.pybuilds its Dash-specific file list exclusively from the patterns in this file viagit ls-files, so the new benchmark will be silently excluded from the extra cppcheck-dash coverage until it is listed here. Carried forward from the prior automated review at c7e3cad — still unresolved at head 029fcc2 (the tracked file was not touched by the latest push).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
New finding in the latest delta:
In `src/llmq/dkgsessionhandler.h`:
- [SUGGESTION] src/llmq/dkgsessionhandler.h:82-100: Quota-exceeded drop still poisons the message hash in seenMessages
The reordered `PushPendingMessage` inserts `hash` into `seenMessages` at line 86 via `emplace(hash).second` before the quota check at line 92. If a sender is already at `maxMessagesPerNode` distinct-message quota, the function returns after having already committed the hash to `seenMessages`. Because `seenMessages` is keyed only by hash (not by sender), the same message subsequently delivered from any other peer — or via `AlreadyHave`/`HasSeen` paths — is now treated as "already seen" for the rest of the round and can never reach `pendingMessages`.
This is a narrower variant of the bug fixed in e8d9b6d803: that commit correctly stopped duplicate resends from silently consuming a peer's quota, but the current ordering trades it for a case where a quota-exceeded drop poisons the hash for other, legitimate delivery paths. DKG already tolerates a single peer's messages being lost, so the operational impact is small, but the current shape unnecessarily couples the two intake failure modes. Splitting the seen-check (read-only) from the seen-insert (commit) so the hash is only recorded once the message is actually accepted preserves both invariants.
Carried-forward prior finding:
In `test/util/data/non-backported.txt`:
- [SUGGESTION] test/util/data/non-backported.txt:4: Track new Dash-specific benchmark in non-backported.txt
This PR adds `src/bench/dkg_deserialize.cpp`, a Dash-specific benchmark that is not matched by the existing `src/bench/bls*.cpp` glob on line 4. `test/lint/lint-cppcheck-dash.py` builds its Dash-specific file list exclusively from the patterns in this file via `git ls-files`, so the new benchmark will be silently excluded from the extra cppcheck-dash coverage until it is listed here. Carried forward from the prior automated review at c7e3cad4 — still unresolved at head 029fcc228b (the tracked file was not touched by the latest push).
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Both prior findings are cleanly resolved by the latest delta: PushPendingMessage now only marks a hash as seen after the per-node quota check passes (no more quota-drop poisoning), and src/bench/dkg_deserialize.cpp is now tracked in non-backported.txt. No new issues in the delta; the full PR stack (single-deserialize refactor, benchmark, dedupe/quota fix) preserves the documented intake invariants. Commit-history agents suggested squashing fixup commits, but Dash merges PRs without squashing and these are low-value stylistic notes that don't warrant blocking.
Source: reviewers: opus (general), claude-sonnet-5 (general), gpt-5.5[high] (general, failed), opus (dash-core-commit-history), claude-sonnet-5 (dash-core-commit-history), gpt-5.5[high] (dash-core-commit-history, failed); verifier: opus; specialists: dash-core-commit-history
|
CI note: the current Build source failures are the same depends-cache infrastructure issue tracked in #7403, not a failure in this PR's DKG changes. The source jobs are exiting before compilation at actions/cache/restore with fail-on-cache-miss for freshly built depends cache keys, for example run 28630838726 / linux64-build job 84908995980. No branch change is appropriate for this PR; the workflow/cache fix belongs in #7403. |
perf: deserialize DKG messages once
Issue being fixed or feature implemented
DKG network message intake currently deserializes each accepted payload twice:
message structure and bounds
queued message
These messages contain BLS objects, so the redundant pass repeats non-trivial
decompression work on a hot network path during DKG rounds.
What was done?
justifications, and premature commitments.
structural checks on the typed object, and queue that validated object for
the worker.
wire bytes.
single-deserialize costs for
QCONTRIBandQPCOMMITMENTpayloads.How Has This Been Tested?
Built on macOS with:
./autogen.sh ./configure --without-gui --disable-tests --enable-bench \ --disable-wallet --without-miniupnpc --without-natpmp make -C src -j"$(( $(sysctl -n hw.ncpu) - 2 ))" bench/bench_dash dashdRan the new benchmark:
./src/bench/bench_dash -filter='DKG_(QCONTRIB|QPCOMMITMENT)_(Double|Single)Deserialize'Results:
This shows roughly a 2x reduction in the deserialization/decompression portion
of the DKG message intake path for representative payloads.
Additional validation:
I also ran an independent code review pass against
upstream/develop..HEAD;the first pass found an avoidable heavy header dependency, which was fixed, and
the second pass recommended shipping.
Breaking Changes
None.
Checklist
code-owners and collaborators only)