Fix race condition in QC assembly causing dual finalizer vote issue#223
Open
Fix race condition in QC assembly causing dual finalizer vote issue#223
Conversation
…stency 'Block includes an invalid QC (3030015) qc 109 contains a dual finalizer PUB_BLS_xx which does not vote the same on active and pending policies' During finalizer policy transitions, a "dual finalizer" (same BLS key in both active and pending policies) must vote identically on both policies. A race condition in aggregate_vote() caused it to add the vote to the active policy first (under its per-policy mutex), then the pending policy (under a separate mutex). Between these two additions, get_best_qc() on the block production thread could snapshot active (with the vote) and pending (without the vote), producing an invalid QC that validators reject with: "qc contains a dual finalizer which does not vote the same on active and pending policies". This is a bug inherited from Spring (AntelopeIO/spring) where the same aggregate_vote/get_best_qc interleaving exists. Changes: - Add outer mutex to aggregating_qc_t ensuring aggregate_vote writes and get_best_qc/set_received_qc reads are atomic across both policies - Restructure aggregate_vote into find/verify/add phases: expensive BLS signature verification stays outside the lock, vote additions to both policies happen atomically under the outer mutex - Fix duplicate detection for dual finalizers: only return duplicate when the vote exists in ALL applicable policies, not just the active one - Move vote dedup cache update (add_vote_id) and send_buffer creation before async post. This is off the main thread and will add it to the dedup cache sooner.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
CI/CD error: https://github.com/Wire-Network/wire-sysio/actions/runs/22360422003/job/64735182252
Block includes an invalid QC (3030015) qc 109 contains a dual finalizer PUB_BLS_xx which does not vote the same on active and pending policiesDuring finalizer policy transitions, a "dual finalizer" (same BLS key in both active and pending policies) must vote identically on both policies. A race condition in
aggregate_vote()caused it to add the vote to the active policy first (under its per-policy mutex), then the pending policy (under a separate mutex). Between these two additions,get_best_qc()on the block production thread could snapshot active (with the vote) and pending (without the vote), producing an invalid QC that validators reject with: "qc contains a dual finalizer which does not vote the same on active and pending policies".This is a bug inherited from Spring (AntelopeIO/spring) where the same
aggregate_vote/get_best_qcinterleaving exists.Changes:
aggregating_qc_tensuringaggregate_votewrites andget_best_qc/set_received_qcreads are atomic across both policiesaggregate_voteinto find/verify/add phases: expensive BLS signature verification stays outside the lock, vote additions to both policies happen atomically under the outer mutexNote this is not a security issue as the invalid block is rejected by other proposers and finalizers and is very unlikely in practice.