Skip to content

Comments

Fix race condition in QC assembly causing dual finalizer vote issue#223

Open
heifner wants to merge 1 commit intomasterfrom
feature/fix-dual-finalizer-error
Open

Fix race condition in QC assembly causing dual finalizer vote issue#223
heifner wants to merge 1 commit intomasterfrom
feature/fix-dual-finalizer-error

Conversation

@heifner
Copy link
Contributor

@heifner heifner commented Feb 25, 2026

CI/CD error: https://github.com/Wire-Network/wire-sysio/actions/runs/22360422003/job/64735182252
Block includes an invalid QC (3030015) qc 109 contains a dual finalizer PUB_BLS_xx which does not vote the same on active and pending policies

During finalizer policy transitions, a "dual finalizer" (same BLS key in both active and pending policies) must vote identically on both policies. A race condition in aggregate_vote() caused it to add the vote to the active policy first (under its per-policy mutex), then the pending policy (under a separate mutex). Between these two additions, get_best_qc() on the block production thread could snapshot active (with the vote) and pending (without the vote), producing an invalid QC that validators reject with: "qc contains a dual finalizer which does not vote the same on active and pending policies".

This is a bug inherited from Spring (AntelopeIO/spring) where the same aggregate_vote/get_best_qc interleaving exists.

Changes:

  • Add outer mutex to aggregating_qc_t ensuring aggregate_vote writes and get_best_qc/set_received_qc reads are atomic across both policies
  • Restructure aggregate_vote into find/verify/add phases: expensive BLS signature verification stays outside the lock, vote additions to both policies happen atomically under the outer mutex
  • Fix duplicate detection for dual finalizers: only return duplicate when the vote exists in both applicable policies, not just the active one
  • Move vote dedup cache update (add_vote_id) and send_buffer creation before async post. This is off the main thread and will add it to the dedup cache sooner.

Note this is not a security issue as the invalid block is rejected by other proposers and finalizers and is very unlikely in practice.

…stency

'Block includes an invalid QC (3030015) qc 109 contains a dual finalizer PUB_BLS_xx which does not vote the same on active and pending policies'

During finalizer policy transitions, a "dual finalizer" (same BLS key in
both active and pending policies) must vote identically on both policies.
A race condition in aggregate_vote() caused it to add the vote to the
active policy first (under its per-policy mutex), then the pending policy
(under a separate mutex). Between these two additions, get_best_qc() on
the block production thread could snapshot active (with the vote) and
pending (without the vote), producing an invalid QC that validators
reject with: "qc contains a dual finalizer which does not vote the same
on active and pending policies".

This is a bug inherited from Spring (AntelopeIO/spring) where the same
aggregate_vote/get_best_qc interleaving exists.

Changes:
- Add outer mutex to aggregating_qc_t ensuring aggregate_vote writes and
  get_best_qc/set_received_qc reads are atomic across both policies
- Restructure aggregate_vote into find/verify/add phases: expensive BLS
  signature verification stays outside the lock, vote additions to both
  policies happen atomically under the outer mutex
- Fix duplicate detection for dual finalizers: only return duplicate when
  the vote exists in ALL applicable policies, not just the active one
- Move vote dedup cache update (add_vote_id) and send_buffer creation
  before async post. This is off the main thread and will add it to the
  dedup cache sooner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant