Reject snapshot-less join against primary holding a more recent snapshot by Copilot · Pull Request #7844 · microsoft/CCF

Copilot · 2026-04-28T10:58:43Z

The primary should express an opinion about what snapshot is acceptable. Currently this opinion is "at least my startup_seqno".
To fix #7835 we should have a two-step system.
On the first try, the primary should require the joiner to hold a snapshot at least as recent as the primary's latest snapshot.
On subsequent tries, the primary should fall back to the safety constraint of at least its own startup_seqno.
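
The two-step rule described above can be sketched as follows. This is a minimal Python model of the intended decision, not the C++ code in this PR: the function names `min_acceptable_seqno` and `accepts_join` and the `latest_snapshot_seqno` parameter are hypothetical, and a primary holding no snapshot is assumed to report 0.

```python
def min_acceptable_seqno(
    this_startup_seqno: int, latest_snapshot_seqno: int, retry_count: int
) -> int:
    """Lowest joiner startup_seqno the primary is willing to accept."""
    if retry_count == 0:
        # First try: also require the primary's latest snapshot.
        # With no snapshot (assumed to be 0 here), max() keeps the old bound.
        return max(this_startup_seqno, latest_snapshot_seqno)
    # Later tries: fall back to the safety constraint only.
    return this_startup_seqno


def accepts_join(
    joiner_startup_seqno: int,
    this_startup_seqno: int,
    latest_snapshot_seqno: int,
    retry_count: int,
) -> bool:
    """True if the joiner's startup_seqno meets the primary's current bound."""
    return joiner_startup_seqno >= min_acceptable_seqno(
        this_startup_seqno, latest_snapshot_seqno, retry_count
    )
```

Under these assumptions, a snapshot-less joiner (`startup_seqno == 0`) is turned away on its first try by a primary holding a snapshot at seqno 100, while a joiner that fetched a snapshot at seqno 90 just before a newer one was committed is accepted on its retry.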

Copilot generated description:

A joiner with startup_seqno == 0 was accepted by an original (also startup_seqno == 0) primary even when that primary held committed snapshots, causing the joiner to replay the entire ledger.

Changes

- Join request: new optional retry_count field on JoinNetworkNodeToNode::In, incremented by the joiner each time a StartupSeqnoIsOld response triggers a snapshot fetch + retry.
- Primary-side check (node_frontend.h accept handler):
  - retry_count == 0: require startup_seqno >= max(this_startup_seqno, latest_on_disk_snapshot_seqno). The max keeps the existing behaviour when this node holds no snapshot.
  - retry_count > 0: fall back to the prior startup_seqno >= this_startup_seqno check, so the joiner doesn't chase a snapshot committed mid-fetch.
- Latest snapshot source: read from disk via NodeConfigurationSubsystem + snapshots::find_latest_committed_snapshot_in_directory (i.e. what the joiner could actually fetch from this node), not from the SNAPSHOT_STATUS KV table.
- Joiner (NodeState): tracks join_retry_count, bumped where the existing StartupSeqnoIsOld + fetch_recent_snapshot path is taken, and sent on the next request.
- Test: run_join_no_snapshot_against_original_primary in tests/reconfiguration.py starts a single-node network, generates a committed snapshot, then asserts:
  - join with from_snapshot=False, fetch_recent_snapshot=False is rejected with StartupSeqnoIsOld (previously silently replayed the ledger);
  - join with from_snapshot=False, fetch_recent_snapshot=True succeeds and the joiner reports a non-zero startup_seqno.
- CHANGELOG: new [Unreleased] / Fixed entry.
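
The joiner side of the list above can be modelled roughly as below. This is a hedged Python sketch rather than the `NodeState` code: the `Joiner` class, the `send_join` and `fetch_snapshot` callables, the `max_attempts` cap and `stub_primary` are all hypothetical names introduced for illustration. The stub primary simply rejects any `startup_seqno` below an assumed snapshot at seqno 50 and ignores `retry_count`.

```python
from dataclasses import dataclass
from typing import Callable

OK = "OK"
STARTUP_SEQNO_IS_OLD = "StartupSeqnoIsOld"


@dataclass
class Joiner:
    startup_seqno: int = 0
    fetch_recent_snapshot: bool = True
    join_retry_count: int = 0

    def join(
        self,
        send_join: Callable[[int, int], str],
        fetch_snapshot: Callable[[], int],
        max_attempts: int = 3,  # assumption: the sketch caps attempts
    ) -> bool:
        for _ in range(max_attempts):
            response = send_join(self.startup_seqno, self.join_retry_count)
            if response == OK:
                return True
            if response != STARTUP_SEQNO_IS_OLD or not self.fetch_recent_snapshot:
                # Unknown rejection, or the joiner is not allowed to fetch.
                return False
            # Fetch a snapshot, resume from it, and mark the next request as a retry.
            self.startup_seqno = fetch_snapshot()
            self.join_retry_count += 1
        return False


def stub_primary(startup_seqno: int, retry_count: int) -> str:
    # Hypothetical primary holding a committed snapshot at seqno 50.
    return OK if startup_seqno >= 50 else STARTUP_SEQNO_IS_OLD
```

With this stub, `Joiner(fetch_recent_snapshot=False)` gives up after the first `StartupSeqnoIsOld`, whereas a fetching joiner succeeds on its second request with a non-zero `startup_seqno` and `join_retry_count == 1`, which mirrors the two cases the test description asserts. Later commits in this PR adjust how a non-fetching joiner is treated, so the first case reflects the description rather than the final behaviour.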

Copilot

Pull request overview

This PR hardens the node join protocol to prevent snapshot-less joiners from being accepted by an “original” primary that has already committed snapshots, avoiding full-ledger replay by requiring a sufficiently recent startup snapshot on the initial join attempt.

Changes:

- Add retry_count to the node-to-node join request and track it on the joiner to distinguish first-attempt joins from retries after snapshot fetch.
- Update primary-side join acceptance logic to compare the joiner’s startup_seqno against the latest committed snapshot on disk (first attempt only), falling back to the previous check on retries.
- Add an E2E regression test covering rejection of snapshot-less joins and successful join after fetching a snapshot; add a changelog entry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/node/rpc/node_call_types.h` | Adds optional `retry_count` to join request type. |
| `src/node/rpc/serialization.h` | Adds `retry_count` to JSON serialization for join request. |
| `src/node/node_state.h` | Tracks and sends `join_retry_count` when retrying join after snapshot fetch. |
| `src/node/rpc/node_frontend.h` | Implements primary-side min acceptable seqno check using latest on-disk committed snapshot on first attempt. |
| `tests/reconfiguration.py` | Adds regression test for joining without a snapshot against an original primary holding committed snapshots. |
| `CHANGELOG.md` | Adds `[Unreleased]` fixed entry describing the behavior change. |

Copilot · 2026-04-28T12:29:49Z

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh gh pr view --json number (http block)

If you need me to access, download, or install something from one of these locations, you can either:

- Configure Actions setup steps to set up my environment, which run before the firewall is enabled
- Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

cjen1-msft · 2026-05-05T12:11:09Z

Was blocked on #7853; now unblocked.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

achamayou · 2026-05-05T19:54:44Z

@cjen1-msft the new test does not seem to be passing:

RuntimeError: ["Failure in reconfiguration: RuntimeError('Node 1 joined a primary which holds a committed snapshot without fetching one - this would replay the entire ledger')"]

Initial plan

2d9df82

Copilot AI assigned Copilot and cjen1-msft Apr 28, 2026

Copilot started work on behalf of cjen1-msft April 28, 2026 10:58

Copilot AI linked an issue Apr 28, 2026 that may be closed by this pull request

Join of a snapshot-less node to an original node replays the ledger #7835

Open

Implement primary-side join check against latest on-disk snapshot

043687d

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/d3383e6c-3063-4f30-ba6d-e324856cc41f Co-authored-by: cjen1-msft <190360281+cjen1-msft@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement behavior for joining nodes to original ledger~~ Reject snapshot-less join against primary holding a more recent snapshot Apr 28, 2026

Copilot finished work on behalf of cjen1-msft April 28, 2026 11:22

Copilot AI requested a review from cjen1-msft April 28, 2026 11:22

cjen1-msft reviewed Apr 28, 2026

Comment thread src/node/rpc/node_call_types.h Outdated

cjen1-msft reviewed Apr 28, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

Copilot started work on behalf of cjen1-msft April 28, 2026 12:10

Shorten retry_count doc comment per review

ef73def

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/ee92835a-b9f3-4451-aa71-bfc04a2d774e Co-authored-by: cjen1-msft <190360281+cjen1-msft@users.noreply.github.com>

Copilot finished work on behalf of cjen1-msft April 28, 2026 12:14

Copilot AI requested a review from cjen1-msft April 28, 2026 12:14

cjen1-msft marked this pull request as ready for review April 28, 2026 12:21

cjen1-msft requested a review from a team as a code owner April 28, 2026 12:21

Copilot AI review requested due to automatic review settings April 28, 2026 12:21

Copilot started reviewing on behalf of cjen1-msft April 28, 2026 12:22

eddyashton mentioned this pull request Apr 28, 2026

Eliminate get_committed_snapshots from join path. #7832

Draft

Copilot AI reviewed Apr 28, 2026

Comment thread CHANGELOG.md Outdated

Comment thread src/node/rpc/node_frontend.h Outdated

eddyashton reviewed Apr 28, 2026

Comment thread CHANGELOG.md Outdated

Copilot started work on behalf of eddyashton April 28, 2026 12:27

CHANGELOG: rename Unreleased to 7.0.3, bump pyproject and add PR ref

8e9c582

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/77ef7836-6f94-4a76-a8fd-ec303efa9014 Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot AI requested a review from eddyashton April 28, 2026 12:29

Copilot finished work on behalf of eddyashton April 28, 2026 12:29

Copilot started work on behalf of cjen1-msft April 28, 2026 12:30

CHANGELOG: shorten 7.0.3 entry

47de9f8

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/0152a93d-bea4-4acf-8376-7be729e36da8 Co-authored-by: cjen1-msft <190360281+cjen1-msft@users.noreply.github.com>

Copilot finished work on behalf of cjen1-msft April 28, 2026 12:31

Copilot started work on behalf of cjen1-msft April 28, 2026 13:49

node_frontend: treat missing retry_count as legacy joiner (value_or(1))

f3d26b8

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/a18950f8-94db-4a15-b9a5-cae39445e222 Co-authored-by: cjen1-msft <190360281+cjen1-msft@users.noreply.github.com>
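
The commit title above suggests that an absent `retry_count` is read with `value_or(1)`, so a joiner that does not send the field is handled like a retry. A minimal Python sketch of that defaulting, with `None` standing in for an empty `std::optional`; both function names are hypothetical and this is not the code from `node_frontend.h`:

```python
from typing import Optional


def effective_retry_count(retry_count: Optional[int]) -> int:
    # Mirrors value_or(1): a missing field is treated as a retry.
    return 1 if retry_count is None else retry_count


def uses_latest_snapshot_bound(retry_count: Optional[int]) -> bool:
    # Only an explicit first attempt (retry_count == 0) is held to the
    # stricter "latest snapshot" bound; legacy joiners fall back.
    return effective_retry_count(retry_count) == 0
```

The design consequence, under this reading, is that older joiners which never populate the field keep the previous acceptance behaviour instead of being rejected by a newer primary.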

Copilot finished work on behalf of cjen1-msft April 28, 2026 13:52

achamayou reviewed Apr 28, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou reviewed Apr 28, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou approved these changes Apr 28, 2026

cjen1-msft marked this pull request as draft May 1, 2026 09:17

Merge remote-tracking branch 'upstream/main' into copilot/implement-s…

c6ab243

…napshot-joining-behaviour

cjen1-msft added 3 commits May 5, 2026 16:59

Fixup run

06cb72e

Fmt

13066f0

If fetching is disabled, set fetch_count = std::nullopt

597f7a6

cjen1-msft added the run-long-test Run Long Test job label May 5, 2026

Merge branch 'main' into copilot/implement-snapshot-joining-behaviour

bf96c65

cjen1-msft marked this pull request as ready for review May 5, 2026 17:47

cjen1-msft requested a review from Copilot May 5, 2026 17:48

Copilot AI reviewed May 5, 2026

Copilot started reviewing on behalf of cjen1-msft May 5, 2026 18:30

cjen1-msft marked this pull request as draft May 5, 2026 19:29

achamayou reviewed May 5, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou reviewed May 5, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou reviewed May 5, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou reviewed May 5, 2026

Comment thread src/node/rpc/node_frontend.h Outdated

achamayou reviewed May 5, 2026

Comment thread CHANGELOG.md Outdated

cjen1-msft and others added 5 commits May 5, 2026 21:37

Apply suggestions from code review

be211e1

Co-authored-by: Amaury Chamayou <amaury@xargs.fr>

Remove failing case, now that we use startup seqno for when the joine…

4a9e61b

…r is legacy or not going to fetch.

snags

13dff1d

fmt

b24d2a2

Merge branch 'main' into copilot/implement-snapshot-joining-behaviour

ec5a1d0

Copilot wants to merge 17 commits into main from copilot/implement-snapshot-joining-behaviour

Copilot AI commented Apr 28, 2026 • edited by cjen1-msft

5 participants
