Commit a5b5294

fix(swip-25): rename, add algorithm and missing definitions

Adds missing glossary definitions, fixes incomplete storage depth definition, and improves specification clarity. Addresses PR ethersphere#66 feedback.

1 parent e352618

1 file changed: 30 additions & 12 deletions

SWIPs/swip-pullsync.md
@@ -1,7 +1,7 @@
 ---
 SWIP: 25
-title: More efficient pull syncing within neighbourhood
-author: Viktor Tron <@zelig>, Viktor Tóth <@nugaon>
+title: Pull-sync from closest peer
+author: Viktor Tron <@zelig>, Viktor Tóth <@nugaon>, Marios Isaakidis <@misaakidis>
 discussions-to: https://discord.com/channels/799027393297514537/1239813439136993280
 status: Draft
 type: <Standards Track (Core)>
@@ -18,13 +18,15 @@ This SWIP describes a more efficient way to synchronise content between peers wi
 
 - **Reserve**: the set of chunks stored in the network (pushed to the network with a valid postage stamp).
 - **Proximity Order (PO)**: measure of proximity of two addresses, calculated as the number of matching leading bits that are common to their big-endian binary representation.
-- **Storage depth**: Smallest integer $D$ such that $2^D$ neighbourhoods of depth $D$ (holding a disjoint replication sets of all their bins X, s.t. $X \geq D$ in each neighbourhood) is able to accommodate the network reserve. Assuming uniform utilisation across neighborhoods, and a node reserve depth of $t$, $D_s := \lceil \mathit{log}_2(N) \rceil - t$.
+- **Storage depth**: Smallest integer $D$ such that $2^D$ neighbourhoods of depth $D$ (where each neighbourhood holds a disjoint replication set of all bins $X$ such that $X \geq D$) can collectively accommodate the network reserve. Assuming uniform utilisation across neighbourhoods, and a node reserve depth of $t$, $D_s := \lceil \log_2(N) \rceil - t$.
 - **M's Neighbourhood of depth D**: An address range, elements of which share at least $D$ bits with $M$:
 $\lbrace c \in \mathrm{Chunks}\mid \mathit{PO}(\mathit{Addr}(c),\mathit{Addr}(M)) \geq D\rbrace$.
 Alternatively, the chunks in $M$'s neighbourhood of depth $D$ can also be expressed as the union of all $M$'s bins at and beyond $D$:
 $\mathrm{NH}_D(\mathit{Addr}(M)) = \bigcup_{X\geq D} \mathrm{Bin}_X(M)$.
 - **Bin X of M**: Bin $X$ of a node $M$ contains all the chunks in the network reserve whose PO with $M$ equals $X$: $\mathrm{Bin}_X(M) := \lbrace c\in\mathrm{Reserve}\mid\mathit{PO}(\mathit{Addr}(c), \mathit{Addr}(M)) = X\rbrace$.
 - **BinID**: A sequential number assigned locally by each node to each chunk within a bin, used to order chunks and track sync progress.
+- **Uniqueness Depth**: For a peer $p$ in a set of neighbourhood peers, the uniqueness depth is the proximity order at which $p$ becomes the only peer in the set sharing that prefix. It is the depth in the compacted binary trie at which the peer's address becomes unique among the neighbourhood peers.
+- **Closest Peer**: For a given chunk, the peer with the highest proximity order to that chunk's address among all available neighbourhood peers.
 - **Pull-sync**: The protocol for syncing all the chunks that all nodes within a neighbourhood need to store in their reserve. The protocol itself is well established and shall not change.
 - **Pivot**: Strategies of pull-syncing involve the perspective of a particular node, the **pivot node**, and concern the algorithm that dictates which particular address bins and binID ranges the pivot should request from its peers.
 
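Editorial aside: the two numeric definitions above (proximity order and storage depth) can be sketched as follows. This snippet is illustrative only, not part of the SWIP or the Bee codebase; the helper names are hypothetical and 8-bit addresses stand in for Swarm's 256-bit ones.

```python
import math

def proximity_order(a: int, b: int, bits: int = 8) -> int:
    """PO(a, b): number of matching leading bits of the big-endian
    binary representations of the two addresses."""
    x = a ^ b
    return bits if x == 0 else bits - x.bit_length()

def storage_depth(n: int, t: int) -> int:
    """D_s := ceil(log2(N)) - t for N nodes and node reserve depth t,
    assuming uniform utilisation across neighbourhoods."""
    return math.ceil(math.log2(n)) - t

# 0b11010000 and 0b11000000 agree on their first 3 bits, then differ:
print(proximity_order(0b11010000, 0b11000000))  # 3
# e.g. a hypothetical network of 10,000 nodes with reserve depth t = 4:
print(storage_depth(10_000, 4))  # ceil(log2(10000)) - 4 = 14 - 4 = 10
```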
@@ -36,25 +38,41 @@ If a node is connected to swarm as a full node, it fires up the pullsync protoco
 <!--The motivation is critical for SWIPs that want to change the Swarm protocol. It should clearly explain why the existing protocol specification is inadequate to address the problem that the SWIP solves. SWIP submissions without sufficient motivation may be rejected outright.-->
 As described in the Book of Swarm (2.3.3), the current pull-syncing implementation results in chunks being synced from multiple upstream peers, leading to bandwidth waste because of duplicate fetching.
 
-Imagine, that a naive peer joins a neighbourhood, then they will 'subscribe to' each
-depth of their peers within the neighbourhood. As they are receiving new chunks of course these are offering it too back to the peer they got it from. Plus they try to synchronise from each peer the entire reserve, not just part, which means a naive node's synchronisation involves exchange of `N*S` chunk hashes where N is the neighbourhood size and S is the size of the reserve. This is hugely inefficient.
+When a naive peer joins a neighbourhood, it 'subscribes to' each depth of its peers within the neighbourhood. As it receives new chunks, it offers them back to the peers it got them from. Additionally, it tries to synchronise the entire reserve from each peer, not just a part, so a naive node's synchronisation involves the exchange of `N*S` chunk hashes, where `N` is the neighbourhood size and `S` is the size of the reserve. This is hugely inefficient.
 
 ## Specification
 <!--The technical specification should describe the syntax and semantics of any new feature. The specification should be detailed enough to allow competing, interoperable implementations for the current Swarm platform and future client implementations.-->
-Each peer `P` takes all their peers they are allowed to synchronise with: `p_0, p_1, ..., p_n`.
+Each peer `P` identifies all the peers it is allowed to synchronise with: `p_0, p_1, ..., p_n`.
 All chunks need to be synchronized only once.
 Each chunk is synchronized from its closest peer among the neighborhood peers.
 
-Once synchronization from all assigned peers is complete, the node's reserve for any depth equal to or higher than storage radius will match the network reserve.
+### Algorithm
+1. Build a compacted binary trie from the neighbourhood peer addresses, starting at the storage depth.
+2. For each peer, determine its uniqueness depth (the depth at which it becomes unique in the trie).
+3. Assign bins to peers based on the trie structure:
+   - For each leaf node (representing a peer at its uniqueness depth), sync all bins at or above that peer's uniqueness depth.
+   - For compactible nodes (nodes with one child), sync the corresponding bin from any peer in the accumulated set.
+4. Additionally, sync the `PO(p, P)` bin from each peer `p` to ensure completeness, where `PO(p, P)` is the proximity order between peer `p` and the pivot `P`.
+
+Once synchronisation from all assigned peers is complete, the node's reserve at any depth at or above the storage depth will match the network reserve.
 
 Unlike the earlier algorithm, this one is extremely sensitive to the changing peerset, so every single time there is a change in the neighbours, pullsync strategy needs to be reevaluated.
 
 ## Rationale
 <!--The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->
 
-One can see that each chunk is taken from its most immediate neighbourhood only. So depending on to what extent the peer addresses are balanced we save a lot on not taking anything more than once. Consider a peer with neighbourhood depth `d`, and two neighbours in the neighbourhood each having a common 2 bit prefix. Their levels in the tree are `d+3` for each peer, and we synchronise chunks closest to them on their `Bin d+3`, `Bin d+4`, `Bin d+5`, etc. The peers share the same parent tree node on level `d+2` therefore their `Bin d+2` chunks are equidistant from both peers and can be synchronised from either peer. `Bin d` and `Bin d+1` should contain the same chunks for both peers so each bin can be synchronised with one peer only.
+Each chunk is fetched from its most immediate neighbourhood only, so depending on how balanced the peer addresses are, we save considerably by never fetching anything more than once.
+
+Consider a peer with neighbourhood depth `d` and two neighbours in the neighbourhood, each having a common 2-bit prefix:
+- Their uniqueness depths in the trie are `d+3` for each peer.
+- We synchronise the chunks closest to them on their `Bin d+3`, `Bin d+4`, `Bin d+5`, etc.
+- The peers share the same parent tree node on level `d+2`; therefore their `Bin d+2` chunks are equidistant from both peers and can be synchronised from either peer.
+- `Bin d` and `Bin d+1` contain the same chunks for both peers, so each bin can be synchronised with one peer only.
+
 This means the synchronisation is halved for the first 2 levels compared to the current process in this setting.
 
+To ensure completeness, syncing bins at or above each peer's uniqueness depth covers all chunks for which that peer is the closest. Additionally, syncing the `PO(p, P)` bin from each peer `p` ensures we get all chunks to which the pivot `P` is closer than peer `p` is. This way all chunks in the reserve are synchronised exactly once.
+
 One potential caveat is that if a peer quits or is no longer contactable before the pivot finished syncing with them, then synchronization for the affected bins must be restarted with an alternative peer.
 
 ## Backwards Compatibility
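Editorial aside: the first two steps of the algorithm added in this hunk, and the coverage property behind step 3, can be illustrated with a small sketch. Helper names are hypothetical and 8-bit addresses are used; this is not the Bee implementation.

```python
def po(a: int, b: int, bits: int = 8) -> int:
    """Proximity order: number of matching leading bits (big-endian)."""
    x = a ^ b
    return bits if x == 0 else bits - x.bit_length()

def uniqueness_depth(p: int, peers: list[int], bits: int = 8) -> int:
    """Step 2: one past the longest prefix p shares with any other peer,
    i.e. the level of p's leaf in the compacted trie."""
    return max(po(p, q, bits) for q in peers if q != p) + 1

# Rationale example: neighbourhood depth d = 3 (prefix 000), two peers
# sharing two further bits (11) before diverging.
peers = [0b00011000, 0b00011100]
uniq = {p: uniqueness_depth(p, peers) for p in peers}

# Syncing bins >= uniqueness depth from a peer fetches exactly those
# chunks for which that peer is the strictly closest peer in the set:
for c in range(2 ** 8):
    covered = {p for p in peers if po(c, p) >= uniq[p]}
    best = max(po(c, p) for p in peers)
    closest = {p for p in peers if po(c, p) == best}
    if len(closest) > 1:        # tie: no single closest peer
        closest = set()
    assert covered == closest

print(uniq)  # both peers become unique at depth 6 = d + 3
```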
@@ -69,9 +87,9 @@ Thorough testing is needed to ensure correctness, as this change can affect loca
 
 ## Implementation
 <!--The implementations must be completed before any SWIP is given status "Final", but it need not be completed before the SWIP is accepted. While there is merit to the approach of reaching consensus on the specification and rationale before writing code, the principle of "rough consensus and running code" is still useful when it comes to resolving many discussions of API details.-->
-In order to find out what nodes share common chunk sets and what are unique ones, a leaf compacted binary tree of addresses from neighborhood peers can be made. The depth of any path extends only as far as is necessary to separate one group of addresses from another.
+To determine which peers share common chunk sets and which are unique, a compacted binary tree of the neighbourhood peers' addresses is constructed. A compacted binary tree is one in which nodes with a single child are merged with their parent, reducing the tree depth. The depth of any path extends only as far as is necessary to separate one group of addresses from another.
 In this structure, every tree node represents a prefix and each step in the binary tree reflects a further position within the binary representation of the addresses and increments the `level` by 1.
-Since the bins must be synchronised only above or equal to storage radius, the root node should represent the common prefix of the neighborhood and initialize the `level` with storage radius.
+Since only bins at or above the storage depth must be synchronised, the root node should represent the common prefix of the neighbourhood and initialise the `level` to the storage depth.
 
 Each leaf holds a particular peer $p$ and its `level` is $p$'s uniqueness depth. Consequently, each chunk sharing the prefix represented by the leaf is closest to $p$.
 Each compactible node (i.e., a node with one child) indicates that all chunks on the missing branch have no single closest peer and are equidistant from two or more peers on the existing branch.
@@ -80,11 +98,11 @@ Ideally, to sync all the chunks we need to cover all the branches of the trie:
 - all chunks whose addresses match the prefix represented by a leaf node must be synchronized from the peer stored in that leaf.
 - all chunks on the missing branch of a compactible node must be synced from a peer on the existing branch.
 
-This is achieved if we traverse the trie in a depth-first manner and for each leaf node we subscribe to all bins greater or equal to its `level`. Then we accumulate peers at the intermediate nodes. While doing this, compactible nodes of level `X` we sync `bin X` from a peer from the accumulated set.
+This is achieved by traversing the trie in a depth-first manner. For each leaf node, we subscribe to all bins greater than or equal to its `level`. We accumulate peers at the intermediate nodes, and for each compactible node of level `X`, we sync `bin X` from a peer in the accumulated set.
 
 Note that trie nodes that have two children represent prefixes that are fully covered by one of the peers below.
 
-The assumption behind the loose specification is that we do not need to support for any kind of pull-sync change and existing data flow will be sufficient. In particular, the following assumptions are made:
+We assume that no changes to the pull-sync protocol are needed and that the existing data flow will be sufficient. In particular, the following assumptions are made:
 - pullsync primarily indexes the chunks by PO (relative to the node address)
 - secondary ordering within a bin is based on the first time of storage.
 - the chronology makes it possible to have live (during session) and historical syncing.
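Editorial aside: the trie construction and depth-first bin assignment described in this section can be sketched as follows. This is an illustrative model with hypothetical names and 8-bit addresses, not Bee's data structures; it reuses the two-peer neighbourhood from the rationale.

```python
def po(a: int, b: int, bits: int = 8) -> int:
    """Proximity order: number of matching leading bits (big-endian)."""
    x = a ^ b
    return bits if x == 0 else bits - x.bit_length()

def bit(addr: int, i: int, bits: int = 8) -> int:
    """The i-th most significant bit of addr."""
    return (addr >> (bits - 1 - i)) & 1

def build(peers: list[int], level: int, bits: int = 8):
    """Split peers on the bit at `level`; a leaf (single peer) sits at
    that peer's uniqueness depth; a node with one child is compactible."""
    if len(peers) == 1:
        return ("leaf", peers[0], level)
    left = [p for p in peers if bit(p, level, bits) == 0]
    right = [p for p in peers if bit(p, level, bits) == 1]
    return ("node",
            build(left, level + 1, bits) if left else None,
            build(right, level + 1, bits) if right else None,
            level)

def assign(tree, bits: int = 8) -> dict[int, set[int]]:
    """Depth-first walk: each leaf subscribes to all bins >= its level;
    each compactible node of level X syncs bin X from an accumulated peer."""
    plan: dict[int, set[int]] = {}
    def walk(t):
        if t[0] == "leaf":
            _, peer, level = t
            plan.setdefault(peer, set()).update(range(level, bits + 1))
            return [peer]
        _, left, right, level = t
        below = [p for child in (left, right) if child for p in walk(child)]
        if left is None or right is None:      # compactible: missing branch
            plan.setdefault(below[0], set()).add(level)
        return below
    walk(tree)
    return plan

# Neighbourhood depth 3 (prefix 000), two peers sharing two further bits:
peers = [0b00011000, 0b00011100]
plan = assign(build(peers, level=3))
print(plan)  # bins 3 and 4 come from one peer only; bins 6..8 from each
```

Step 4 of the algorithm (the `PO(p, P)` bins) is not exercised here, since the sketch has no pivot address; in this two-peer setting the trie walk alone already fetches every chunk of the neighbourhood exactly once.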
