Summary
The reserve worker only decreases storageRadius when SyncRate() == 0, which is unrealistic on active networks where pullsync never fully stops due to continuous uploads. Combined with the 15-minute check interval and single-step decrements, this causes nodes to report a stale (too high) storageRadius even when their reserve already contains enough chunks to justify a lower radius.
Code Reference
pkg/storer/reserve.go, reserveWorker():
```go
case <-thresholdTicker.C:
	radius := db.reserve.Radius()
	count, err := db.countWithinRadius(ctx)
	// ...
	if count < threshold(db.reserve.Capacity()) && db.syncer.SyncRate() == 0 && radius > db.reserveOptions.minimumRadius {
		radius--
		// ...
	}
```
Three constraints compound the problem:
- SyncRate() == 0 gate — On an active network, pullsync never fully stops. New data is continuously being uploaded, so there's always some sync activity. This condition blocks radius adjustment indefinitely.
- 15-minute ticker (reserveWakeUpDuration = 15 * time.Minute) — Even when the conditions are met, the radius can only decrease by 1 every 15 minutes.
- Single-step decrement (radius--) — If the radius needs to drop by several levels (e.g., after a restart with batchstore reset), adaptation takes N * 15 minutes.
Observable Impact
After a node restart with batchstore resync, the radius starts high and needs to decrease to match the node's actual reserve capacity. Example:
- Node has ~115M chunks in reserve (near full capacity for its configuration)
- storageRadius is stuck at 5 instead of the correct value of 4
- pullsyncRate is ~8 chunks/sec (normal background sync activity)
- Radius never decreases because SyncRate() != 0
This causes:
- Incorrect committedDepth reported to peers via the status protocol, which feeds into salud's network radius consensus
- Node marked unhealthy by salud due to committedDepth mismatch (self_radius vs network_radius)
- Potential redistribution game issues — the node plays with the wrong radius, risking incorrect samples
Nodes with reserve-capacity-doubling Are More Affected
Doubled nodes cover more neighborhoods and receive proportionally more chunk offers via pullsync. Their SyncRate() is structurally higher than non-doubled nodes, making the == 0 condition even harder to satisfy.
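One way to make the gate fair to doubled nodes would be to scale the allowance with capacity instead of requiring exactly zero. The sketch below is purely illustrative — the function name, the 0.1%-per-tick budget, and the units are all assumptions, not existing Bee code:

```go
package main

import "fmt"

// syncRateIsQuiescent treats sync as "quiet enough" when the observed
// rate (chunks/sec) stays below a small fraction of reserve capacity
// per 15-minute tick, so a doubled-capacity node gets a proportionally
// larger allowance. The 0.1%-per-tick budget is an arbitrary example.
func syncRateIsQuiescent(rate float64, capacity int) bool {
	tickSeconds := 15.0 * 60.0
	budget := float64(capacity) * 0.001 / tickSeconds
	return rate <= budget
}

func main() {
	// The stuck node from the example above: ~8 chunks/sec against a
	// reserve capacity of ~120M chunks easily fits inside the budget.
	fmt.Println(syncRateIsQuiescent(8, 120_000_000)) // true
}
```

Under such a gate, the ~8 chunks/sec of background sync in the example would no longer block the decrease.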
Suggestion
The radius adjustment could be based on the relationship between reserveSize and capacity rather than requiring zero sync activity. For example:
- Remove or relax the SyncRate() == 0 gate — perhaps use a threshold relative to the node's capacity, or check that the sync rate is stable rather than zero
- Allow multi-step radius adjustment — if the reserve size clearly justifies a lower radius, skip the intermediate steps
- Reduce the check interval — 15 minutes is very coarse for a value that affects network consensus
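A minimal sketch of the multi-step idea, with two loudly stated assumptions: the threshold here is a stand-in of 50% of capacity (not the real threshold() helper), and each radius decrement is modeled as roughly doubling the chunk population the node covers:

```go
package main

import "fmt"

// threshold is a stand-in for the real threshold() helper in
// pkg/storer; the 50% figure is an assumption for illustration only.
func threshold(capacity int) int { return capacity / 2 }

// targetRadius steps the radius down in a single pass instead of one
// level per 15-minute tick: each decrement roughly doubles the
// expected chunk count, so keep stepping while the projection stays
// under the threshold and above the minimum radius.
func targetRadius(radius, minRadius uint8, count, capacity int) uint8 {
	projected := count
	for radius > minRadius && projected < threshold(capacity) {
		radius--
		projected *= 2
	}
	return radius
}

func main() {
	// Underfull reserve: 10M chunks against a 120M-chunk capacity
	// drops three levels at once instead of taking 45+ minutes.
	fmt.Println(targetRadius(8, 2, 10_000_000, 120_000_000)) // 5
	// Near-full reserve: no decrease warranted, radius unchanged.
	fmt.Println(targetRadius(5, 2, 115_000_000, 120_000_000)) // 5
}
```

Note that no sync-rate term appears at all: the decision is driven by the reserve state, mirroring how the increase path already works.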
The radius increase path (in unreserve()) already operates without a sync rate gate, so there's precedent for the radius responding to actual reserve state rather than sync activity.