Open spec questions for DISC-NG service discovery
Tracking three related questions surfaced during DISC-NG implementation work on datahop/go-ethereum. Each affects implementation behaviour that the current discv5-theory.md doesn't fully pin down. Filing as a single issue with three sections so the discussions can resolve independently.
1. Single B(s) per node vs separate advertise/search tables
The Service Tables section reads:
"Advertisers use B(s) as an advertise table for service s; discoverers use B(s) as a search table for service s."
This implies one service table per node per service.
In the datahop/go-ethereum DISC-NG implementation, however, a single node maintains two separate B(s) instances per topic — Registration.buckets (when advertising) and Search.buckets (when discovering). They're not strictly synchronised, although they converge to similar contents in practice (filtered ordinary discovery + DISC-NG auxiliary ENRs).
Question: which way does the spec intend?
- (A) Mandate a single shared B(s) per node per service. Simpler model, aux ENRs from either role benefit both, but requires refactor in implementations that have already split them.
- (B) Explicitly allow either as an implementation choice. Doesn't disrupt existing implementations. Slightly weakens the model: the topic-distance hint sent in REGTOPIC vs TOPICQUERY can diverge for the same local node depending on which table currently has free space at which distance.
- (C) Mandate single, with a transitional period where separate tables remain acceptable.
The choice affects implementation contracts and how the topic-distance hint is expected to behave.
2. Does the ad cache need an eviction policy?
The Ad Cache section states the cache has capacity C but doesn't say what happens when admission would push it over.
Reading the Waiting-Time Function carefully: its 1 / (1 - c/C)^Pocc term diverges as c → C, so the waiting time required for the next admission grows without bound before the cache is fully populated. Capacity C is therefore an asymptotic limit — the waiting-time mechanism IS the admission control — and no eviction policy is needed for correctness.
Question: should the spec say this explicitly?
Suggested wording for the Ad Cache section:
The ad cache does not require an eviction policy. The waiting-time function (see Waiting-Time Function) acts as admission control: as the cache fills, the 1 / (1 - c/C)^Pocc term grows without bound, so the waiting time required for the next admission diverges before the cache is fully populated. Cache capacity C is therefore an asymptotic limit, not a hard cap reached during normal operation. Implementations MAY defensively reject admissions that would otherwise exceed C (e.g. to guard against numerical edge cases), but they are not required to evict existing entries.
Useful because implementers reading the Ad Cache section in isolation might otherwise design an eviction policy the protocol doesn't actually need. Or is this implicit enough to leave alone?
3. Lookup termination semantics + within-bucket selection ordering
The Lookup Procedure section says:
"the discoverer queries up to Klookup registrars per bucket and stops when it has collected at least Flookup distinct advertisers or when no unqueried registrars remain."
Two related areas the text doesn't fully pin down.
3a. Where is termination enforced — protocol or interface?
Read normatively, the text says the protocol stops the lookup once Flookup is reached. The go-ethereum DISC-NG implementation, however, exposes lookup as an enode.Iterator that never naturally terminates — it loops indefinitely creating fresh Search instances, and the caller has to externally enforce Flookup. (See datahop/go-ethereum#60 — the original IsDone deadlock had a queriesWithoutNewNodes >= 4 heuristic that didn't act as Flookup either.)
Question: should the spec require the protocol-level state machine to honour Flookup internally, or explicitly leave termination to the caller (and recast Flookup as a recommended caller-side cap)?
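For reference, the caller-side reading could look roughly like the sketch below. It assumes a hypothetical `nodeIterator` interface that only mirrors the Next/Node shape of an endless lookup iterator, with string IDs standing in for ENRs. `collect`, `endlessIterator`, and the `maxReads` guard are illustrative names, not part of go-ethereum or the spec.

```go
package main

import "fmt"

// nodeIterator is a hypothetical stand-in for an endless lookup iterator:
// Next advances and reports whether a result is available, Node returns it.
type nodeIterator interface {
	Next() bool
	Node() string
}

// endlessIterator never terminates on its own. It cycles through a fixed
// set of advertiser IDs, so the caller sees duplicates.
type endlessIterator struct {
	ids []string
	pos int
	cur string
}

func (it *endlessIterator) Next() bool {
	if len(it.ids) == 0 {
		return false
	}
	it.cur = it.ids[it.pos%len(it.ids)]
	it.pos++
	return true
}

func (it *endlessIterator) Node() string { return it.cur }

// collect enforces Flookup outside the iterator: it stops once fLookup
// distinct advertisers have been seen. maxReads is an extra guard (an
// assumption of this sketch) so the loop ends even if fewer than fLookup
// distinct advertisers exist.
func collect(it nodeIterator, fLookup, maxReads int) []string {
	seen := make(map[string]bool)
	var out []string
	for reads := 0; reads < maxReads && len(out) < fLookup && it.Next(); reads++ {
		id := it.Node()
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}

func main() {
	it := &endlessIterator{ids: []string{"a", "b", "c"}}
	fmt.Println(collect(it, 2, 100))
}
```

If the spec instead requires the state machine to honour Flookup internally, the same stop condition would move inside the iterator and Next would return false once it is met.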
3b. Within-bucket selection ordering and frontier behaviour
The text doesn't pin down:
- Selection order within a bucket: random or sequential
- Empty unqueried bucket: does it halt descent (warm-up frontier reading) or skip it and continue toward s?
- Refilled-after-empty: when an AddNodes mid-lookup repopulates an earlier bucket, does the lookup return to it on the next query?
The go-ethereum implementation has had concrete bugs here (datahop/go-ethereum#65 — "empty unqueried bucket blocks descent"), and PR ethereum#60 picks a specific set of answers: skip empty buckets, warm-up frontier applies only to populated unqueried buckets, refilled-earlier-bucket takes priority on the next query selection.
Question: are these the answers the theory intends? Pinning them down explicitly would prevent other implementations re-deriving them.
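As a strawman for discussion, two of the answers listed above (skip empty buckets, refilled earlier bucket takes priority) can be sketched as below. This is not the code from the PR. The `search` type, its fields, the far-to-near scan order, and the sequential within-bucket order are all assumptions of this sketch, and the warm-up frontier rule is not modelled.

```go
package main

import "fmt"

// search is a hypothetical, simplified lookup state. buckets[i] holds the
// registrars in bucket i that have not been queried yet; a higher index is
// assumed to be closer to the service ID s. queried[i] counts the queries
// already sent to bucket i, capped at kLookup per bucket.
type search struct {
	buckets [][]string
	queried []int
	kLookup int
}

// next picks the registrar for the next query. It scans from the farthest
// bucket toward s and takes the first unqueried registrar from a bucket
// that still has query budget. Empty buckets are skipped instead of
// blocking descent. Because the scan restarts at bucket 0 on every call,
// an earlier bucket that was refilled mid-lookup is selected first.
func (s *search) next() (string, bool) {
	for i := range s.buckets {
		if len(s.buckets[i]) == 0 || s.queried[i] >= s.kLookup {
			continue
		}
		r := s.buckets[i][0] // sequential order; random is the open alternative
		s.buckets[i] = s.buckets[i][1:]
		s.queried[i]++
		return r, true
	}
	return "", false
}

// addNodes models an AddNodes-style refill of bucket i during a lookup.
func (s *search) addNodes(i int, regs ...string) {
	s.buckets[i] = append(s.buckets[i], regs...)
}

func main() {
	s := &search{
		buckets: [][]string{{}, {"r1", "r2"}, {"r3"}},
		queried: make([]int, 3),
		kLookup: 2,
	}
	first, _ := s.next() // bucket 0 is empty and is skipped
	s.addNodes(0, "r0")  // an earlier bucket is refilled mid-lookup
	second, _ := s.next()
	fmt.Println(first, second)
}
```

In this sketch the lookup ends when next reports false, which corresponds to the "no unqueried registrars remain" condition in the quoted text.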
Cross-references
topic-distance field and TOPICNODES message: discv5-wire: align REGTOPIC, REGCONFIRMATION, TOPICQUERY with DISC-NG; add TOPICNODES #2