VC: Reconnect head SSE streams and restart on candidate updates #9018
peter941221 wants to merge 1 commit into sigp:unstable from
Quick follow-up here because linked issue The current head is still
If maintainers would prefer an even smaller first slice, I am happy to split it, but my read was that the reconnect and candidate-update restart paths are the same stale-stream consistency bug family from
Quick follow-up here because linked issue The current head is still
If maintainers would prefer a smaller first merge shape, I am happy to split this further. My current read was just that the reconnect path and the candidate-update restart path are both part of the same stale-stream consistency problem from
If there is a preferred smaller slice or a different direction you would rather take here, I am very happy to reshape the branch.
PR: Fix head monitor resilience + candidate-update restart (sigp/lighthouse#8741)
Summary
- Head event streams are merged with `futures::stream::select_all`. If one ends, `select_all` drops it silently and it never reconnects.
- `update_candidates_list` re-enumerates candidate indices; any long-lived stream may keep reporting stale indices unless the head monitor restarts.
What This PR Changes
Per-BN self-reconnecting head event stream:
- `Stream` built with `futures::stream::unfold`.
- `Err(_)` or `None` (stream ended) triggers:
  - `BeaconHeadCache::remove(candidate_index)` to avoid stale cache influence
- `tokio::time::timeout` around `BeaconNodeHttpClient::get_events` to avoid a hung connect blocking progress.
- Stream ends (`None`) are logged at `debug` to avoid false-alarm warnings (warnings are reserved for connect timeouts/errors and stream errors).
Candidate list update restart signal ("generation"):
- `BeaconNodeFallback` now maintains `head_monitor_generation_tx: watch::Sender<u64>`.
- `update_candidates_list(...)` bumps the generation (using `send_replace`) after updating candidates + purging the head cache.
- `poll_head_event_from_beacon_nodes(...)` subscribes and returns `Ok(())` on generation change, causing the outer service loop to restart the monitor with the new candidate indices.
- `info` log on generation change so operators can see deliberate monitor restarts.
Files Touched
- `validator_client/beacon_node_fallback/src/lib.rs`
  - `head_monitor_generation_tx`
  - `set_head_send`
  - `update_candidates_list`
- `validator_client/beacon_node_fallback/src/beacon_head_monitor.rs`
  - `BeaconHeadCache::remove`
  - `remove` only removes requested entry
Testing
- `cargo test -p beacon_node_fallback`
Known Limitations / Follow-ups
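For reviewers who want the shape of the generation-based restart from "What This PR Changes" in isolation, here is a minimal sketch. It is not the code in this PR: it swaps the `watch::Sender<u64>` for a std-only `AtomicU64` so it compiles without tokio, and `Generation`, `MonitorExit`, and `run_monitor_pass` are hypothetical names used only for illustration. The idea it shows is the same: the monitor notes the generation when it starts and returns as soon as it observes a different value, so the outer loop can restart it with fresh candidate indices.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical std-only stand-in for the `watch::Sender<u64>` generation channel.
#[derive(Clone, Default)]
struct Generation(Arc<AtomicU64>);

impl Generation {
    /// Called after the candidate list has been replaced; returns the new value.
    fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Current generation as seen by any holder of a clone.
    fn current(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }
}

/// Why a single monitor pass stopped (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum MonitorExit {
    /// The candidate list changed; the caller should restart the monitor.
    GenerationChanged { from: u64, to: u64 },
    /// The event source in this sketch ran out of items.
    EventsExhausted,
}

/// One monitor pass: remember the generation at start, then re-check it
/// before handling each event and bail out if it has moved on.
fn run_monitor_pass(
    generation: &Generation,
    events: impl IntoIterator<Item = u64>,
    mut on_event: impl FnMut(u64),
) -> MonitorExit {
    let started_at = generation.current();
    for slot in events {
        let now = generation.current();
        if now != started_at {
            return MonitorExit::GenerationChanged { from: started_at, to: now };
        }
        on_event(slot);
    }
    MonitorExit::EventsExhausted
}
```

In the real monitor the check is an awaited change notification rather than a poll before each event, so a restart does not have to wait for the next head event to arrive.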