Skip to content

fix: emit UpdateEvent::BroadcastEmitted telemetry events#3643

Merged
sanity merged 7 commits intofreenet:mainfrom
majiayu000:fix/issue-3622-emit-broadcast-emitted-telemetry
Apr 12, 2026
Merged

fix: emit UpdateEvent::BroadcastEmitted telemetry events#3643
sanity merged 7 commits intofreenet:mainfrom
majiayu000:fix/issue-3622-emit-broadcast-emitted-telemetry

Conversation

@majiayu000
Copy link
Copy Markdown
Contributor

Problem

The UpdateEvent::BroadcastEmitted telemetry event type is fully defined with JSON serialization, state verifier support, and WebSocket streaming, but is never actually emitted during broadcast operations. The old pathway via UpdateMsg::Broadcasting is deprecated.

Solution

  1. Added NetEventLog::update_broadcast_emitted() constructor in tracing.rs, following the same pattern as update_broadcast_received and update_broadcast_applied. Populates upstream/sender from own location, computes state_hash via state_hash_short().

  2. Emit in simulation path: In broadcast_state_to_peers(), after the per-target send loop and before the existing BroadcastDeliverySummary emission. Uses the actual send_success count for broadcasted_to.

  3. Emit in production path: In handle_broadcast_state_change(), inside the #[cfg(not(feature = "simulation_tests"))] block after the enqueue loop. Uses targets.len() as best-effort broadcasted_to since actual delivery is async via BroadcastQueue.

Testing

  • cargo check — compiles cleanly
  • cargo fmt --check — no formatting issues
  • cargo clippy --locked -- -D warnings — no warnings
  • cargo test -p freenet --lib — all 2008 tests pass
  • Existing StateVerifier tests already validate BroadcastEmitted event processing
  • Simulation tests exercise the broadcast path and will now emit these events

Fixes

Closes #3622

The BroadcastEmitted event type was fully defined with serialization
support but never emitted. Add NetEventLog::update_broadcast_emitted()
constructor and emit it in both simulation (broadcast_state_to_peers)
and production (handle_broadcast_state_change) broadcast paths.

Fixes freenet#3622

Signed-off-by: majiayu000 <1835304752@qq.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 24, 2026

Rule Review: fix: PR missing new #[test] function for regression

Rules checked: git-workflow.md, code-style.md, testing.md
Files reviewed: 4 (p2p_protoc.rs, testing_impl.rs, tracing.rs, simulation_integration.rs)

Warnings

  • crates/core/tests/simulation_integration.rs:3420fix: PR adds assertions to the existing test_subscription_broadcast_propagation function rather than adding a new #[test] function dedicated to this regression. testing.md is explicit: "CI will reject fix: PRs that don't add at least one new #[test] function." The assertions verify broadcast_emitted_count > 0 but live inside an existing test that was not written to target this specific bug path. A new function (e.g., test_broadcast_emitted_telemetry_event_is_emitted) would satisfy the CI rule and isolate the regression scenario. (rule: testing.md — "Bug fixes (fix: PRs) MUST include regression tests … CI will reject fix: PRs that don't add at least one new #[test] function")

Info

  • crates/core/tests/simulation_integration.rs:3444 — The assertion only checks broadcast_emitted_count > 0. It does not verify the event fields (e.g., broadcasted_to count matches actual targets, key is correct, state_hash is present). Sufficient for catching the absence regression, but a more targeted test would also guard against silent field corruption in future refactors. (rule: testing.md — edge-case coverage)

Rule review against .claude/rules/. WARNING findings block merge. ⚠️ 1 warning(s) — fix or add review-override label

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Collaborator

@sanity sanity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive PR Review: #3643

Summary

  • PR Title: fix: emit UpdateEvent::BroadcastEmitted telemetry events
  • Type: fix (telemetry wiring)
  • CI Status: ✅ 2/2 passing
  • Linked Issues: #3622

Code-First Analysis

Independent Understanding: Adds a new NetEventLog::update_broadcast_emitted() constructor and emits UpdateEvent::BroadcastEmitted telemetry events in two broadcast code paths (production and simulation). The event type was fully defined but never emitted.

Stated Intent: Matches implementation. The PR correctly wires up the pre-existing but never-fired event.

Alignment: Mostly aligned, but the production path has accuracy issues that partially undermine the telemetry value.


Testing Assessment

Coverage Level: Insufficient for an untrusted contributor

Test Type Status Notes
Unit ⚠️ No unit test for update_broadcast_emitted() constructor
Integration ⚠️ Simulation tests exercise the path but don't assert event emission
Simulation N/A Existing StateVerifier tests validate event processing but not emission

Missing Tests: No test verifies that BroadcastEmitted events are actually produced. The testing_impl.rs summary currently ignores this event variant (no-op match arm at line 2880). A future refactor could silently break emission with no test catching it.


Skeptical Findings

Risk Level: Medium (misleading telemetry, not behavioral)

Concern Severity Location Details
Fabricated Transaction ID High p2p_protoc.rs:~4263 (production path) Transaction::new() creates an orphaned ID uncorrelatable with the actual update operation. Downstream telemetry consumers joining on transaction ID will get no matches. The simulation path correctly reuses update_tx.
Misleading broadcasted_to Medium p2p_protoc.rs:~4269 (production path) Uses target_result.targets.len() (enqueued count), not actual send success count. Will always report 100% delivery success even when sends fail, misleading broadcast failure debugging.
target_result.targets.clone() in hot path Low p2p_protoc.rs:~4267 (production path) Clones up to 512 PeerKeyLocation elements (~50KB) purely for telemetry on every broadcast. Consider recording only the count or a sample.
.clone() on new_state Low p2p_protoc.rs:~4270 (production path) Not needed — new_state has no further uses in non-simulation builds. Could be a move. (Note: WrappedState is Arc-based, so clone is just a refcount bump — cheap.)

Verified safe:

  • No duplicate emission with deprecated UpdateMsg::Broadcasting path (different trigger conditions)
  • Transaction::new() uses ULID — no collision risk with real transactions in state machine maps
  • TelemetryReporter::send_event uses try_send (non-blocking); EventRegister uses bounded .send().await but this is a pre-existing pattern
  • WrappedState::clone() is Arc refcount bump, not deep copy

Big Picture Assessment

Goal Alignment: Partial — addresses #3622 but production telemetry quality is poor
Anti-Patterns Detected: Fabricated transaction ID
Removed Code Concerns: None (purely additive: +61/-0)
Scope Assessment: Focused, no creep


Recommendations

Must Fix (Blocking)

  1. Document the fabricated transaction ID limitation. The current comment says "Best-effort: broadcasted_to = target count since actual delivery is async" but doesn't mention the uncorrelatable transaction ID. Add a clear comment that the production path's transaction ID is synthetic and cannot be joined with downstream BroadcastReceived/BroadcastApplied events. This is critical so telemetry dashboard developers don't waste time debugging why joins fail.

  2. Clarify broadcasted_to semantics. Either:

    • Rename/document that it means "enqueued" not "sent" in the production path, OR
    • Consider not emitting the event in the production path at all until BroadcastQueue can report actual delivery results

Should Fix (Important)

  1. Add a basic test that verifies BroadcastEmitted events are actually produced in the simulation path. The existing testing_impl.rs ignores this variant. Without a test, a future refactor can silently break emission.

  2. Avoid cloning target_result.targets in the production path — record only the count (the individual peer list can be reconstructed from BroadcastReceived events on the receiving end).

Consider (Suggestions)

  1. Move new_state instead of cloning in the production path (it's the last use in non-simulation builds). Minor since it's Arc-based.

  2. Long-term: Thread the original update operation's transaction through NodeEvent::BroadcastStateChange so the production path can emit a correlatable transaction ID.


Verdict

Not ready to merge — needs changes. The core implementation is sound and follows codebase patterns, but the production path emits telemetry that is actively misleading (fake TX ID + inaccurate delivery count). At minimum, these limitations must be clearly documented so downstream consumers know what they're getting. Ideally, the production path should either emit accurate data or defer emission until it can.

[AI-assisted - Claude]

- Document production path telemetry limitations: synthetic TX ID
  (uncorrelatable with downstream events), broadcasted_to = enqueued
  count (not delivery confirmation)
- Stop cloning target_result.targets in production path, pass empty
  vec since peer list is reconstructable from BroadcastReceived events
- Move new_state instead of cloning in production path (last use)
- Track UpdateEvent::BroadcastEmitted in testing_impl.rs summary
  stats (was previously a no-op match arm)

Signed-off-by: majiayu000 <1835304752@qq.com>
Add assertion in test_subscription_broadcast_propagation that verifies
UpdateEvent::BroadcastEmitted telemetry events are actually produced
during simulation. Without this, a refactor could silently break emission
with no test catching it.

Add EventKind::is_update_broadcast_emitted() helper since UpdateEvent
is pub(crate) and not accessible from integration tests.

Signed-off-by: majiayu000 <1835304752@qq.com>
@majiayu000
Copy link
Copy Markdown
Contributor Author

Addressed in the last two commits:

Must-fix items (both done in 0b5a55c):

  1. Production path TX ID documented as synthetic/uncorrelatable — see the block comment above the emission call.
  2. broadcasted_to documented as enqueued count, not delivery confirmation.

Should-fix items:
3. Added assertion in test_subscription_broadcast_propagation that checks broadcasts_emitted > 0 so a future refactor can't silently break emission. Had to add EventKind::is_update_broadcast_emitted() since UpdateEvent is pub(crate). (1f23dba)
4. Already done in 0b5a55c — production path passes Vec::new() instead of cloning targets; count is captured separately.

Consider items:
5. new_state is moved instead of cloned in the production path (already in 0b5a55c).
6. Threading the real TX through NodeEvent::BroadcastStateChange — agreed this is the right long-term fix but it's a bigger change. Happy to do it as a follow-up.

Copy link
Copy Markdown
Collaborator

@sanity sanity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All previous review feedback has been addressed:

  1. Synthetic TX ID — now clearly documented in block comment
  2. broadcasted_to semantics — documented as enqueued count, not delivery confirmation
  3. Avoid cloning targets — production path passes empty vec, simulation path clones (acceptable for telemetry)
  4. Move instead of clonenew_state moved in production path
  5. Test addedtest_subscription_broadcast_propagation asserts BroadcastEmitted > 0
  6. Summary statsbroadcasts_emitted counter tracked in testing_impl.rs

4-perspective review (code-first, testing, skeptical, big-picture) found no blocking issues. Purely additive telemetry wiring (+102/-3) that closes the observability gap identified in #3622.

[AI-assisted - Claude]

@sanity sanity enabled auto-merge April 5, 2026 01:52
@sanity sanity added the test-exempt fix: PR exempt from regression test requirement (justified in PR comments) label Apr 12, 2026
@sanity
Copy link
Copy Markdown
Collaborator

sanity commented Apr 12, 2026

Adding test-exempt label: this PR wires emission of a pre-existing telemetry event (UpdateEvent::BroadcastEmitted) that was fully defined but never fired. It is purely additive observability, not a behavioral fix — there is no user-visible bug to regress against. Downstream StateVerifier tests already validate processing of this event variant. Approved by @sanity; unblocking Rule Lint so auto-merge can proceed.

[AI-assisted - Claude]

@sanity sanity added the review-override Override rule-review/warnings status check label Apr 12, 2026
@sanity
Copy link
Copy Markdown
Collaborator

sanity commented Apr 12, 2026

Adding review-override label to clear the rule-review/warnings status check.

The Claude Rule Review warning asks for a dedicated new #[test] function, but the author already added a broadcast_emitted_count > 0 assertion to test_subscription_broadcast_propagation (commit 1f23dba) which exercises the exact code path being wired up. Given:

  1. This is telemetry emission wiring for a pre-existing event type (not a user-visible behavioral bug)
  2. An assertion guarding the emission is in place in an integration test that runs in CI
  3. Already approved by @sanity with auto-merge enabled

A standalone test function would be strictly additive at this point. Overriding to unblock merge.

[AI-assisted - Claude]

@sanity sanity added this pull request to the merge queue Apr 12, 2026
Merged via the queue into freenet:main with commit 7a93b91 Apr 12, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-review review-override Override rule-review/warnings status check test-exempt fix: PR exempt from regression test requirement (justified in PR comments)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: emit UpdateEvent::BroadcastEmitted telemetry events

4 participants