Skip to content

feat(relay): frame-forwarded and grace-expiry counters (#58)#88

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/58
May 13, 2026
Merged

feat(relay): frame-forwarded and grace-expiry counters (#58)#88
ilmoniemi merged 3 commits into
mainfrom
feature/58

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Third slice of the metrics rollout (split from #37). Adds two Prometheus counters routed through nil-safe func() hooks on *Registry, keeping the prometheus dep out of registry.go and forward.go:

  • pyrycode_relay_frames_forwarded_total{direction} — labelled by direction (phone_to_binary | binary_to_phone), incremented exactly once per successful sink Send. No increment on BinaryFor miss, marshal error, unknown conn_id, malformed envelope, or per-sink Send error.
  • pyrycode_relay_grace_expiries_total — scalar, incremented from Registry.handleGraceExpiry's success branch, after the pointer-identity guard so stale fires do not increment.

Per-direction CounterVec children are pre-bound from WithLabelValues so the hot-path hook is a single atomic Inc.

Issue

Closes #58. Depends on #59 (registry scaffolding) — already merged.

Architecture compliance

Follows docs/specs/architecture/58-frame-and-grace-counters.md:

Testing

New internal/relay/metrics_counters_test.go (four tests, all in package relay so they reuse fakeConn / fakePhone / fakeBinary / fakeBinarySource without duplication):

  • TestForwardMetrics_PhoneToBinary_OnlyOnSuccess — asserts +3 on three successful Sends, then verifies the counter does NOT move on Send error and on the no-binary path.
  • TestForwardMetrics_BinaryToPhone_OnlyOnSuccess — interleaves 3 successful frames with malformed-envelope, unknown-conn_id, and phone-Send-error paths; counter ends at exactly 3.
  • TestGraceMetrics_OnlyOnRealEviction — +1 on a real eviction, then a cancel-and-replace scenario verifies the stale fire does NOT move the counter past 1.
  • TestGraceMetrics_RaceFreedom — 16 goroutines × 200 ops hammering ClaimServer / ScheduleReleaseServer / cancel-replace cycles; race detector verdict is the assertion. Counter scrape sanity-checked at the end.

A shared assertCounter helper mirrors assertGauge from metrics_connections_test.go — substring-match on the literal text-format line, same posture (the response body also carries promhttp_metric_handler_* self-instrumentation lines).

  • make vet clean
  • make test (race) clean
  • make build clean

Test plan

  • CI: make vet, make test, make build all green
  • Scrape /metrics from a running relay; confirm three new series appear with the expected names and label cardinality

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 13, 2026 10:27
Adds two Prometheus counters via nil-safe hooks on *Registry, keeping
the prometheus dep out of registry.go and forward.go:

- pyrycode_relay_frames_forwarded_total{direction} — incremented after
  a successful sink Send in StartPhoneForwarder / StartBinaryForwarder.
  direction is a hard-coded constant set {phone_to_binary,
  binary_to_phone}, cardinality 2.
- pyrycode_relay_grace_expiries_total — incremented from
  handleGraceExpiry's success branch, after the pointer-identity guard
  so stale fires do not increment.

Per-direction CounterVec children are pre-bound from WithLabelValues so
the hot-path hook is a single atomic Inc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #58

Decision: PASS

Findings

  • [NIT] internal/relay/metrics_counters_test.go:14-30assertCounter substring-matches metric value, so pyrycode_relay_grace_expiries_total 1 would also match a body containing pyrycode_relay_grace_expiries_total 10. Practically unreachable here (the asserted counts are bounded by the test ops) and the existing assertGauge has the same posture, so no rework needed. Mentioned only for future tightening (e.g. require a trailing \n or pin EOL).
  • [NIT] internal/relay/metrics_forward.go:36-41 — pre-binding to Counter instead of resolving via WithLabelValues on every frame is the right call. The spec explicitly admits either; flagging here so it's noted as the chosen branch of open question feat(relay): routing-envelope wrapper type (#1) #2 — no action needed.

Summary

Clean implementation of the hooks-on-*Registry pattern. The hook-call placement in both forwarders is correct: the phone→binary hook fires only after a successful binary.Send (returns on Send error per the existing forwarder error policy), and the binary→phone hook fires only after a successful phone.Send (the three continue paths — unmarshal err, unknown conn_id, per-sink Send err — short-circuit before reaching the hook line). The grace hook is positioned at the end of handleGraceExpiry's success branch, after r.mu.Unlock() and the phone-close fan-out, so stale fires (pointer-identity guard fails) cannot reach it.

The hook-field doc comments on *Registry cover the load-bearing constraints (set once at boot, MUST NOT acquire r.mu, called outside any lock). The metric-collector doc comments restate the hard-coded direction constant set and the absence of a server label, so the security review's tokens-and-cardinality argument is visible at the call site, not just in the spec.

Tests cover all four AC paths (phone→binary success + Send error + no-binary; binary→phone success + malformed + unknown conn_id + sink Send err; grace real-eviction + stale-fire; race freedom under cancel-replace cycles). The race test is appropriately structured to assert only the race-detector's verdict and a presence-of-metric check, not an exact value (the cancel-replace race window makes that schedule-dependent).

CI green (test, security, image-scan).

Adds the per-ticket codebase note and an evergreen feature doc for the
two new Prometheus counters (frames_forwarded_total{direction},
grace_expiries_total) wired via nil-safe func() hooks on *Registry to
keep prometheus out of registry.go and forward.go. INDEX.md gains a
single entry for the new feature doc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit c46e90d into main May 13, 2026
4 checks passed
@ilmoniemi ilmoniemi deleted the feature/58 branch May 13, 2026 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: /metrics — frame-forwarded and grace-expiry counters

1 participant