User Story
As a relay operator, I want time-series counters for forwarded frames (by direction) and grace-window expiries, so I can observe relay throughput and tell "binary disconnected and came back inside grace" from "binary disconnected and the slot was reclaimed" at scale.
Context
Third slice of the metrics rollout (split from #37). The metrics registry scaffolding (#59) ships the registry; #57 adds upgrade/register counters; this slice wires the forward-loop counter and the grace-expiry counter.
The frame counter increments on the hot path inside StartPhoneForwarder / StartBinaryForwarder: one increment per forwarded frame. Prometheus counter increments are atomic and cheap, but the architect should confirm that the increment lives after a successful Send (or wherever the chosen sink-error decision for that direction puts it, matching the forwarder error-policy pattern in PROJECT-MEMORY). The phone forwarder returns on sink error, so its increment sits on the success path only. The binary forwarder continues on per-sink errors, so its increment goes per successful Send, never per loop iteration.
The grace-expiry counter fires inside the time.AfterFunc callback in Registry.ScheduleReleaseServer: increment after the stale-fire pointer-identity guard returns true (i.e. only on a real eviction, never on stale fires).
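The guard-then-increment ordering for the grace-expiry counter might look roughly like the sketch below. It is a simplified illustration, not the relay's actual code: the registry and server types, the release and scheduleRelease helpers, and the atomic.Int64 standing in for pyrycode_relay_grace_expiries_total are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// server is a hypothetical stand-in for a registered binary connection.
type server struct{ id string }

// registry is a hypothetical, simplified stand-in for the relay Registry.
type registry struct {
	mu            sync.Mutex
	servers       map[string]*server
	graceExpiries atomic.Int64 // stand-in for pyrycode_relay_grace_expiries_total
}

// release evicts s from slot id only if the slot still holds that exact
// pointer. It reports whether a real eviction happened.
func (r *registry) release(id string, s *server) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.servers[id] != s {
		return false // stale fire: slot was re-claimed, no increment
	}
	delete(r.servers, id)
	r.graceExpiries.Add(1) // real eviction: count it exactly once
	return true
}

// scheduleRelease arms the grace timer; the callback defers to release,
// so the counter can only move inside the guard's success branch.
func (r *registry) scheduleRelease(id string, s *server, grace time.Duration) *time.Timer {
	return time.AfterFunc(grace, func() { r.release(id, s) })
}

func main() {
	r := &registry{servers: map[string]*server{}}
	oldConn := &server{id: "s1"}
	newConn := &server{id: "s1"}
	r.servers["s1"] = newConn // binary reconnected inside grace

	fmt.Println(r.release("s1", oldConn), r.graceExpiries.Load()) // false 0
	fmt.Println(r.release("s1", newConn), r.graceExpiries.Load()) // true 1
}
```

Splitting the callback body into a synchronous helper is one way to let a race test drive stale and real fires deterministically without sleeping on the timer.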
Acceptance Criteria
pyrycode_relay_frames_forwarded_total counter vector defined with label direction (phone_to_binary | binary_to_phone), registered against the metrics registry from #59 (relay: adopt prometheus/client_golang and introduce metrics registry scaffolding).
pyrycode_relay_grace_expiries_total counter (no labels), registered against the same registry.
StartPhoneForwarder (internal/relay/forward.go) increments frames_forwarded_total{direction="phone_to_binary"} exactly once per successful binary.Send. No increment on BinaryFor miss, marshal error, or Send error.
StartBinaryForwarder (same file) increments frames_forwarded_total{direction="binary_to_phone"} exactly once per successful phone.Send. No increment on unknown conn_id, malformed envelope, or per-sink Send error (matches the N-sink continue-on-per-sink-error policy from PROJECT-MEMORY).
Registry.ScheduleReleaseServer increments grace_expiries_total exactly once per actual eviction, inside the pointer-identity guard's success branch; stale-fire no-ops do not increment.
Tests exercise each direction with a fake source and sink and assert the counter increment count matches the success count, not the read-loop iteration count.
A grace-expiry race test (matching the existing ScheduleReleaseServer race coverage) confirms the counter increments only on real evictions, never on stale fires.
make vet, make test -race, and make build clean.
docs/knowledge/codebase/<n>.md summary entry created; docs/knowledge/INDEX.md updated.
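The phone-to-binary criterion, together with the fake-source/fake-sink test shape, could be sketched as follows. This is a minimal illustration under assumptions: forwardPhoneToBinary, the sink interface, the failAfter fake, and intCounter are hypothetical names, and the counter interface only mirrors the Inc() method that a Prometheus counter provides. The BinaryFor miss and marshal-error paths are left out; they would return before the increment in the same way.

```go
package main

import (
	"errors"
	"fmt"
)

// counter is the one method this sketch needs from a Prometheus counter.
type counter interface{ Inc() }

// intCounter is a test stand-in that just counts Inc calls.
type intCounter int

func (c *intCounter) Inc() { *c++ }

// sink is a hypothetical stand-in for the binary connection.
type sink interface{ Send(frame []byte) error }

// forwardPhoneToBinary drains src into dst. It returns on the first Send
// error, so the increment is reached only after a successful Send.
func forwardPhoneToBinary(src <-chan []byte, dst sink, forwarded counter) error {
	for frame := range src {
		if err := dst.Send(frame); err != nil {
			return err // sink error: return without counting this frame
		}
		forwarded.Inc()
	}
	return nil
}

// failAfter is a fake sink that accepts a fixed number of frames, then fails.
type failAfter struct{ remaining int }

func (f *failAfter) Send([]byte) error {
	if f.remaining == 0 {
		return errors.New("sink closed")
	}
	f.remaining--
	return nil
}

func main() {
	src := make(chan []byte, 3)
	for i := 0; i < 3; i++ {
		src <- []byte{byte(i)}
	}
	close(src)

	var n intCounter
	err := forwardPhoneToBinary(src, &failAfter{remaining: 2}, &n)
	fmt.Println(int(n), err) // 2 sink closed
}
```

Three frames are read but only two Sends succeed, so the counter ends at 2: the success count, not the read-loop iteration count.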
Technical Notes
The send-duration histogram (pyrycode_relay_send_duration_seconds, listed as "optional, expensive" in the original /metrics endpoint issue, #37) is explicitly out of scope here. If it lands, it is its own ticket: the cost/value tradeoff (per-frame time.Now() × 2 plus bucket bookkeeping in the hottest loop) warrants its own architect pass.
The architect should confirm that the increment site for the binary-side forwarder is per phone.Send, not per loop iteration: the loop fans out to multiple phones per envelope, and we want the metric to reflect actual forwarded-to-phone frames, not envelopes consumed.
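To make the per-Send versus per-iteration distinction concrete, here is a rough sketch of a fan-out step that continues past per-sink errors. The names fanOut, phone, okPhone, deadPhone, and intCounter are hypothetical, and the counter interface again only assumes the Inc() method of a Prometheus counter; the real forwarder's envelope decoding and conn_id lookup are not modelled.

```go
package main

import (
	"errors"
	"fmt"
)

// counter is the one method this sketch needs from a Prometheus counter.
type counter interface{ Inc() }

// intCounter is a test stand-in that just counts Inc calls.
type intCounter int

func (c *intCounter) Inc() { *c++ }

// phone is a hypothetical stand-in for a phone-side connection.
type phone interface{ Send(frame []byte) error }

// fanOut delivers one envelope payload to every phone. A failing phone
// does not stop the loop, and the counter moves once per successful
// Send rather than once per envelope.
func fanOut(payload []byte, phones []phone, forwarded counter) (failed int) {
	for _, p := range phones {
		if err := p.Send(payload); err != nil {
			failed++
			continue // per-sink error: skip the increment, keep going
		}
		forwarded.Inc()
	}
	return failed
}

type okPhone struct{}

func (okPhone) Send([]byte) error { return nil }

type deadPhone struct{}

func (deadPhone) Send([]byte) error { return errors.New("phone gone") }

func main() {
	var n intCounter
	phones := []phone{okPhone{}, deadPhone{}, okPhone{}}
	failed := fanOut([]byte("hi"), phones, &n)
	fmt.Println(int(n), failed) // 2 1
}
```

One envelope consumed, two frames forwarded, one failure: counting at the loop level would have reported 1, which understates delivered frames whenever more than one phone is attached.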
Size Estimate
XS
Split from #37. Depends on #59.