
feat(frost/signing): expose ROAST-retry evidence counters via clientinfo #3990

Merged
mswilkison merged 1 commit into feat/frost-schnorr-migration-scaffold from feat/frost-roast-evidence-metrics-2026-05-23
May 23, 2026

@mswilkison

Summary

Process-wide cumulative counters for the three evidence categories
(overflow / reject / conflict), exposed through keep-core's
`clientinfo` registry so operators can observe per-category event
rates via the standard Prometheus scrape.

In default builds and unregistered-coordinator states, the
metrics-emitting recorder is bypassed entirely (the receive loops
use `attempt.NoOpRecorder`), so the counters stay at zero. Once
the ROAST-retry registry is populated and live signing flows
record evidence, the counters increment -- providing the
"do I have ROAST retry running?" smoke test from operator
dashboards.

Stacked on #3989 (AttemptContextHash enforcement).

What lands

| File | Role |
| --- | --- |
| `roast_retry_metrics.go` (new, untagged) | Cumulative atomic counters; `RegisterRoastRetryMetrics(*clientinfo.Registry)` registers `Source` functions under the `frost_roast_retry` application prefix; `metricsEmittingRecorder` wraps the bounded recorder and bumps the matching per-category counter on each `Record*` call. |
| `roast_retry_recorder.go` (modified) | `roastRetryRecorderForCollect` now wraps the bounded recorder with `newMetricsEmittingRecorder` when the registry is populated. |
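A minimal sketch of the wrapper pattern described for `metricsEmittingRecorder`: bump a process-wide atomic counter, then delegate to the inner recorder. The `EvidenceRecorder` interface, its method names and arguments, and the `[]string` snapshot type are simplified assumptions standing in for `attempt.EvidenceRecorder`, whose real method set is not shown in this PR.

```go
package main

import "sync/atomic"

// EvidenceRecorder is a hypothetical, simplified stand-in for
// attempt.EvidenceRecorder; the real methods and arguments may differ.
type EvidenceRecorder interface {
	RecordOverflow(sender uint16)
	RecordReject(sender uint16, reason string)
	RecordConflict(sender uint16)
	Snapshot() []string
}

// noOpRecorder plays the role of attempt.NoOpRecorder.
type noOpRecorder struct{}

func (noOpRecorder) RecordOverflow(uint16)       {}
func (noOpRecorder) RecordReject(uint16, string) {}
func (noOpRecorder) RecordConflict(uint16)       {}
func (noOpRecorder) Snapshot() []string          { return nil }

// Process-wide cumulative counters, one per evidence category.
var (
	overflowEvents atomic.Uint64
	rejectEvents   atomic.Uint64
	conflictEvents atomic.Uint64
)

// metricsEmittingRecorder counts each event, then forwards it unchanged.
type metricsEmittingRecorder struct {
	inner EvidenceRecorder
}

// Compile-time check that the wrapper satisfies the interface.
var _ EvidenceRecorder = (*metricsEmittingRecorder)(nil)

// newMetricsEmittingRecorder substitutes a no-op inner recorder for nil so
// later calls cannot panic. (Assumption: this sketch still counts events in
// that case; the PR only says a nil inner "collapses to NoOp".)
func newMetricsEmittingRecorder(inner EvidenceRecorder) *metricsEmittingRecorder {
	if inner == nil {
		inner = noOpRecorder{}
	}
	return &metricsEmittingRecorder{inner: inner}
}

func (r *metricsEmittingRecorder) RecordOverflow(sender uint16) {
	overflowEvents.Add(1)
	r.inner.RecordOverflow(sender)
}

func (r *metricsEmittingRecorder) RecordReject(sender uint16, reason string) {
	rejectEvents.Add(1)
	r.inner.RecordReject(sender, reason)
}

func (r *metricsEmittingRecorder) RecordConflict(sender uint16) {
	conflictEvents.Add(1)
	r.inner.RecordConflict(sender)
}

// Snapshot is pure delegation: the wrapper keeps no evidence of its own.
func (r *metricsEmittingRecorder) Snapshot() []string {
	return r.inner.Snapshot()
}
```

Keeping the counters outside the recorder means every session's wrapper feeds the same three process-wide totals, which is what makes them cheap to scrape at a glance.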

Metrics exposed

Via `clientinfo.Registry.ObserveApplicationSource`:

| Metric name | Description |
| --- | --- |
| `frost_roast_retry_overflow_events_total` | Cumulative count of receive-channel overflow events |
| `frost_roast_retry_reject_events_total` | Cumulative count of validation-gate rejections (incl. `attempt_context_hash_mismatch` from #3989) |
| `frost_roast_retry_conflict_events_total` | Cumulative count of first-write-wins equivocation events |
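The metric names above read as the application prefix joined to a per-counter source name. The sketch below registers three closures that read the atomic counters at scrape time and uses a test double to show how the names come out. The `applicationSourceObserver` interface, the `Source` type, the `recordingObserver` double, the underscore join, and placing `_total` in the source name are assumptions for illustration; only the `ObserveApplicationSource` method name and the `frost_roast_retry` prefix come from this PR.

```go
package main

import "sync/atomic"

// Source is an assumed shape for a clientinfo source: it is called at
// scrape time and returns the metric's current value.
type Source func() float64

// applicationSourceObserver is a hypothetical stand-in for
// *clientinfo.Registry; its exact signature here is assumed.
type applicationSourceObserver interface {
	ObserveApplicationSource(application string, inputs map[string]Source)
}

var overflowEvents, rejectEvents, conflictEvents atomic.Uint64

// registerRoastRetryMetrics mirrors RegisterRoastRetryMetrics: a nil
// registry is a no-op, otherwise one source per evidence category is added.
func registerRoastRetryMetrics(registry applicationSourceObserver) {
	if registry == nil {
		return
	}
	registry.ObserveApplicationSource("frost_roast_retry", map[string]Source{
		"overflow_events_total": func() float64 { return float64(overflowEvents.Load()) },
		"reject_events_total":   func() float64 { return float64(rejectEvents.Load()) },
		"conflict_events_total": func() float64 { return float64(conflictEvents.Load()) },
	})
}

// recordingObserver is a test double that joins application and source
// names with "_", yielding the names listed in the table.
type recordingObserver struct {
	metrics map[string]Source
}

func (o *recordingObserver) ObserveApplicationSource(application string, inputs map[string]Source) {
	if o.metrics == nil {
		o.metrics = make(map[string]Source)
	}
	for name, src := range inputs {
		o.metrics[application+"_"+name] = src
	}
}
```

Because each source is a closure over the counter rather than a copied value, the registry always reports the live total at scrape time.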

Test coverage (6 cases)

  • Counters increment on Record* (with different per-category counts)
  • Snapshot delegates to inner recorder
  • Nil inner falls back to NoOp without panicking
  • Unregistered coordinator → NoOp recorder → no counter bumps
  • Concurrent counter increments are race-safe (16 workers × 100 calls)
  • RegisterRoastRetryMetrics(nil) is a no-op (defensive guard)
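The race-safety case can be pictured as follows: 16 goroutines each bump a shared counter 100 times. Because the increments are atomic, the total is exactly 1,600 and `go test -race` has nothing to report; a plain `counter++` on a shared `uint64` would be a data race and could lose updates. The `hammerRejectCounter` helper is a hypothetical illustration, not the PR's test code.

```go
package main

import (
	"sync"
	"sync/atomic"
)

var rejectEvents atomic.Uint64

// hammerRejectCounter resets the counter, then has `workers` goroutines each
// increment it `calls` times, and returns the final value once all finish.
func hammerRejectCounter(workers, calls int) uint64 {
	rejectEvents.Store(0)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < calls; i++ {
				rejectEvents.Add(1) // atomic read-modify-write, no lost updates
			}
		}()
	}
	wg.Wait()
	return rejectEvents.Load()
}
```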

Operator wiring

The keep-core node's startup sequence should call:

```go
signing.RegisterRoastRetryMetrics(clientinfoRegistry)
```

alongside the existing registry observation calls. A follow-up to
`docs/development/frost-roast-retry-rollout.adoc` will document
this step.

Verification

| Command | Result |
| --- | --- |
| `go build ./...` | clean |
| `go test ./pkg/frost/...` | pass (5 packages) |
| `go test -race ./pkg/frost/signing/...` | pass |
| `go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...` | pass |
| `staticcheck -checks '-SA1019' ./pkg/frost/...` | silent |
| `go vet ./pkg/frost/...` | clean |
| `gofmt -l ./pkg/frost/signing/` | silent |

Test plan

  • CI green.
  • Reviewer confirms the process-wide cumulative counter shape (alternative: per-session gauges, more granular but harder to query at a glance).
  • Reviewer confirms the `frost_roast_retry` application prefix is acceptable (alternative: more specific prefix like `frost_roast_retry_evidence`).

Adds process-wide cumulative counters for the three evidence
categories (overflow / reject / conflict) and exposes them through
keep-core's clientinfo registry so operators can observe
per-category event rates via the standard Prometheus scrape.

The counters increment whenever a metrics-emitting recorder
records an event. In default builds and in unregistered-coordinator
states the recorder is NoOp, so the counters stay at zero.
Operators only see non-zero values once the ROAST-retry registry
is populated and live signing flows record evidence -- the
"do I have ROAST retry running?" smoke test.

* pkg/frost/signing/roast_retry_metrics.go (new, untagged)
  - Cumulative atomic counters: roastRetryOverflowEvents,
    roastRetryRejectEvents, roastRetryConflictEvents.
  - RegisterRoastRetryMetrics(*clientinfo.Registry) registers
    Source functions under the "frost_roast_retry" application
    prefix, producing metrics named:
      - frost_roast_retry_overflow_events_total
      - frost_roast_retry_reject_events_total
      - frost_roast_retry_conflict_events_total
    via the existing ObserveApplicationSource mechanism.
  - metricsEmittingRecorder wraps an attempt.EvidenceRecorder
    and bumps the matching counter on each Record* call before
    delegating to the inner recorder.
  - Nil-safe: a nil inner recorder collapses to NoOp; a nil
    clientinfo.Registry is a no-op registration.

* pkg/frost/signing/roast_retry_recorder.go (modified)
  - roastRetryRecorderForCollect now wraps the bounded recorder
    with newMetricsEmittingRecorder when the registry is
    populated. NoOp path is unchanged (no metrics emission).
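A simplified sketch of the selection `roastRetryRecorderForCollect` performs: with an unpopulated registry it hands back a no-op recorder, so nothing is counted; otherwise it wraps a bounded recorder in the metrics-emitting recorder. The `recorderForCollect` function, the `registryPopulated` flag, the single-method interface, the `boundedRecorder` type, and its capacity of 64 are all hypothetical; the real function's inputs are not shown in this PR.

```go
package main

import "sync/atomic"

// EvidenceRecorder is reduced to one method for this sketch (hypothetical).
type EvidenceRecorder interface {
	RecordReject(reason string)
}

// noOpRecorder stands in for attempt.NoOpRecorder.
type noOpRecorder struct{}

func (noOpRecorder) RecordReject(string) {}

// boundedRecorder keeps at most limit entries and silently drops the rest.
// This sketch is not safe for concurrent use.
type boundedRecorder struct {
	limit   int
	entries []string
}

func (b *boundedRecorder) RecordReject(reason string) {
	if len(b.entries) < b.limit {
		b.entries = append(b.entries, reason)
	}
}

var rejectEvents atomic.Uint64

// metricsEmittingRecorder counts first, then delegates.
type metricsEmittingRecorder struct {
	inner EvidenceRecorder
}

func (m metricsEmittingRecorder) RecordReject(reason string) {
	rejectEvents.Add(1)
	m.inner.RecordReject(reason)
}

// recorderForCollect: NoOp path unchanged, populated path wrapped.
func recorderForCollect(registryPopulated bool) EvidenceRecorder {
	if !registryPopulated {
		return noOpRecorder{}
	}
	return metricsEmittingRecorder{inner: &boundedRecorder{limit: 64}}
}
```

In this sketch the wrapper counts before delegating, so the counter keeps rising even after the bounded recorder stops retaining entries; the metric reflects every recorded event, not only the retained ones.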

Tests (6 cases in roast_retry_metrics_test.go):

* Counters increment on Record* (with different per-category counts).
* Snapshot delegates to the inner recorder.
* Nil inner falls back to NoOp without panicking.
* Unregistered coordinator -> NoOp recorder -> no counter bumps.
* Concurrent counter increments are race-safe.
* RegisterRoastRetryMetrics(nil) is a no-op (defensive guard).

Operator wiring:

The keep-core node's startup sequence should call
RegisterRoastRetryMetrics(clientinfoRegistry), passing the node's
*clientinfo.Registry, alongside the existing registry observation
calls. Documentation will be added in a follow-up to the rollout
guide (docs/development/frost-roast-retry-rollout.adoc).

Verification:

* go build ./... -- clean
* go test ./pkg/frost/... -- pass (5 packages)
* go test -race ./pkg/frost/signing/... -- pass
* go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'
  ./pkg/frost/... -- pass (5 packages)
* staticcheck -checks '-SA1019' ./pkg/frost/... -- silent
* go vet ./pkg/frost/... -- clean
* gofmt -l ./pkg/frost/signing/ -- silent

Stacked on the AttemptContextHash enforcement PR.
Base automatically changed from feat/frost-roast-attempt-context-hash-required-2026-05-23 to feat/frost-schnorr-migration-scaffold May 23, 2026 04:10
mswilkison merged commit ed0a2e2 into feat/frost-schnorr-migration-scaffold May 23, 2026
14 checks passed
mswilkison deleted the feat/frost-roast-evidence-metrics-2026-05-23 branch May 23, 2026 04:10