telemetry: add timestamp index for latency sample gap correction #3237

Open
snormore wants to merge 12 commits into main from snor/telemetry-timestamp-index

@snormore
Contributor

snormore commented Mar 11, 2026

Summary of Changes

  • Add a companion TimestampIndex PDA account that records per-write-batch timestamps, enabling accurate wall-clock reconstruction even when agents experience downtime gaps
  • Implement onchain program (new instruction + write-time updates), Go SDK instruction builders, device telemetry and internet latency collector submitters, and read-only deserialization in Go/Python/TypeScript SDKs
  • Include timestamp reconstruction helpers using binary search (single sample, O(log m)) and single-pass cursor (batch, O(n+m)) in all three SDK languages

RFC: rfcs/rfc18-telemetry-write-timestamp-index.md

Closes #877

Diff Breakdown

Category     Files   Lines (+/-)    Net
Core logic   28      +1378 / -7     +1371
Tests        10      +337 / -5      +332
Docs/RFC     2       +232 / -0      +232
Other        1       +5 / -0        +5

~70% core logic, ~17% tests, ~12% RFC/docs. Note: this PR exceeds the ~500-line guideline due to cross-language SDK parity (Go + Python + TypeScript) and the onchain program changes — splitting further would leave incomplete functionality.

Key files
  • smartcontract/programs/doublezero-telemetry/src/state/timestamp_index.rs — New TimestampIndex account type: header struct, serialization, and 10K-entry ring buffer
  • smartcontract/programs/doublezero-telemetry/src/processors/telemetry/initialize_timestamp_index.rs — New instruction: creates the companion PDA derived from the samples account
  • smartcontract/programs/doublezero-telemetry/src/processors/telemetry/write_timestamp_index.rs — Appends a (sample_index, timestamp) entry on each write batch
  • smartcontract/programs/doublezero-telemetry/src/processors/telemetry/write_device_latency_samples.rs — Optionally accepts 4th account (timestamp index) for backward compat
  • sdk/telemetry/go/state.go — Go read-only SDK: DeserializeTimestampIndex + ReconstructTimestamp(s)
  • sdk/telemetry/python/telemetry/state.py — Python read-only SDK: TimestampIndex.from_bytes + reconstruct helpers
  • sdk/telemetry/typescript/telemetry/state.ts — TypeScript read-only SDK: deserializeTimestampIndex + reconstruct helpers
  • controlplane/telemetry/internal/telemetry/submitter.go — Device telemetry submitter: derives timestamp index PDA, passes on writes, initializes after new account creation

Testing Verification

  • Rust onchain program tests cover initialization, write-time appending, ring buffer wrap-around, and backward compatibility (writes without timestamp index account)
  • Cross-language fixture tests verify binary compatibility: Rust generates .bin/.json fixtures, Go/Python/TypeScript deserialize and assert field values match
  • Timestamp reconstruction unit tests in all three SDK languages validate binary search, single-pass cursor, entry boundary transitions, and empty-entries fallback

snormore force-pushed the snor/telemetry-timestamp-index branch 4 times, most recently from 0e16d96 to 52d1282, on March 21, 2026 13:37
snormore added 10 commits March 21, 2026 10:56
Add a per-write-batch timestamp index as a companion PDA account to both
device and internet latency samples accounts. This enables reliable
timestamp reconstruction even when agents experience downtime gaps within
an epoch.

The onchain program gains an InitializeTimestampIndex instruction and
both write processors optionally append timestamp entries when a 4th
account (the timestamp index PDA) is provided. The Go SDK, device
telemetry submitter, and internet latency collector are updated to
derive, initialize, and pass the timestamp index on every write.

… SDKs

Add read-only deserialization for the new TimestampIndex account type
across all three SDK languages, with fixture generation and cross-language
compatibility tests. Also adds a CHANGELOG entry for the feature.

Add reconstruct_timestamp/reconstruct_timestamps helpers to Go, Python,
and TypeScript SDKs for converting sample indices to wall-clock
timestamps using the timestamp index entries. Update RFC status to
Implemented and clarify the start_timestamp_microseconds behavior.

ReconstructTimestamp now uses binary search over entries, O(log m)
instead of O(m). ReconstructTimestamps uses a single-pass cursor
advancing through entries — O(n + m) instead of O(n * m).
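A minimal Python sketch of the two strategies this commit describes. It assumes each index entry is a (sample_index, timestamp_us) pair, that entries are sorted by sample_index, and that samples after an entry are spaced by a fixed interval. The signatures and parameter names are assumptions for illustration and are not copied from the SDK sources.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (sample_index, timestamp_us); assumed shape


def _last_entry_at_or_before(entries: List[Entry], i: int) -> int:
    """Binary search: position of the last entry with sample_index <= i, or -1."""
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] <= i:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1


def reconstruct_timestamp(i: int, start_us: int, interval_us: int,
                          entries: List[Entry]) -> int:
    """Single sample, O(log m) in the number of entries."""
    k = _last_entry_at_or_before(entries, i)
    if k < 0:
        # No entry covers this sample: use the implicit model.
        return start_us + i * interval_us
    base_index, base_ts = entries[k]
    return base_ts + (i - base_index) * interval_us


def reconstruct_timestamps(n: int, start_us: int, interval_us: int,
                           entries: List[Entry]) -> List[int]:
    """All n samples, O(n + m): the cursor k only ever moves forward."""
    out: List[int] = []
    k = -1
    for i in range(n):
        while k + 1 < len(entries) and entries[k + 1][0] <= i:
            k += 1
        if k < 0:
            out.append(start_us + i * interval_us)
        else:
            base_index, base_ts = entries[k]
            out.append(base_ts + (i - base_index) * interval_us)
    return out
```

With a downtime gap before the second batch, samples at or after that batch take their base from the later entry instead of being extrapolated from the start timestamp.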
When the samples account exists but the timestamp index doesn't (e.g.
mid-epoch rollout), the write instruction was returning the generic
AccountDoesNotExist error. The submitter interpreted this as the samples
account being missing and tried to re-initialize it, which failed with
AccountAlreadyExists.

Add TimestampIndexAccountDoesNotExist (1019) so submitters can
distinguish the two cases and initialize only the timestamp index.
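The distinction can be sketched as a small decision function. This is Python for illustration only (the submitters are Go), the enum is a hypothetical stand-in for the program's error codes, and the numeric values are deliberately left out.

```python
from enum import Enum, auto
from typing import List


class WriteError(Enum):
    # Hypothetical stand-ins for the two onchain program errors.
    ACCOUNT_DOES_NOT_EXIST = auto()
    TIMESTAMP_INDEX_ACCOUNT_DOES_NOT_EXIST = auto()


def accounts_to_initialize(err: WriteError) -> List[str]:
    """Decide which accounts a submitter should create before retrying."""
    if err is WriteError.ACCOUNT_DOES_NOT_EXIST:
        # Samples account is missing: create it, then its companion index.
        return ["samples", "timestamp_index"]
    if err is WriteError.TIMESTAMP_INDEX_ACCOUNT_DOES_NOT_EXIST:
        # Samples account already exists (e.g. mid-epoch rollout):
        # re-initializing it would fail, so create only the index.
        return ["timestamp_index"]
    return []
```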
When the timestamp index is created mid-epoch, the first entry may not
start at sample 0. The binary search would land on entries[0] for
samples before that index, then subtract a larger index from a smaller
one — causing uint32 underflow in Go and wrong timestamps in
Python/TypeScript.

Fall back to the implicit model (start_timestamp + index * interval)
for samples before the first timestamp index entry.
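The failure mode can be reproduced with plain arithmetic. The sketch below uses made-up numbers (first entry at sample 100, 1-second interval) and hypothetical helper names. It emulates Go's uint32 wraparound with a 32-bit mask and compares the naive result with the implicit-model fallback.

```python
def offset_u32(i: int, base_index: int) -> int:
    """Emulate Go's uint32 subtraction, which wraps instead of going negative."""
    return (i - base_index) & 0xFFFFFFFF


def naive_timestamp(i: int, first_entry: tuple, interval_us: int) -> int:
    """Buggy variant: always offsets from entries[0], even when i precedes it."""
    base_index, base_ts = first_entry
    return base_ts + (i - base_index) * interval_us


def fallback_timestamp(i: int, start_us: int, interval_us: int) -> int:
    """Implicit model used for samples before the first index entry."""
    return start_us + i * interval_us


# Illustrative values only: index created mid-epoch, first entry at sample 100.
first_entry = (100, 200_000_000)
interval_us = 1_000_000
wrapped = offset_u32(40, 100)                          # huge positive offset
wrong = naive_timestamp(40, first_entry, interval_us)  # silently too late
right = fallback_timestamp(40, 0, interval_us)         # start + 40 * interval
```

With signed integers the offset is -60, which yields a plausible but wrong timestamp. With uint32 the same subtraction wraps to 2^32 - 60.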
When the timestamp index hits MAX_TIMESTAMP_INDEX_ENTRIES, silently skip
the append instead of failing the transaction. The index is supplementary
data and should not block telemetry writes.

Also fix the RFC's incorrect claim that 10,000 entries covers per-second
writes for 48 hours — the cap is per write batch (up to 245 samples),
so it comfortably covers realistic workloads.

Unit test verifies that append_timestamp_index_entry returns Ok(())
without modifying account data when the index is at MAX_TIMESTAMP_INDEX_ENTRIES.
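A Python stand-in for the skip-when-full behavior, together with the capacity arithmetic behind the RFC correction. The helper is illustrative and is not the Rust append_timestamp_index_entry; the two constants are the figures quoted in this PR (10,000 entries, up to 245 samples per write batch).

```python
from typing import List, Tuple

MAX_TIMESTAMP_INDEX_ENTRIES = 10_000  # cap quoted in the PR
MAX_SAMPLES_PER_BATCH = 245           # per-batch limit quoted in the PR


def append_entry(entries: List[Tuple[int, int]], sample_index: int,
                 timestamp_us: int,
                 max_entries: int = MAX_TIMESTAMP_INDEX_ENTRIES) -> bool:
    """Append one (sample_index, timestamp_us) entry.

    Returns False and leaves `entries` untouched when the index is full,
    so the caller's telemetry write can still succeed.
    """
    if len(entries) >= max_entries:
        return False
    entries.append((sample_index, timestamp_us))
    return True


# One write per second for 48 hours would need more entries than the cap.
writes_per_second_for_48h = 48 * 60 * 60
# Upper bound on samples covered when every batch is full.
max_samples_covered = MAX_TIMESTAMP_INDEX_ENTRIES * MAX_SAMPLES_PER_BATCH
```

One entry is consumed per write batch rather than per sample, so batching determines how long the index lasts.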
snormore force-pushed the snor/telemetry-timestamp-index branch from eea0f20 to 4b0384c, on March 21, 2026 14:56
@martinsander00
Contributor

In both submitters, when InitializeTimestampIndex fails, the code logs "writes will proceed without it" but retries the write with TimestampIndexPK still set.

When InitializeTimestampIndex failed, both submitters logged "writes
will proceed without it" but retried with TimestampIndexPK still set,
causing the write to fail again. Now nil out TimestampIndexPK on init
failure so the retry actually proceeds without the timestamp index.

Also remove the unused TimestampIndexFull error variant and renumber
TimestampIndexAccountDoesNotExist to 1018.
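The retry fix can be sketched as follows. This is Python for illustration only (the submitters are Go), and WriteConfig, TimestampIndexMissing, and the write/init_index callbacks are hypothetical names standing in for the real submitter types and RPC calls.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


class TimestampIndexMissing(Exception):
    """Hypothetical stand-in for TimestampIndexAccountDoesNotExist."""


@dataclass(frozen=True)
class WriteConfig:
    samples_pk: str
    timestamp_index_pk: Optional[str]  # None means "write without the index"


def submit_with_fallback(cfg: WriteConfig,
                         write: Callable[[WriteConfig], None],
                         init_index: Callable[[WriteConfig], None]) -> WriteConfig:
    """Write samples; if the index is missing, try to create it, then retry."""
    try:
        write(cfg)
        return cfg
    except TimestampIndexMissing:
        pass
    try:
        init_index(cfg)
    except Exception:
        # Init failed: clear the index key so the retry really proceeds
        # without it, instead of failing the same way a second time.
        cfg = replace(cfg, timestamp_index_pk=None)
    write(cfg)
    return cfg
```

Returning the effective config lets the caller see whether the write went through with or without the timestamp index.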
@martinsander00
Contributor

martinsander00 commented Mar 23, 2026

will approve, not sure if nik or karl want to take a look since they were involved in the initial convo in #877

Error on next_entry_index > MAX in Python/TS (Go already did this).
Error on truncated entry data in all three SDKs instead of silently
returning partial or zero-value entries.
snormore marked this pull request as ready for review on March 23, 2026 13:20
snormore requested review from karl-dz and nikw9944 on March 23, 2026 13:20
Development

Successfully merging this pull request may close these issues.

device/telemetry: consider storing timestamps with samples

2 participants