---
title: Telemetry Processor
sidebar_order: 5
---
For feedback or suggestions, open an issue or PR and tag @philipphofmann, @isaacs, or @giortzisg for review. We welcome input as we validate this approach. This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels.
The telemetry processor ensures data is delivered efficiently to Sentry. It receives data from the client and forwards it to the transport. Its key responsibilities include buffering, rate limiting, client reports, priority-based sending, and, on some platforms, offline caching.
```mermaid
flowchart LR
    Client -- send Data --> TelemetryProcessor
    TelemetryProcessor -- sendEnvelope --> Transport
```
SDKs SHOULD only add the telemetry processor for high-volume data (spans, logs, metrics). SDKs without these features MAY omit it. Once added, SDK clients SHOULD forward all data to the processor, not the transport. During migration, SDKs MAY temporarily send only some telemetry data through the processor.
The telemetry processor consists of two major components:
- Telemetry Buffer: Focuses on buffering high-volume data, such as spans and logs, into batches to minimize HTTP requests.
- Telemetry Scheduler: Takes buffered data from the TelemetryBuffer and manages prioritized sending, including potential offline caching and sending of client reports.
Here's a more detailed overview of the most important components of the telemetry processor:
```mermaid
flowchart
    Client -- add telemetry data --> TelemetryBuffer
    subgraph TelemetryProcessorSubgraph["TelemetryProcessor"]
        subgraph TelemetryBufferSubgraph["TelemetryBuffer"]
            TelemetryBuffer["**TelemetryBuffer**<br/>Knows all buffers and forwards telemetry data"]
            SpanBuffer["**SpanBuffer**<br/>Buckets per trace"]
            TelemetryBuffer -- add Span --> SpanBuffer
            TelemetryBuffer -- add Log or Metric --> GenericBuffer
            SpanBuffer --> SendData
            GenericBuffer --> SendData
            SendData@{ shape: fork, label: "Fork or Join" }
        end
        TelemetryBuffer -- "add low-volume data (Error, Check-In, Session, etc.)" --> TelemetryScheduler
        SendData -- "when full or time out, flush buffered data" --> TelemetryScheduler
        subgraph TelemetrySchedulerSubGraph["TelemetryScheduler"]
            TelemetryQueue["**TelemetryQueue**<br/>LIFO or RoundRobin<br/>InMemory or Disk"]
            TelemetryQueue -- nextEnvelope --> TelemetryScheduler
            TelemetryScheduler -- add or remove envelope --> TelemetryQueue
        end
    end
    Transport
    TelemetryScheduler -- sendEnvelope --> Transport
```
Here's a simplified example of a log traveling through the telemetry processor:
- The client adds the log to the telemetry processor.
- The telemetry processor adds the log to the telemetry buffer.
- When the telemetry buffer is full or its timeout fires, it forwards a batch of logs to the telemetry scheduler.
- The telemetry scheduler builds the envelope containing the batch of logs and adds it to the telemetry queue.
- Depending on the platform, the telemetry queue can use either in-memory or disk storage. Each platform can also choose a priority concept. It can be, for example, LIFO or RoundRobin.
- The telemetry scheduler asks the telemetry queue for the next envelope to send and passes it to the transport.
- The transport sends the envelope to Sentry.
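The steps above can be sketched in a few lines. This is an illustrative Python sketch with hypothetical class and method names (`TelemetryBuffer`, `TelemetryScheduler`, a size limit of 2, timeout omitted), not any SDK's actual API:

```python
from typing import Callable, List


class TelemetryBuffer:
    """Batches high-volume items and flushes them when the size limit is hit."""

    def __init__(self, size_limit: int, on_flush: Callable[[List[dict]], None]):
        self.size_limit = size_limit
        self.on_flush = on_flush
        self.items: List[dict] = []

    def add(self, item: dict) -> None:
        self.items.append(item)
        if len(self.items) >= self.size_limit:
            self.flush()

    def flush(self) -> None:
        if self.items:
            self.on_flush(self.items)
            self.items = []


class TelemetryScheduler:
    """Builds envelopes from batches and queues them for the transport."""

    def __init__(self):
        self.queue: List[dict] = []  # in-memory LIFO queue for this sketch

    def enqueue_batch(self, items: List[dict]) -> None:
        self.queue.append({"items": items})  # envelope containing the batch

    def next_envelope(self) -> dict:
        return self.queue.pop()  # LIFO: most recent envelope first


# A log travels: client -> buffer -> (flush) -> scheduler -> transport.
scheduler = TelemetryScheduler()
buffer = TelemetryBuffer(size_limit=2, on_flush=scheduler.enqueue_batch)
buffer.add({"type": "log", "message": "first"})
buffer.add({"type": "log", "message": "second"})  # hits the size limit, flushes
envelope = scheduler.next_envelope()
```

A real transport would then serialize and send `envelope`; here the scheduler simply hands it over.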
This exact flow is not enforced for all SDKs. SDKs MAY adapt the architecture as needed, such as having the telemetry scheduler pull from the telemetry buffer instead of receiving pushed data.
We aim to standardize requirements so SDKs share consistent logic across platforms. This benefits multi-platform SDKs like Java. However, requirements MAY differ where necessary. This page outlines common requirements; platform-specific requirements are listed below.
For rules on which items can be combined in a single envelope, see Envelope Item Constraints. These constraints impact the design — for example, the span buffer organizes spans per traceID because each envelope can only contain spans from a single trace, while logs and metrics can be batched across traces.
The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms:
- Before adding an item to a specific buffer, the telemetry buffer SHOULD drop rate-limited items to avoid overhead. If doing so, it MUST record client reports.
- When the telemetry buffer overflows and drops data, it MUST record client reports.
- The telemetry buffer MUST forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler.
- The telemetry buffer MUST start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer MUST forward all items to the telemetry scheduler.
- The telemetry buffer MUST define a size limit of `x` items. See Size Limit Recommendations below for more details.
- When the size limit is reached, the telemetry buffer MUST forward all items to the telemetry scheduler. The buffer MAY forward items in batches.
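One way to implement the timeout and size-limit requirements is sketched below. The names are hypothetical, and `threading.Timer` stands in for whatever scheduling primitive the platform provides; a real SDK would use its own concurrency tools:

```python
import threading
from typing import Callable, List, Optional


class TimedBuffer:
    """Flushes when the size limit is hit, or when the timeout fires after the first item."""

    def __init__(self, size_limit: int, timeout: float,
                 on_flush: Callable[[List[dict]], None]):
        self.size_limit = size_limit
        self.timeout = timeout  # the spec requires 5 seconds
        self.on_flush = on_flush
        self.items: List[dict] = []
        self.timer: Optional[threading.Timer] = None
        self.lock = threading.Lock()

    def add(self, item: dict) -> None:
        with self.lock:
            if not self.items:
                # The first item starts the timeout.
                self.timer = threading.Timer(self.timeout, self.flush)
                self.timer.start()
            self.items.append(item)
            full = len(self.items) >= self.size_limit
        if full:
            self.flush()

    def flush(self) -> None:
        with self.lock:
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
            items, self.items = self.items, []
        if items:
            self.on_flush(items)  # forward to the telemetry scheduler
```

Note that the timer is canceled on a size-limit flush so a later, empty timeout does not fire needlessly; the next added item starts a fresh timeout.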
SDKs SHOULD use the following size limits for the telemetry buffer. SDKs MAY use lower values, but they MUST NOT exceed the following size limits:
- 100 items for logs
- 100 items for metrics
- 1000 items for spans
While the envelope size limits would allow higher limits for specific categories, these values are optimized for Relay, and exceeding them is strongly discouraged.
The telemetry buffer MUST forward all data in memory to the telemetry scheduler to avoid data loss in the following scenarios:
- When the user calls `SentrySDK.flush()`, the telemetry buffer MUST forward all data in memory to the telemetry scheduler, and only then SHOULD the telemetry scheduler flush the data.
- When the user calls `SentrySDK.close()`, the telemetry buffer MUST forward all data in memory to the telemetry scheduler. SDKs SHOULD keep their existing closing behavior.
- When the application shuts down gracefully, the telemetry buffer MUST forward all data in memory to the telemetry scheduler.
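The flush ordering above can be sketched as follows. The class and field names are hypothetical; the point is only that `flush()` drains the buffer into the scheduler before the scheduler sends anything:

```python
from typing import List


class FlushOrdering:
    """On flush(), drain the buffer into the scheduler first, then send."""

    def __init__(self):
        self.buffered: List[dict] = []  # telemetry buffer contents
        self.queued: List[dict] = []    # telemetry scheduler queue
        self.sent: List[dict] = []      # what reached the transport

    def add(self, item: dict) -> None:
        self.buffered.append(item)

    def flush(self) -> None:
        # Step 1: the buffer MUST forward everything in memory first ...
        if self.buffered:
            self.queued.append({"items": self.buffered})
            self.buffered = []
        # Step 2: ... and only then does the scheduler flush to the transport.
        while self.queued:
            self.sent.append(self.queued.pop(0))
```

Reversing the two steps would lose whatever was still sitting in the buffer when the user called flush.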
Spans require a dedicated buffer because we can only send spans from a single trace per envelope. Still, the span buffer MUST respect the common telemetry buffer requirements.
For the telemetry span buffer, we recommend using a map of traceID to span buckets per trace. SDKs MAY also use alternatives, such as a fixed-size ring buffer with tombstones. Regardless of the data structure, your SDK MUST meet these requirements:
- When forwarding spans, the span buffer MUST forward only spans from a single trace, since each envelope supports only one traceID. This means that the span buffer might forward spans multiple times.
- When the span buffer overflows and drops data, it SHOULD drop all spans from a single traceID rather than individual spans. Dropping individual spans creates incomplete traces that are harder to debug, whereas dropping a complete trace preserves the integrity of the remaining traces.
The span buffer is a map where each key is a traceID and each value is a list of spans:

```
spanBuffer = {
    "trace-a": [span1, span2, span3],
    "trace-b": [span4],
    "trace-c": [span5, span6]
}
```
Here are the requirements for the buckets per traceID:
- When the span buffer adds a span, it MUST add it to the bucket with the traceID of the span.
- When the span buffer doesn't have a bucket for the traceID, it MUST create a new bucket for this traceID.
- After forwarding the spans, the span buffer MUST remove all spans from the bucket and delete the bucket.
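Under the bucket-map representation above, these requirements can be sketched as a small Python class. The names are hypothetical, and the drop-largest-trace overflow policy is one possible choice, not something the spec mandates:

```python
from collections import defaultdict
from typing import Dict, List


class SpanBuffer:
    """Buckets spans per traceID; each flush forwards a single trace's spans."""

    def __init__(self):
        # traceID -> list of spans; defaultdict creates a bucket on first use.
        self.buckets: Dict[str, List[dict]] = defaultdict(list)

    def add(self, span: dict) -> None:
        # Add the span to the bucket matching its traceID.
        self.buckets[span["trace_id"]].append(span)

    def flush_trace(self, trace_id: str) -> List[dict]:
        # Forward all spans of one trace; pop() also deletes the bucket.
        return self.buckets.pop(trace_id, [])

    def drop_largest_trace(self) -> List[dict]:
        """On overflow, drop one whole trace rather than individual spans."""
        if not self.buckets:
            return []
        victim = max(self.buckets, key=lambda t: len(self.buckets[t]))
        return self.buckets.pop(victim)
```

Because each envelope supports only one traceID, a flush of the whole buffer would call `flush_trace` once per bucket, forwarding spans to the scheduler multiple times.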
The telemetry scheduler receives buffered data from the telemetry buffer, prioritizes critical low-volume data (like errors and crashes) over high-volume data (like spans), builds envelopes, and forwards them to the transport. Before sending, it enforces rate limits and records client reports for dropped envelopes and envelope items. Finally, it also forwards client reports to the transport.
The now-deprecated unified API suggested handling rate limits and offline caching in the transport, forcing multi-platform SDKs (like Java) to maintain separate transports. To simplify this, transports SHOULD now only send envelopes. The telemetry scheduler SHOULD handle rate limiting, offline caching, and prioritization, ensuring consistent logic across platforms. This also keeps custom user-defined transports simple. SDKs need only adjust the scheduler for platform-specific needs.
We recommend LIFO or round-robin as priority strategies, but SDKs MAY use other methods. Envelopes can be stored in memory, on disk, or with other solutions as needed.
Here are the common requirements for all platforms:
- The telemetry scheduler MUST drop rate-limited envelope items before forwarding them to the transport. When doing so, it MUST record respective client reports.
- The telemetry scheduler SHOULD attach the client reports to the envelopes it forwards to the transport.
- When the telemetry queue overflows and drops envelopes, it MUST record client reports.
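A minimal sketch of dropping rate-limited envelope items and recording client reports for them. The `limited_categories` set is a hypothetical stand-in for the active rate limits, and the `ratelimit_backoff` reason string is illustrative:

```python
from typing import Dict, List, Tuple


class SchedulerRateLimiting:
    """Drops rate-limited envelope items before sending; records client reports."""

    def __init__(self, limited_categories: set):
        self.limited = limited_categories  # e.g. {"log"} while logs are rate-limited
        # (reason, category) -> number of discarded items
        self.client_reports: Dict[Tuple[str, str], int] = {}

    def prepare_envelope(self, items: List[dict]) -> List[dict]:
        kept = []
        for item in items:
            if item["category"] in self.limited:
                # Record a discarded-item report instead of sending the item.
                key = ("ratelimit_backoff", item["category"])
                self.client_reports[key] = self.client_reports.get(key, 0) + 1
            else:
                kept.append(item)
        return kept
```

The kept items would then be serialized into the envelope handed to the transport, with the accumulated client reports attached.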
To be defined: how exactly the telemetry scheduler interacts with the transport to correctly remove items from the telemetry queue based on the received HTTP response information, and how it adheres to the dealing-with-network-failures requirements.
The telemetry scheduler SHOULD add client reports to outgoing envelopes or periodically send standalone client report envelopes to the transport, following the SDK side recommendations.
Client report tracking SHOULD NOT be implemented in the telemetry scheduler. Instead, the SDK SHOULD have ClientReports accessible throughout the SDK, since data can be dropped in multiple places, as shown in the diagram below. The telemetry scheduler SHOULD request the current client reports from ClientReports.
Here is a diagram showing how ClientReports interacts with the telemetry scheduler.
```mermaid
flowchart LR
    ClientReports["ClientReports"]
    Client -- "records: sample_rate, before_send, ..." --> ClientReports
    subgraph TelemetryProcessor
        TelemetryBuffer
        TelemetryScheduler
    end
    TelemetryBuffer -- "records: buffer_overflow, ..." --> ClientReports
    TelemetryScheduler -- "records: queue_overflow, ..." --> ClientReports
    Transport -- "records: send_error, network_error" --> ClientReports
    ClientReports -- "aggregated reports" --> TelemetryScheduler
```
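One way to make `ClientReports` accessible throughout the SDK is a small thread-safe aggregator that any component can record into and that the scheduler drains when building envelopes. This is a hypothetical sketch; the reason and category strings are illustrative:

```python
import threading
from collections import Counter
from typing import List


class ClientReports:
    """SDK-wide, thread-safe recorder for discarded items by (reason, category)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def record(self, reason: str, category: str, quantity: int = 1) -> None:
        # Callable from the client, buffer, scheduler, or transport.
        with self._lock:
            self._counts[(reason, category)] += quantity

    def flush(self) -> List[dict]:
        """Return and reset the aggregated reports (called by the scheduler)."""
        with self._lock:
            counts, self._counts = self._counts, Counter()
        return [{"reason": r, "category": c, "quantity": q}
                for (r, c), q in counts.items()]
```

Because `flush()` swaps the counter under the lock, reports recorded concurrently during a flush land in the next aggregation rather than being lost.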
As mentioned above, the telemetry buffer SHOULD drop rate-limited data to reduce overhead, and the telemetry scheduler MUST drop rate-limited envelopes before forwarding them to the transport. To do this, the telemetry processor MUST have access to active rate limits. If your SDK keeps track of active rate limits in the transport, we recommend extracting them from the transport and making them available to the telemetry processor. Ideally, the transport is unaware of rate limits and only passes update information to the rate limits class, which manages updates.
Note that whatever bookkeeping you use, rate limits MUST still be tracked per DSN.
The following diagram shows how the telemetry processor interacts with the transport and rate limits. SDKs can choose to implement RateLimits inside or outside the telemetry processor, based on their needs.
```mermaid
flowchart LR
    RateLimits["RateLimits"]
    subgraph TelemetryProcessor
        TelemetryBuffer
        TelemetryScheduler
    end
    TelemetryBuffer -- "checks active rate limits" --> RateLimits
    TelemetryScheduler -- "checks active rate limits" --> RateLimits
    Transport -- "reports HTTP responses" --> RateLimits
```
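A sketch of a `RateLimits` class fed by the transport and queried by the processor, tracked per DSN as required above. The names are hypothetical and header parsing is omitted; the transport would derive `retry_after_seconds` from the HTTP response (e.g. a 429 with rate limit headers):

```python
import time
from typing import Dict, Tuple


class RateLimits:
    """Tracks active rate limits per DSN and category."""

    def __init__(self):
        # (dsn, category) -> monotonic timestamp when the limit expires
        self._until: Dict[Tuple[str, str], float] = {}

    def update(self, dsn: str, category: str, retry_after_seconds: float) -> None:
        # Called by the transport after it receives a rate-limiting response.
        self._until[(dsn, category)] = time.monotonic() + retry_after_seconds

    def is_limited(self, dsn: str, category: str) -> bool:
        # Called by the telemetry buffer and scheduler before sending.
        deadline = self._until.get((dsn, category))
        return deadline is not None and time.monotonic() < deadline
```

Keying the map on `(dsn, category)` keeps the per-DSN requirement satisfied even when one SDK instance sends to multiple DSNs.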
For the platform-specific requirements, please see:
- Backend Telemetry Processor: Detailed backend design
- Mobile Telemetry Processor: Spec under validation.
The batch processor is deprecated, so we moved it to the batch-processor page. The telemetry buffer will include parts of the batch processor functionality.