@@ -15,55 +15,6 @@ For the common specification, refer to the [Telemetry Processor](/sdk/foundation
- **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans).
- **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods.
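
The weighted selection described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: it assumes the priority weights listed in this document (CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1) and interprets them as "up to `weight` items per cycle". The function name `weighted_round_robin` and the queue layout are hypothetical, and batching and signal-based wake-ups are not modelled.

```python
from collections import deque

# Weights as listed for the scheduler in this document.
WEIGHTS = {"CRITICAL": 5, "HIGH": 4, "MEDIUM": 3, "LOW": 2, "LOWEST": 1}


def weighted_round_robin(queues, weights=WEIGHTS):
    """Yield (priority, item) pairs until every queue is empty.

    Hypothetical helper: in each cycle a priority may send up to
    `weight` items, so higher priorities get more slots per cycle
    while lower priorities still make progress.
    """
    while any(queues.values()):
        for priority, weight in weights.items():
            queue = queues.get(priority)
            for _ in range(weight):
                if not queue:
                    break  # nothing buffered for this priority
                yield priority, queue.popleft()


queues = {
    "CRITICAL": deque(f"error-{i}" for i in range(1, 7)),
    "LOW": deque(f"log-{i}" for i in range(1, 4)),
}
order = [item for _, item in weighted_round_robin(queues)]
# order == ["error-1", "error-2", "error-3", "error-4", "error-5",
#           "log-1", "log-2", "error-6", "log-3"]
```

Even with a backlog of errors, logs still receive two slots per cycle, which is the property that keeps low-priority data from starving.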

### Architecture Overview

Introduce a `TelemetryProcessor` layer between the `Client` and the `Transport`. This `TelemetryProcessor` wraps prioritization and scheduling and exposes a minimal API to the SDK:

- Add(item).
- Flush(timeout).
Comment on lines -22 to -23
Member Author


Moved to index.mdx

- Close(timeout).

```
┌────────────────────────────────────────────────────────────────────────────┐
│ Client │
│ captureEvent / captureTransaction / captureCheckIn / captureLog │
└────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│ TelemetryProcessor │
│ Add(item) · Flush(timeout) · Close(timeout) │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────┐ │
│ │ Error Buffer │ │ Check-in Buffer │ │ Log Buffer │ │
│ │ (CRITICAL) │ │ (HIGH) │ │ (LOW) │ │
│ │ Timeout: N/A │ │ Timeout: N/A │ │ Timeout: 5s │ │
│ │ BatchSize: 1 │ │ BatchSize: 1 │ │ BatchSize: 100 │ │
│ └──────────────────────┘ └──────────────────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TelemetryScheduler (Weighted Round-Robin) │ │
│ │ - Priority weights: CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1 │ │
│ │ - Processes a batch of items based on BatchSize and/or Timeout │ │
│ │ - Builds envelopes from batch │ │
│ │ - Submits envelopes to transport │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│ Transport │
│ - Single worker, disk cache, offline retry, client reports │
└────────────────────────────────────────────────────────────────────────────┘
Comment on lines -27 to -58
Member Author


Redundant with the diagrams in the index.mdx, so I removed it.

```

#### How the Processor works

- **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking.
- **Weighted scheduling**: High-priority telemetry gets sent more frequently via weighted round-robin selection.
- **Transport compatibility**: Works with existing HTTP transport implementations without modification.
Comment on lines -63 to -65
Member Author


All covered now in the index.mdx


### Priorities
- CRITICAL: Error, Feedback.
- HIGH: Session, CheckIn.
@@ -126,7 +77,7 @@ The only layer responsible for dropping events is the Buffer. In case that the t
#### Telemetry Buffer Options
- **Capacity**: 100 items for errors and check-ins, 10*BATCH_SIZE for logs, 1000 for transactions.
- **Overflow policy**: `drop_oldest`.
- **Batch size**: 1 for errors and monitors (immediate send), 100 for logs.
- **Batch size**: 1 for errors and check-ins (immediate send), 100 for logs.
- **Batch timeout**: 5 seconds for logs.
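
A bounded buffer with the `drop_oldest` overflow policy could look roughly like the sketch below. The class name `RingBuffer`, its methods, and the `dropped` counter are illustrative assumptions; a real SDK would record the dropped items as client reports rather than only counting them.

```python
from collections import deque


class RingBuffer:
    """Bounded buffer with a drop_oldest overflow policy (illustrative)."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards items from the opposite end when full.
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # stand-in for client report bookkeeping

    def add(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest item is about to be evicted
        self._items.append(item)

    def drain(self, batch_size):
        """Remove and return up to batch_size items, oldest first."""
        count = min(batch_size, len(self._items))
        return [self._items.popleft() for _ in range(count)]


buf = RingBuffer(capacity=3)
for i in range(5):
    buf.add(i)

first_batch = buf.drain(batch_size=2)
# buf.dropped == 2 (items 0 and 1 were evicted), first_batch == [2, 3]
```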

#### Scheduler Options
@@ -20,6 +20,14 @@ flowchart LR
TelemetryProcessor -- sendEnvelope --> Transport
```

The telemetry processor **SHOULD** expose the following minimal API:

- `Add(item)` — Adds a telemetry item to the processor.
- `Flush(timeout)` — Flushes all buffered data to the transport within the given timeout.
- `Close(timeout)` — Flushes all buffered data and closes the processor within the given timeout.

The telemetry processor **MUST** work with existing transport implementations.
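
As a rough sketch of this API surface, the example below maps `Add`, `Flush`, and `Close` to Python methods. Everything beyond those three operations is an assumption made for illustration: the `RecordingTransport` stand-in, its `send_envelope` method, the boolean return values, and the choice to model an envelope as a one-item list.

```python
import time


class RecordingTransport:
    """Stand-in for an existing transport; only send_envelope is assumed."""

    def __init__(self):
        self.sent = []

    def send_envelope(self, envelope):
        self.sent.append(envelope)


class TelemetryProcessor:
    """Illustrative processor without buffering per category or scheduling."""

    def __init__(self, transport):
        self._transport = transport
        self._items = []
        self._closed = False

    def add(self, item):
        """Add(item): accept an item unless the processor is closed."""
        if self._closed:
            return False
        self._items.append(item)
        return True

    def flush(self, timeout):
        """Flush(timeout): send buffered items until done or the deadline passes."""
        deadline = time.monotonic() + timeout
        while self._items and time.monotonic() < deadline:
            self._transport.send_envelope([self._items.pop(0)])
        return not self._items  # True if everything was sent in time

    def close(self, timeout):
        """Close(timeout): flush, then reject further items."""
        flushed = self.flush(timeout)
        self._closed = True
        return flushed


transport = RecordingTransport()
processor = TelemetryProcessor(transport)
processor.add("error")
processor.add("log")
closed_cleanly = processor.close(timeout=2.0)
accepted_late = processor.add("late")
```

Because the processor only calls `send_envelope`, an existing transport can be passed in unchanged, which is the compatibility requirement stated above.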

SDKs **SHOULD** only add the telemetry processor for high-volume data (spans, logs, metrics). SDKs without these features **MAY** omit it. Once added, SDK clients **SHOULD** forward all data to the processor, not the transport. During migration, SDKs **MAY** temporarily send only some telemetry data through the processor.

The telemetry processor consists of two major components:
@@ -86,25 +94,30 @@ We aim to standardize requirements so SDKs share consistent logic across platfor

# Telemetry Buffer

The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms:
The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. This section covers the common requirements for all platforms:

1. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports.
2. When the telemetry buffer overflows and it drops data, it **MUST** record client reports.
3. The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler.
4. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler.
5. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details.
6. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches.
7. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics).
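
Requirements 4 to 6 can be sketched as shown below. This is a simplified model under stated assumptions: the class and method names are hypothetical, the `forward` callable stands in for the telemetry scheduler, the clock is injectable so the example runs deterministically, and rate limiting and client reports (requirements 1 and 2) are left out.

```python
import time


class TelemetryBuffer:
    """Flush on timeout or when the size limit is reached (illustrative)."""

    def __init__(self, forward, size_limit, timeout=5.0, clock=time.monotonic):
        self._forward = forward        # hands a batch to the scheduler
        self._size_limit = size_limit
        self._timeout = timeout
        self._clock = clock            # injectable for deterministic examples
        self._items = []
        self._deadline = None

    def add(self, item):
        if not self._items:
            # Requirement 4: the timeout starts with the first item.
            self._deadline = self._clock() + self._timeout
        self._items.append(item)
        if len(self._items) >= self._size_limit:
            self._flush()              # requirement 6

    def tick(self):
        """Call periodically; a real SDK would likely use a timer instead."""
        if self._items and self._clock() >= self._deadline:
            self._flush()

    def _flush(self):
        batch, self._items = self._items, []
        self._deadline = None
        self._forward(batch)


now = [0.0]                            # fake clock, in seconds
batches = []
buf = TelemetryBuffer(batches.append, size_limit=3, clock=lambda: now[0])
for name in ("log-1", "log-2", "log-3"):
    buf.add(name)                      # third add reaches the size limit
buf.add("log-4")
now[0] = 5.0
buf.tick()                             # timeout for log-4 has expired
# batches == [["log-1", "log-2", "log-3"], ["log-4"]]
```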

## Batch Size Limit
Member Author


I added more detail here and renamed this section to "Batch Size Limit" because I removed the batch size info along with the backend diagram.


## Size Limits
Ingestion limits the [number of items an envelope](/sdk/foundations/transport/envelopes/#size-limits) can carry, and Relay is optimized for the maximum batch sizes defined below, so SDKs must adhere to these limits when sending envelopes. Exceeding them is strongly discouraged. Consequently, the telemetry buffer must batch telemetry items to comply with these limits before forwarding them to the telemetry scheduler.

SDKs **SHOULD** use the following size limits for the telemetry buffer. SDKs **MAY** use lower values, but they **MUST NOT** exceed the following size limits:
For high-volume telemetry item types, SDKs **SHOULD** use the following batch sizes. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits:

- 100 items for logs
- 100 items for metrics
- 1000 items for spans

While the [envelope size limits](/sdk/foundations/transport/envelopes/#size-limits) would allow higher size limits for specific categories, these limits are optimized for Relay and exceeding them is absolutely discouraged.
The following telemetry item types **MUST** use a batch size of 1:

- Errors
- Check-ins
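
One way to enforce these limits is to clamp any configured batch size before splitting items into batches. The sketch below assumes hypothetical category keys (`log`, `metric`, `span`, `error`, `check_in`) and helper names; only the numeric limits come from the lists above.

```python
# Maximum batch sizes from the lists above; the keys are illustrative names.
MAX_BATCH_SIZE = {"log": 100, "metric": 100, "span": 1000}
SINGLE_ITEM_TYPES = {"error", "check_in"}


def effective_batch_size(item_type, configured=None):
    """Return the batch size to use, never exceeding the documented limit."""
    if item_type in SINGLE_ITEM_TYPES:
        return 1
    limit = MAX_BATCH_SIZE[item_type]
    if configured is None:
        return limit
    # Lower values are allowed; higher values are clamped to the limit.
    return max(1, min(configured, limit))


def split_into_batches(items, item_type, configured=None):
    """Split items into consecutive batches of the effective size."""
    size = effective_batch_size(item_type, configured)
    return [items[i:i + size] for i in range(0, len(items), size)]


log_batches = split_into_batches(list(range(250)), "log")
# [len(b) for b in log_batches] == [100, 100, 50]
```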

## Data Forwarding Scenarios
