ref(telemetry-processor): Migrate info from backend spec #16637
base: master
Changes from all commits
d24c4b9
d8ad8fd
d3ec3f7
017a655
fb3572a
a12a6dd
c89da5b
43eebb2
b5faba2
d2a8794
c54db68
be17613
b859ebc
d6721c6
a46aed3
3c47cbe
0bcac01
7e83fa5
3424b92
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,55 +15,6 @@ For the common specification, refer to the [Telemetry Processor](/sdk/foundation | |
| - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). | ||
| - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. | ||
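The weighted round-robin idea above can be sketched as follows. This is a minimal illustration, not the SDK's scheduler: the weights are the ones quoted in the backend spec (CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1), while `build_cycle`, `drain`, and the dict-of-deques buffer layout are hypothetical names chosen for the example.

```python
from collections import deque

# Weights as quoted in the backend spec; everything else here is illustrative.
WEIGHTS = {"CRITICAL": 5, "HIGH": 4, "MEDIUM": 3, "LOW": 2, "LOWEST": 1}


def build_cycle(weights):
    """Return one scheduling cycle in which each priority appears `weight` times."""
    remaining = dict(weights)
    cycle = []
    # Interleave priorities so high weights are spread across the cycle
    # instead of being served in a single burst.
    while any(remaining.values()):
        for priority in weights:
            if remaining[priority] > 0:
                cycle.append(priority)
                remaining[priority] -= 1
    return cycle


def drain(buffers, weights, max_sends):
    """Pick up to `max_sends` items, visiting buffers in weighted order."""
    cycle = build_cycle(weights)
    sent = []
    slot = 0
    while len(sent) < max_sends and any(buffers.values()):
        priority = cycle[slot % len(cycle)]
        slot += 1
        if buffers[priority]:  # empty buffers are skipped without using a send
            sent.append(buffers[priority].popleft())
    return sent
```

With one pending error and a backlog of logs, the error is picked in the first slot of the cycle, which is the property the bullet above describes: critical telemetry is not starved by high-volume data.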
|
|
||
| ### Architecture Overview | ||
|
|
||
| Introduce a `TelemetryProcessor` layer between the `Client` and the `Transport`. This `TelemetryProcessor` wraps prioritization and scheduling and exposes a minimal API to the SDK: | ||
|
|
||
| - Add(item). | ||
| - Flush(timeout). | ||
| - Close(timeout). | ||
|
|
||
| ``` | ||
| ┌────────────────────────────────────────────────────────────────────────────┐ | ||
| │ Client │ | ||
| │ captureEvent / captureTransaction / captureCheckIn / captureLog │ | ||
| └────────────────────────────────────────────────────────────────────────────┘ | ||
|
|
||
| ▼ | ||
| ┌────────────────────────────────────────────────────────────────────────────┐ | ||
| │ TelemetryProcessor │ | ||
| │ Add(item) · Flush(timeout) · Close(timeout) │ | ||
| │ │ | ||
| │ ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────┐ │ | ||
| │ │ Error Buffer │ │ Check-in Buffer │ │ Log Buffer │ │ | ||
| │ │ (CRITICAL) │ │ (HIGH) │ │ (LOW) │ │ | ||
| │ │ Timeout: N/A │ │ Timeout: N/A │ │ Timeout: 5s │ │ | ||
| │ │ BatchSize: 1 │ │ BatchSize: 1 │ │ BatchSize: 100 │ │ | ||
| │ └──────────────────────┘ └──────────────────────┘ └──────────────────┘ │ | ||
| │ │ │ | ||
| │ ▼ │ | ||
| │ ┌─────────────────────────────────────────────────────────────────────┐ │ | ||
| │ │ TelemetryScheduler (Weighted Round-Robin) │ │ | ||
| │ │ - Priority weights: CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1 │ │ | ||
| │ │ - Processes a batch of items based on BatchSize and/or Timeout │ │ | ||
| │ │ - Builds envelopes from batch │ │ | ||
| │ │ - Submits envelopes to transport │ │ | ||
| │ └─────────────────────────────────────────────────────────────────────┘ │ | ||
| └────────────────────────────────────────────────────────────────────────────┘ | ||
|
|
||
| ▼ | ||
| ┌────────────────────────────────────────────────────────────────────────────┐ | ||
| │ Transport │ | ||
| │ - Single worker, disk cache, offline retry, client reports │ | ||
| └────────────────────────────────────────────────────────────────────────────┘ | ||
|
Comment on lines
-27
to
-58
Member
Author
Redundant with the diagrams in the index.mdx, so I removed it. |
||
| ``` | ||
|
|
||
| #### How the Processor works | ||
|
|
||
| - **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking. | ||
| - **Weighted scheduling**: High-priority telemetry gets sent more frequently via weighted round-robin selection. | ||
| - **Transport compatibility**: Works with existing HTTP transport implementations without modification. | ||
|
Comment on lines
-63
to
-65
Member
Author
All covered now in the index.mdx |
||
|
|
||
| ### Priorities | ||
| - CRITICAL: Error, Feedback. | ||
| - HIGH: Session, CheckIn. | ||
|
|
@@ -126,7 +77,7 @@ The only layer responsible for dropping events is the Buffer. In case that the t | |
| #### Telemetry Buffer Options | ||
| - **Capacity**: 100 items for errors and check-ins, 10*BATCH_SIZE for logs, 1000 for transactions. | ||
| - **Overflow policy**: `drop_oldest`. | ||
| - **Batch size**: 1 for errors and monitors (immediate send), 100 for logs. | ||
| - **Batch size**: 1 for errors and check-ins (immediate send), 100 for logs. | ||
| - **Batch timeout**: 5 seconds for logs. | ||
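A minimal sketch of a bounded buffer with the `drop_oldest` overflow policy listed above. The class name `DropOldestBuffer`, its methods, and the `dropped` counter are assumptions made for illustration; a real SDK would record client reports rather than only counting drops.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded FIFO buffer that evicts the oldest item when full (hypothetical)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque()
        self._capacity = capacity
        self.dropped = 0  # stand-in for client report bookkeeping

    def add(self, item):
        if len(self._items) >= self._capacity:
            self._items.popleft()  # drop_oldest overflow policy
            self.dropped += 1
        self._items.append(item)

    def take_batch(self, batch_size):
        """Remove and return up to `batch_size` items, oldest first."""
        count = min(batch_size, len(self._items))
        return [self._items.popleft() for _ in range(count)]

    def __len__(self):
        return len(self._items)
```

With the options above, an error buffer would be created with a capacity of 100 and drained with `take_batch(1)`, while a log buffer would use a capacity of 10 times the batch size and `take_batch(100)`.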
|
|
||
| #### Scheduler Options | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,14 @@ flowchart LR | |
| TelemetryProcessor -- sendEnvelope --> Transport | ||
| ``` | ||
|
|
||
| The telemetry processor **SHOULD** expose the following minimal API: | ||
|
|
||
| - `Add(item)` — Adds a telemetry item to the processor. | ||
| - `Flush(timeout)` — Flushes all buffered data to the transport within the given timeout. | ||
| - `Close(timeout)` — Flushes all buffered data and closes the processor within the given timeout. | ||
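To make the three calls concrete, here is a toy, single-threaded sketch. `InMemoryProcessor`, the `send` callable standing in for the transport, the boolean return values, and the behavior of ignoring items after close are all assumptions for illustration; the spec only names the three operations.

```python
import time


class InMemoryProcessor:
    """Toy processor illustrating Add/Flush/Close; not a real SDK class."""

    def __init__(self, send):
        self._send = send  # callable standing in for the transport
        self._pending = []
        self._closed = False

    def add(self, item):
        if self._closed:
            return False  # assumption: items added after close are ignored
        self._pending.append(item)
        return True

    def flush(self, timeout):
        """Send buffered items until done or `timeout` seconds have elapsed."""
        deadline = time.monotonic() + timeout
        while self._pending and time.monotonic() < deadline:
            self._send(self._pending.pop(0))
        return not self._pending  # True if everything was sent in time

    def close(self, timeout):
        """Flush what is buffered, then stop accepting new items."""
        flushed = self.flush(timeout)
        self._closed = True
        return flushed
```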
|
|
||
philipphofmann marked this conversation as resolved.
|
||
| The telemetry processor **MUST** work with existing transport implementations. | ||
|
|
||
| SDKs **SHOULD** only add the telemetry processor for high-volume data (spans, logs, metrics). SDKs without these features **MAY** omit it. Once added, SDK clients **SHOULD** forward all data to the processor, not the transport. During migration, SDKs **MAY** temporarily send only some telemetry data through the processor. | ||
|
|
||
| The telemetry processor consists of two major components: | ||
|
|
@@ -86,25 +94,30 @@ We aim to standardize requirements so SDKs share consistent logic across platfor | |
|
|
||
| # Telemetry Buffer | ||
|
|
||
| The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms: | ||
| The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. This section covers the common requirements for all platforms: | ||
|
|
||
| 1. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. | ||
| 2. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. | ||
| 3. The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler. | ||
| 4. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. | ||
| 5. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. | ||
| 6. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. | ||
| 7. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics). | ||
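Requirements 4 and 6 above (forward on timeout, forward on size limit) can be sketched as below. This is an illustration under stated assumptions: `BatchingBuffer`, the `forward` callable standing in for the telemetry scheduler, the injectable `clock`, and the polling `tick` method are hypothetical, and rate limiting, client reports, and thread safety are left out.

```python
import time


class BatchingBuffer:
    """Illustrative buffer: forwards on size limit or when the timeout expires."""

    def __init__(self, forward, size_limit, timeout=5.0, clock=time.monotonic):
        self._forward = forward  # stands in for the telemetry scheduler
        self._size_limit = size_limit
        self._timeout = timeout
        self._clock = clock
        self._items = []
        self._deadline = None  # set when the first item is added

    def add(self, item):
        if not self._items:
            # Requirement 4: the timeout starts with the first item.
            self._deadline = self._clock() + self._timeout
        self._items.append(item)
        if len(self._items) >= self._size_limit:
            # Requirement 6: forward everything once the limit is reached.
            self._flush()

    def tick(self):
        """Check the deadline; a real SDK would likely use a timer instead."""
        if self._deadline is not None and self._clock() >= self._deadline:
            self._flush()

    def _flush(self):
        batch, self._items = self._items, []
        self._deadline = None
        if batch:
            self._forward(batch)
```

Per requirement 7, an SDK would hold one such buffer per telemetry item type, for example one for spans, one for logs, and one for metrics.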
|
|
||
| ## Batch Size Limit | ||
|
Member
Author
I added more detail here and renamed this to BatchSizeLimit because I removed the batch size info with the backend diagram. |
||
|
|
||
| ## Size Limits | ||
| Ingestion limits the [number of items an envelope](/sdk/foundations/transport/envelopes/#size-limits) can carry, and Relay is optimized for the maximum batch sizes defined below, so SDKs must adhere to these limits when sending envelopes. Exceeding them is strongly discouraged. Consequently, the telemetry buffer must batch telemetry items to comply with these size restrictions before forwarding them to the telemetry scheduler. | ||
|
|
||
| SDKs **SHOULD** use the following size limits for the telemetry buffer. SDKs **MAY** use lower values, but they **MUST NOT** exceed the following size limits: | ||
| For high-volume telemetry item types, SDKs **SHOULD** use the following batch sizes. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: | ||
|
|
||
| - 100 items for logs | ||
| - 100 items for metrics | ||
| - 1000 items for spans | ||
|
|
||
| While the [envelope size limits](/sdk/foundations/transport/envelopes/#size-limits) would allow higher size limits for specific categories, these limits are optimized for Relay and exceeding them is absolutely discouraged. | ||
| The following telemetry item types **MUST** use a batch size of 1: | ||
|
|
||
| - Errors | ||
| - Check-ins | ||
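The batch size rules above can be expressed as a small lookup with clamping. This is a sketch only: the numeric limits come from the lists above, while the item type keys (`"log"`, `"metric"`, `"span"`, `"error"`, `"check_in"`) and the `effective_batch_size` helper are hypothetical names.

```python
# Limits taken from the lists above; the key names are illustrative.
MAX_BATCH_SIZE = {"log": 100, "metric": 100, "span": 1000}
SINGLE_ITEM_TYPES = {"error", "check_in"}


def effective_batch_size(item_type, configured=None):
    """Return the batch size to use, never exceeding the spec limit."""
    if item_type in SINGLE_ITEM_TYPES:
        return 1  # these types are always sent one item at a time
    limit = MAX_BATCH_SIZE[item_type]  # unknown types raise KeyError
    if configured is None:
        return limit
    # Lower values are allowed; higher values are clamped to the limit.
    return max(1, min(configured, limit))
```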
|
|
||
| ## Data Forwarding Scenarios | ||
|
|
||
|
|
||
Moved to index.mdx