From d24c4b9efa24594c25469b0f3cdd83164faaa96a Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Mon, 2 Mar 2026 11:22:47 +0100 Subject: [PATCH 01/16] ref(telemetry-processor): Dedupe arch overview First step of a multi-part refactoring to consolidate duplicate content between the backend and index telemetry processor pages. This PR moves the architecture overview and API definition to the index file and removes the duplicated parts from the backend file. Co-Authored-By: Claude --- .../backend-telemetry-processor.mdx | 52 +------------------ .../processing/telemetry-processor/index.mdx | 19 ++++--- 2 files changed, 14 insertions(+), 57 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 880a336a0f2e9..9a508fc23c8c2 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -12,57 +12,9 @@ For the common specification, refer to the [Telemetry Processor](/sdk/foundation ## Backend-Specific Design Decisions -- **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). +- **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection. - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. 
- -### Architecture Overview - -Introduce a `TelemetryProcessor` layer between the `Client` and the `Transport`. This `TelemetryProcessor` wraps prioritization and scheduling and exposes a minimal API to the SDK: - -- Add(item). -- Flush(timeout). -- Close(timeout). - -``` -┌────────────────────────────────────────────────────────────────────────────┐ -│ Client │ -│ captureEvent / captureTransaction / captureCheckIn / captureLog │ -└────────────────────────────────────────────────────────────────────────────┘ - - ▼ -┌────────────────────────────────────────────────────────────────────────────┐ -│ TelemetryProcessor │ -│ Add(item) · Flush(timeout) · Close(timeout) │ -│ │ -│ ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────┐ │ -│ │ Error Buffer │ │ Check-in Buffer │ │ Log Buffer │ │ -│ │ (CRITICAL) │ │ (HIGH) │ │ (LOW) │ │ -│ │ Timeout: N/A │ │ Timeout: N/A │ │ Timeout: 5s │ │ -│ │ BatchSize: 1 │ │ BatchSize: 1 │ │ BatchSize: 100 │ │ -│ └──────────────────────┘ └──────────────────────┘ └──────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ TelemetryScheduler (Weighted Round-Robin) │ │ -│ │ - Priority weights: CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1 │ │ -│ │ - Processes a batch of items based on BatchSize and/or Timeout │ │ -│ │ - Builds envelopes from batch │ │ -│ │ - Submits envelopes to transport │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -└────────────────────────────────────────────────────────────────────────────┘ - - ▼ -┌────────────────────────────────────────────────────────────────────────────┐ -│ Transport │ -│ - Single worker, disk cache, offline retry, client reports │ -└────────────────────────────────────────────────────────────────────────────┘ -``` - -#### How the Processor works - -- **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking. 
-- **Weighted scheduling**: High-priority telemetry gets sent more frequently via weighted round-robin selection. -- **Transport compatibility**: Works with existing HTTP transport implementations without modification. +- **Transport compatibility**: The telemetry processor **MUST** work with existing transport implementations. ### Priorities - CRITICAL: Error, Feedback. diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 875063216d3c9..add0a603492da 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -20,6 +20,11 @@ flowchart LR TelemetryProcessor -- sendEnvelope --> Transport ``` +The telemetry processor **SHOULD** expose the following minimal API: + +- `Add(item)` — Adds a telemetry item to the processor. +- `Flush(timeout)` — Flushes all buffered data to the transport within the given timeout. + SDKs **SHOULD** only add the telemetry processor for high-volume data (spans, logs, metrics). SDKs without these features **MAY** omit it. Once added, SDK clients **SHOULD** forward all data to the processor, not the transport. During migration, SDKs **MAY** temporarily send only some telemetry data through the processor. The telemetry processor consists of two major components: @@ -88,13 +93,13 @@ We aim to standardize requirements so SDKs share consistent logic across platfor The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms: -1. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. -2. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. -3. 
The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler. -4. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. -5. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. -6. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. - +1. The telemetry buffer **SHOULD** use separate buffers per telemetry category (e.g., one for spans, one for logs, one for metrics). +2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. +3. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. +4. The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler. +5. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. +6. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. +7. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. 
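The rate-limit and overflow requirements in the list above could look roughly like the following sketch. This is illustrative Python, not an SDK implementation; `rate_limiter`, `client_reports`, and the discard reason strings are hypothetical stand-ins for components and constants the SDK already defines:

```python
from collections import defaultdict, deque

class TelemetryBuffer:
    """Sketch of the Add() path: one queue per telemetry item type."""

    def __init__(self, rate_limiter, client_reports, capacity=100):
        self.buffers = defaultdict(deque)     # item type -> pending items
        self.rate_limiter = rate_limiter      # hypothetical rate-limit component
        self.client_reports = client_reports  # hypothetical client-report recorder
        self.capacity = capacity

    def add(self, item_type, item):
        # Drop rate-limited items before buffering, and record a client report.
        if self.rate_limiter.is_rate_limited(item_type):
            self.client_reports.record_lost(reason="ratelimit_backoff", category=item_type)
            return False
        buffer = self.buffers[item_type]
        # On overflow, drop the item and record a client report.
        if len(buffer) >= self.capacity:
            self.client_reports.record_lost(reason="buffer_overflow", category=item_type)
            return False
        buffer.append(item)
        return True
```

The sketch drops the newest item on overflow for brevity; platform-specific pages may instead prescribe dropping the oldest.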
## Size Limits From d8ad8fdf1da0f634f5cb0a184a77b5913744cc7e Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Mon, 2 Mar 2026 13:02:46 +0100 Subject: [PATCH 02/16] ref(telemetry-processor): Add size limits and open TODOs Co-Authored-By: Claude --- .../processing/telemetry-processor/index.mdx | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index add0a603492da..78527d59a8db2 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -91,19 +91,30 @@ We aim to standardize requirements so SDKs share consistent logic across platfor # Telemetry Buffer +{/* TODO: Open items from architecture consolidation (https://github.com/getsentry/sentry-docs/issues/16189): + - Requirement #4: Finish wording for low-volume data push vs pull approach. + - Errors/check-ins size limit of 1 was promoted from backend-specific to common — confirm this is correct for all platforms. + - Backend file line 36 says spans recommend 1000 "because span volume is higher", but the common spec already says 1000 — this is now redundant. +*/} + The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms: 1. The telemetry buffer **SHOULD** use separate buffers per telemetry category (e.g., one for spans, one for logs, one for metrics). 2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. 3. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. -4. 
The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler. -5. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. +4. For low-volume data, such as events, session replays or user feedback, the telemetry buffer MUST either directly forward these items to the telemetry scheduler, or if the scheduler pulls items from the buffers the telemetry buffer TODO +5. For buffers with a size limit greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a size limit of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. 6. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. 7. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. ## Size Limits -SDKs **SHOULD** use the following size limits for the telemetry buffer. SDKs **MAY** use lower values, but they **MUST NOT** exceed the following size limits: +The following categories **SHOULD** use a size limit of 1, which means items are forwarded immediately: + +- Errors +- Check-ins + +For high-volume data, SDKs **SHOULD** use the following size limits. 
SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: - 100 items for logs - 100 items for metrics From fb3572a8267eb0bba54bddb5672a98d4b44c3a00 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Mon, 2 Mar 2026 16:59:03 +0100 Subject: [PATCH 03/16] ref(telemetry-processor): WIP clarify batch size vs buffer capacity Co-Authored-By: Claude --- .../backend-telemetry-processor.mdx | 4 +--- .../processing/telemetry-processor/index.mdx | 23 +++++++++++-------- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 9a508fc23c8c2..c16cb7230fe3c 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -33,8 +33,6 @@ The telemetry buffer on the backend must follow the common [telemetry buffer req 1. The telemetry buffer **SHOULD** drop older items as the overflow policy. It **MAY** also drop newer items to preserve what's already buffered. -On the backend, use the same size limits as the [common requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer), except for spans, where we recommend **1000** because span volume is higher. - ##### Span Buffer The span buffer must follow the common [telemetry span buffer requirements](/sdk/foundations/processing/telemetry-processor/#span-buffer). Further requirements for the bucketed-by-trace buffer are: @@ -78,7 +76,7 @@ The only layer responsible for dropping events is the Buffer. In case that the t #### Telemetry Buffer Options - **Capacity**: 100 items for errors and check-ins, 10*BATCH_SIZE for logs, 1000 for transactions. - **Overflow policy**: `drop_oldest`. -- **Batch size**: 1 for errors and monitors (immediate send), 100 for logs. 
+- **Batch size**: 1 for errors and check-ins (immediate send), 100 for logs. - **Batch timeout**: 5 seconds for logs. #### Scheduler Options diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 78527d59a8db2..2356c78899c77 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -93,8 +93,7 @@ We aim to standardize requirements so SDKs share consistent logic across platfor {/* TODO: Open items from architecture consolidation (https://github.com/getsentry/sentry-docs/issues/16189): - Requirement #4: Finish wording for low-volume data push vs pull approach. - - Errors/check-ins size limit of 1 was promoted from backend-specific to common — confirm this is correct for all platforms. - - Backend file line 36 says spans recommend 1000 "because span volume is higher", but the common spec already says 1000 — this is now redundant. + - Errors/check-ins batch size of 1 was promoted from backend-specific to common — confirm this is correct for all platforms. */} The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms: @@ -103,24 +102,30 @@ The telemetry buffer batches high-volume data and forwards it to the telemetry s 2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. 3. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. 4. For low-volume data, such as events, session replays or user feedback, the telemetry buffer MUST either directly forward these items to the telemetry scheduler, or if the scheduler pulls items from the buffers the telemetry buffer TODO -5. 
For buffers with a size limit greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a size limit of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. -6. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. -7. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. +5. For buffers with a batch size greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a batch size of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. +6. The telemetry buffer **MUST** define a batch size per telemetry category. See [Batch Size Limits](#batch-size-limits) below for recommended values. +7. When the batch size is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. -## Size Limits +## Batch Size Limits -The following categories **SHOULD** use a size limit of 1, which means items are forwarded immediately: +The batch size controls how many items the telemetry buffer groups together before forwarding them to the telemetry scheduler. These limits exist because [envelope size limits](/sdk/foundations/transport/envelopes/#size-limits) constrain how many items a single envelope can carry. While the envelope limits would allow higher values for some categories, the batch sizes below are optimized for Relay and exceeding them is absolutely discouraged. 
+ +The following categories **SHOULD** use a batch size of 1, which means items are forwarded immediately: - Errors - Check-ins -For high-volume data, SDKs **SHOULD** use the following size limits. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: +For high-volume data, SDKs **SHOULD** use the following batch sizes. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: - 100 items for logs - 100 items for metrics - 1000 items for spans -While the [envelope size limits](/sdk/foundations/transport/envelopes/#size-limits) would allow higher size limits for specific categories, these limits are optimized for Relay and exceeding them is absolutely discouraged. +## Buffer Capacity + +The buffer capacity defines the maximum number of items a buffer can hold in memory. This is separate from the [batch size](#batch-size-limits) — the buffer capacity **MUST** be greater than or equal to the batch size, but it **MAY** be larger. A larger capacity allows the buffer to absorb bursts of data without dropping items while the scheduler processes previous batches. + +When the buffer capacity is exceeded, the buffer **MUST** drop items according to its overflow policy and **MUST** record client reports for dropped items. Platform-specific pages define the recommended overflow policies and capacity values. For example, the [backend spec](/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor/#configuration) uses `drop_oldest` with capacities of 100 for errors and check-ins, and `10 * batch size` for logs. 
## Data Forwarding Scenarios From a12a6dde73d5a3b9a584576c9985c1f3d04b2adb Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 10:48:10 +0100 Subject: [PATCH 04/16] polish --- .../backend-telemetry-processor.mdx | 10 ++++-- .../processing/telemetry-processor/index.mdx | 35 ++++++++----------- 2 files changed, 21 insertions(+), 24 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index c16cb7230fe3c..4a3fd217bc313 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -8,9 +8,11 @@ sidebar_order: 1 🚧 This document is work in progress. -For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes the backend-specific implementation. The key difference is that backend SDKs use **weighted round-robin scheduling** to ensure critical telemetry (like errors) gets priority over high-volume data (like logs) when the application is under heavy load. +For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes a backend-specific approach, which is optimized for high load. The key difference is that the telemetry scheduler pulls telemetry data from the telemetry buffers using **weighted round-robin scheduling**. -## Backend-Specific Design Decisions +It's worth noting that this approach may not be suitable for SDKs needing to support multiple platforms, such as Java, because this approach doesn't work well with offline caching. Offline caches also need a priority-based sending strategy and a priority-based overflow strategy to avoid dropping critical data in favor of high-volume data. 
If the telemetry scheduler pulls data from the telemetry buffer and it supports an offline cache, it needs to balance items in the offline cache with items from the telemetry buffer. Each SDK should evaluate its platform constraints and architectural needs and decide whether to adopt the backend-specific pull-based approach or continue using a push-based model. + +# Backend-Specific Design Decisions - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection. - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. @@ -29,7 +31,9 @@ Configurable via weights. #### TelemetryBuffer -The telemetry buffer on the backend must follow the common [telemetry buffer requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer). Here are the additional requirements for the backend-specific implementation: +The telemetry buffer on the backend must follow the common [telemetry buffer requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer). + + Here are the additional requirements for the backend-specific implementation: 1. The telemetry buffer **SHOULD** drop older items as the overflow policy. It **MAY** also drop newer items to preserve what's already buffered. 
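The weighted round-robin selection described above can be sketched as follows, using the priority weights listed in the architecture overview (CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1). This is a minimal illustration, not the SDK's actual scheduler; the names are assumptions:

```python
from collections import deque

# Priority weights from the spec: higher weight = drained more often per cycle.
PRIORITY_WEIGHTS = {"CRITICAL": 5, "HIGH": 4, "MEDIUM": 3, "LOW": 2, "LOWEST": 1}

def weighted_round_robin(buffers):
    """Yield (priority, batch) pairs in weighted round-robin order.

    `buffers` maps a priority name (a key of PRIORITY_WEIGHTS) to a deque
    of pending batches. Each cycle visits priorities from highest weight
    to lowest and takes up to `weight` batches from each, so critical
    telemetry is drained first under sustained load.
    """
    while any(buffers.values()):
        for priority, weight in sorted(PRIORITY_WEIGHTS.items(), key=lambda kv: -kv[1]):
            queue = buffers.get(priority)
            if not queue:
                continue
            for _ in range(min(weight, len(queue))):
                yield priority, queue.popleft()

# Example: errors are interleaved ahead of logs even when logs dominate.
buffers = {
    "CRITICAL": deque(["error-1", "error-2"]),
    "LOW": deque(["log-batch-0", "log-batch-1", "log-batch-2", "log-batch-3"]),
}
order = [item for _, item in weighted_round_robin(buffers)]
```

A real scheduler would additionally wake on a signal when data arrives instead of looping, as the signal-based scheduling decision below describes.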
diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 2356c78899c77..13c6b9c854e26 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -91,41 +91,34 @@ We aim to standardize requirements so SDKs share consistent logic across platfor # Telemetry Buffer -{/* TODO: Open items from architecture consolidation (https://github.com/getsentry/sentry-docs/issues/16189): - - Requirement #4: Finish wording for low-volume data push vs pull approach. - - Errors/check-ins batch size of 1 was promoted from backend-specific to common — confirm this is correct for all platforms. -*/} +The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. -The telemetry buffer batches high-volume data and forwards it to the telemetry scheduler. This section covers the common requirements for all platforms: -1. The telemetry buffer **SHOULD** use separate buffers per telemetry category (e.g., one for spans, one for logs, one for metrics). +## Common Requirements + +This section covers the common requirements for all platforms: + +1. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics). 2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. 3. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. -4. For low-volume data, such as events, session replays or user feedback, the telemetry buffer MUST either directly forward these items to the telemetry scheduler, or if the scheduler pulls items from the buffers the telemetry buffer TODO +4. 
For low-volume data, such as events, session replays or user feedback, the telemetry buffer **SHOULD** directly forward these items to the telemetry scheduler. 5. For buffers with a batch size greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a batch size of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. -6. The telemetry buffer **MUST** define a batch size per telemetry category. See [Batch Size Limits](#batch-size-limits) below for recommended values. -7. When the batch size is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. - -## Batch Size Limits +6. When the batch size is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. -The batch size controls how many items the telemetry buffer groups together before forwarding them to the telemetry scheduler. These limits exist because [envelope size limits](/sdk/foundations/transport/envelopes/#size-limits) constrain how many items a single envelope can carry. While the envelope limits would allow higher values for some categories, the batch sizes below are optimized for Relay and exceeding them is absolutely discouraged. +## Batch Size Limit -The following categories **SHOULD** use a batch size of 1, which means items are forwarded immediately: +As ingestion sets limits on the [number of items an envelope](/sdk/foundations/transport/envelopes/#size-limits) can carry, and Relay is optimized for the maximum batch sizes defined below, SDKs must adhere to these limits when sending envelopes. Exceeding them is strongly discouraged. Consequently, the telemetry buffer must batch telemetry items to comply with these size restrictions before forwarding them to the telemetry scheduler. 
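The recommended per-type limits can be captured as a simple lookup. The constants and the `clamp_batch_size` helper below are illustrative, not an SDK API:

```python
# Maximum batch sizes per telemetry item type, per the spec:
# SDKs MAY use lower values but MUST NOT exceed these.
MAX_BATCH_SIZE = {
    "log": 100,
    "metric": 100,
    "span": 1000,
    # Batch size of 1 means immediate send.
    "error": 1,
    "check_in": 1,
}

def clamp_batch_size(item_type, configured):
    """Clamp a configured batch size to the spec maximum (at least 1)."""
    return max(1, min(configured, MAX_BATCH_SIZE.get(item_type, 1)))
```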
-- Errors -- Check-ins - -For high-volume data, SDKs **SHOULD** use the following batch sizes. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: +For high-volume telemetry item types, SDKs **SHOULD** use the following batch sizes. SDKs **MAY** use lower values, but they **MUST NOT** exceed these limits: - 100 items for logs - 100 items for metrics - 1000 items for spans -## Buffer Capacity - -The buffer capacity defines the maximum number of items a buffer can hold in memory. This is separate from the [batch size](#batch-size-limits) — the buffer capacity **MUST** be greater than or equal to the batch size, but it **MAY** be larger. A larger capacity allows the buffer to absorb bursts of data without dropping items while the scheduler processes previous batches. +The following telemetry item types **MUST** use a batch size of 1: -When the buffer capacity is exceeded, the buffer **MUST** drop items according to its overflow policy and **MUST** record client reports for dropped items. Platform-specific pages define the recommended overflow policies and capacity values. For example, the [backend spec](/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor/#configuration) uses `drop_oldest` with capacities of 100 for errors and check-ins, and `10 * batch size` for logs. 
+- Errors +- Check-ins ## Data Forwarding Scenarios From c89da5bb394b1f271e87dac37efb27ac6acda295 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 10:50:08 +0100 Subject: [PATCH 05/16] more polish --- .../foundations/processing/telemetry-processor/index.mdx | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 13c6b9c854e26..1b82637815143 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -91,12 +91,7 @@ We aim to standardize requirements so SDKs share consistent logic across platfor # Telemetry Buffer -The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. - - -## Common Requirements - -This section covers the common requirements for all platforms: +The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. This section covers the common requirements for all platforms: 1. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics). 2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. 
From 43eebb2e81a4f70e4c8dedfb797be50dde1c2ea9 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 10:51:36 +0100 Subject: [PATCH 06/16] polish more --- .../telemetry-processor/backend-telemetry-processor.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 4a3fd217bc313..0b80ceed24902 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -31,12 +31,12 @@ Configurable via weights. #### TelemetryBuffer -The telemetry buffer on the backend must follow the common [telemetry buffer requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer). - - Here are the additional requirements for the backend-specific implementation: +The telemetry buffer on the backend must follow the common [telemetry buffer requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer). Here are the additional requirements for the backend-specific implementation: 1. The telemetry buffer **SHOULD** drop older items as the overflow policy. It **MAY** also drop newer items to preserve what's already buffered. +On the backend, use the same size limits as the [common requirements](/sdk/foundations/processing/telemetry-processor/#telemetry-buffer), except for spans, where we recommend **1000** because span volume is higher. + ##### Span Buffer The span buffer must follow the common [telemetry span buffer requirements](/sdk/foundations/processing/telemetry-processor/#span-buffer). 
Further requirements for the bucketed-by-trace buffer are: From b5faba215301692d9d66a021f034b7212dc85e55 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 10:55:45 +0100 Subject: [PATCH 07/16] move work with transports --- .../telemetry-processor/backend-telemetry-processor.mdx | 1 - .../sdk/foundations/processing/telemetry-processor/index.mdx | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 0b80ceed24902..31066b7d61284 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -16,7 +16,6 @@ It's worth noting that this approach may not be suitable for SDKs needing to sup - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection. - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. -- **Transport compatibility**: The telemetry processor **MUST** work with existing transport implementations. ### Priorities - CRITICAL: Error, Feedback. 
diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 1b82637815143..779d31854d765 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -25,6 +25,8 @@ The telemetry processor **SHOULD** expose the following minimal API: - `Add(item)` — Adds a telemetry item to the processor. - `Flush(timeout)` — Flushes all buffered data to the transport within the given timeout. +The telemetry processor **MUST** work with existing transport implementations. + SDKs **SHOULD** only add the telemetry processor for high-volume data (spans, logs, metrics). SDKs without these features **MAY** omit it. Once added, SDK clients **SHOULD** forward all data to the processor, not the transport. During migration, SDKs **MAY** temporarily send only some telemetry data through the processor. The telemetry processor consists of two major components: From d2a87941820e97d9efa89d5a520fadc5a1ccfd4d Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 10:57:22 +0100 Subject: [PATCH 08/16] polish --- .../sdk/foundations/processing/telemetry-processor/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 779d31854d765..40758f9c0b8d7 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -100,7 +100,7 @@ The telemetry buffer batches high-volume telemetry items and forwards them to th 3. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. 4. 
For low-volume data, such as events, session replays or user feedback, the telemetry buffer **SHOULD** directly forward these items to the telemetry scheduler. 5. For buffers with a batch size greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a batch size of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. -6. When the batch size is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. +6. When the batch size limit is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. ## Batch Size Limit From c54db683758e6a0e4a626fd44f9dafb03de689b1 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:00:42 +0100 Subject: [PATCH 09/16] polish more --- .../processing/telemetry-processor/index.mdx | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 40758f9c0b8d7..24ff00a24afa4 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -95,12 +95,13 @@ We aim to standardize requirements so SDKs share consistent logic across platfor The telemetry buffer batches high-volume telemetry items and forwards them to the telemetry scheduler. This section covers the common requirements for all platforms: -1. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics). -2. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. -3. 
When the telemetry buffer overflows and it drops data, it **MUST** record client reports. -4. For low-volume data, such as events, session replays or user feedback, the telemetry buffer **SHOULD** directly forward these items to the telemetry scheduler. -5. For buffers with a batch size greater than 1, the telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. Buffers with a batch size of 1 **MUST** forward items immediately and **SHOULD NOT** use a timeout. -6. When the batch size limit is reached, the telemetry buffer **MUST** forward the batch to the telemetry scheduler. +1. Before adding an item to a specific buffer, the telemetry buffer **SHOULD** drop rate-limited items to avoid overhead. If doing so, it **MUST** record client reports. +2. When the telemetry buffer overflows and it drops data, it **MUST** record client reports. +3. The telemetry buffer **MUST** forward low-volume data, such as normal events, session replays, or user feedback, directly to the telemetry scheduler. +4. The telemetry buffer **MUST** start a timeout of 5 seconds when the first item is added. When the timeout expires, the telemetry buffer **MUST** forward all items to the telemetry scheduler. +5. The telemetry buffer **MUST** define a size limit of `x` items. See [Size Limit Recommendations](#size-limit-recommendations) below for more details. +6. When the size limit is reached, the telemetry buffer **MUST** forward all items to the telemetry scheduler. The buffer **MAY** forward items in batches. +7. The telemetry buffer **SHOULD** use separate buffers per telemetry item type (e.g., one for spans, one for logs, one for metrics). 
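The buffer requirements introduced in this patch (rate-limited drops with client reports, the 5-second timeout started by the first item, and forwarding at the size limit) could be sketched roughly as follows. This is a hypothetical Python sketch, not an SDK implementation; the class name, callbacks, and client-report reasons are all illustrative:

```python
import threading

class TelemetryBuffer:
    """Illustrative sketch of one per-item-type buffer from the requirements above."""

    def __init__(self, size_limit, forward_to_scheduler, record_client_report):
        self.size_limit = size_limit                  # requirement 5: fixed size limit
        self.forward = forward_to_scheduler           # hands batches to the scheduler
        self.record_client_report = record_client_report
        self.items = []
        self.timer = None
        self.lock = threading.Lock()

    def add(self, item, is_rate_limited=False):
        # Requirement 1: drop rate-limited items before buffering and record a report.
        if is_rate_limited:
            self.record_client_report("ratelimit_backoff", item)
            return
        with self.lock:
            self.items.append(item)
            if len(self.items) == 1:
                # Requirement 4: the first item starts a 5-second flush timeout.
                self.timer = threading.Timer(5.0, self.flush)
                self.timer.daemon = True
                self.timer.start()
            if len(self.items) >= self.size_limit:
                # Requirement 6: forward everything once the size limit is reached.
                self._forward_locked()

    def flush(self):
        with self.lock:
            self._forward_locked()

    def _forward_locked(self):
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        if self.items:
            self.forward(self.items)
            self.items = []
```

Because this sketch forwards synchronously at the size limit, the overflow path (requirement 2) never triggers here; a real implementation with an asynchronous scheduler would additionally drop items and record client reports when the buffer is full.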
## Batch Size Limit From be176136aeb25b86a211b0d8baf1629f9aa278f3 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:02:32 +0100 Subject: [PATCH 10/16] add link --- .../telemetry-processor/backend-telemetry-processor.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 31066b7d61284..711943c5dee65 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -10,7 +10,7 @@ sidebar_order: 1 For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes a backend-specific approach, which is optimized for high load. The key difference is that the telemetry scheduler pulls telemetry data from the telemetry buffers using **weighted round-robin scheduling**. -It's worth noting that this approach may not be suitable for SDKs needing to support multiple platforms, such as Java, because this approach doesn't work well with offline caching. Offilne caches also need a priority based sending strategy and an priority based overflow strategy to avoid dropping critical data over high volume data. If the telemetry scheduler pulls data from the telemetry buffer and it supports an offline cache, it needs balance items in the offline cache with items from the telemetry buffer. Each SDK should evaluate its requirements and decide whether to adopt the backend-specific pull-based approach or continue using a push-based model, depending on its platform constraints and architectural needs. 
+It's worth noting that this approach may not be suitable for SDKs needing to support multiple platforms, such as Java, because this approach doesn't work well with offline caching. Offline caches also need a priority-based sending strategy and a priority-based overflow strategy to avoid dropping critical data in favor of high-volume data. If the telemetry scheduler pulls data from the telemetry buffer and it supports an offline cache, it needs to balance items in the offline cache with items from the telemetry buffer. Each SDK should evaluate its requirements and decide whether to adopt the backend-specific pull-based approach or continue using a push-based model as defined in the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page, depending on its platform constraints and architectural needs. # Backend-Specific Design Decisions From b859ebc294cfd7489ace62283d8696967cc2f27d Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:03:29 +0100 Subject: [PATCH 11/16] remove note on not ideal --- .../telemetry-processor/backend-telemetry-processor.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 711943c5dee65..0dcd2cb6a5bfb 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -10,8 +10,6 @@ sidebar_order: 1 For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes a backend-specific approach, which is optimized for high load. The key difference is that the telemetry scheduler pulls telemetry data from the telemetry buffers using **weighted round-robin scheduling**.
-It's worth noting that this approach may not be suitable for SDKs needing to support multiple platforms, such as Java, because this approach doesn't work well with offline caching. Offline caches also need a priority-based sending strategy and a priority-based overflow strategy to avoid dropping critical data in favor of high-volume data. If the telemetry scheduler pulls data from the telemetry buffer and it supports an offline cache, it needs to balance items in the offline cache with items from the telemetry buffer. Each SDK should evaluate its requirements and decide whether to adopt the backend-specific pull-based approach or continue using a push-based model as defined in the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page, depending on its platform constraints and architectural needs. - # Backend-Specific Design Decisions - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection.
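The priority classes recurring in the diff context (CRITICAL: Error, Feedback; HIGH: Session, CheckIn) together with the weights from the architecture overview (CRITICAL=5 down to LOWEST=1) could be expressed as a small lookup. A hypothetical sketch — the item-type keys, the fallback to LOWEST, and `priority_for` are illustrative, not part of any SDK API:

```python
from enum import IntEnum

class Priority(IntEnum):
    # Weights from the architecture overview: higher value = scheduled more often.
    CRITICAL = 5
    HIGH = 4
    MEDIUM = 3
    LOW = 2
    LOWEST = 1

# Mapping from the Priorities section: errors and feedback are CRITICAL,
# sessions and check-ins are HIGH; other categories would follow the same pattern.
ITEM_PRIORITY = {
    "error": Priority.CRITICAL,
    "feedback": Priority.CRITICAL,
    "session": Priority.HIGH,
    "check_in": Priority.HIGH,
}

def priority_for(item_type):
    # Assumed default: unknown item types fall back to LOWEST.
    return ITEM_PRIORITY.get(item_type, Priority.LOWEST)
```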
From d6721c626d8dfd8701de429965dc503093a10780 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:04:56 +0100 Subject: [PATCH 12/16] more --- .../telemetry-processor/backend-telemetry-processor.mdx | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 0dcd2cb6a5bfb..88751102d71a6 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -10,11 +10,16 @@ sidebar_order: 1 For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes a backend-specific approach, which is optimized for high load. The key difference is that the telemetry scheduler pulls telemetry data from the telemetry buffers using **weighted round-robin scheduling**. -# Backend-Specific Design Decisions +## Backend-Specific Design Decisions - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection. - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. +#### How the Processor works + +- **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking. +- **Weighted scheduling**: High-priority telemetry gets sent more frequently via weighted round-robin selection. + ### Priorities - CRITICAL: Error, Feedback. - HIGH: Session, CheckIn. 
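The weighted round-robin selection described in these patches can be illustrated with a small generator: each round, the scheduler drains up to `weight` batches per priority, so critical telemetry is sent more frequently than high-volume data. A hypothetical sketch assuming the weights from the architecture overview (CRITICAL=5 … LOWEST=1); the function and queue names are illustrative:

```python
from collections import deque

# Priority weights from the architecture overview: a higher weight means the
# scheduler takes more batches from that queue per round.
WEIGHTS = {"CRITICAL": 5, "HIGH": 4, "MEDIUM": 3, "LOW": 2, "LOWEST": 1}

def weighted_round_robin(queues):
    """Yield batches, taking up to `weight` batches per priority each round."""
    while any(queues.values()):
        for priority, weight in WEIGHTS.items():
            queue = queues.get(priority)
            if not queue:
                continue
            for _ in range(weight):
                if not queue:
                    break
                yield queue.popleft()
```

With two errors and ten log batches queued, the two CRITICAL batches are emitted first, while the LOW queue drains at only two batches per round — no amount of log volume starves the errors.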
From a46aed38b542a68d44cb1f3c8306b5edfbf17cb4 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:06:04 +0100 Subject: [PATCH 13/16] polish more --- .../telemetry-processor/backend-telemetry-processor.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index 88751102d71a6..aad6f285369cb 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -12,7 +12,7 @@ For the common specification, refer to the [Telemetry Processor](/sdk/foundation ## Backend-Specific Design Decisions -- **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). High-priority telemetry **MUST** be sent more frequently via weighted round-robin selection. +- **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. 
#### How the Processor works From 3c47cbe9cd37035ae9651aa61b020f45ab0b2d79 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:06:58 +0100 Subject: [PATCH 14/16] undo --- .../telemetry-processor/backend-telemetry-processor.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index aad6f285369cb..e5bfdf2451317 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -8,7 +8,7 @@ sidebar_order: 1 🚧 This document is work in progress. -For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes a backend-specific approach, which is optimized for high load. The key difference is that the telemetry scheduler pulls telemetry data from the telemetry buffers using **weighted round-robin scheduling**. +For the common specification, refer to the [Telemetry Processor](/sdk/foundations/processing/telemetry-processor/) page. This page describes the backend-specific implementation. The key difference is that backend SDKs use **weighted round-robin scheduling** to ensure critical telemetry (like errors) gets priority over high-volume data (like logs) when the application is under heavy load. 
## Backend-Specific Design Decisions From 0bcac01b11bbae4155a34afe27a44b476c7de5a1 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:07:46 +0100 Subject: [PATCH 15/16] remove more --- .../telemetry-processor/backend-telemetry-processor.mdx | 5 ----- 1 file changed, 5 deletions(-) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx index e5bfdf2451317..83e681b23a4a2 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/backend-telemetry-processor.mdx @@ -15,11 +15,6 @@ For the common specification, refer to the [Telemetry Processor](/sdk/foundation - **Weighted round-robin scheduling**: Backend applications often run under sustained high load. Weighted scheduling ensures critical telemetry (errors) gets sent even when flooded with high-volume data (logs, spans). - **Signal-based scheduling**: The scheduler wakes when new data arrives rather than polling, reducing CPU overhead in idle periods. -#### How the Processor works - -- **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking. -- **Weighted scheduling**: High-priority telemetry gets sent more frequently via weighted round-robin selection. - ### Priorities - CRITICAL: Error, Feedback. - HIGH: Session, CheckIn. 
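The signal-based scheduling bullet — the scheduler wakes when new data arrives rather than polling — can be sketched with a condition variable. This is a hypothetical Python sketch; real SDKs would use their platform's native synchronization primitives, and the `drain` callback stands in for building and submitting envelopes:

```python
import threading

class SignalScheduler:
    """Worker thread that wakes only when signaled, instead of polling for data."""

    def __init__(self, drain):
        self.drain = drain                    # callback that sends pending batches
        self.cond = threading.Condition()
        self.pending = False
        self.closed = False
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def signal(self):
        # Called by a telemetry buffer when it forwards a batch.
        with self.cond:
            self.pending = True
            self.cond.notify()

    def close(self):
        with self.cond:
            self.closed = True
            self.cond.notify()
        self.worker.join()

    def _run(self):
        while True:
            with self.cond:
                # Sleep until new data arrives -- no CPU spent while idle.
                while not self.pending and not self.closed:
                    self.cond.wait()
                if self.closed and not self.pending:
                    return
                self.pending = False
            self.drain()
```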
From 3424b925abf751b30bfe3ed2d039d9e28a7bbf35 Mon Sep 17 00:00:00 2001 From: Philipp Hofmann Date: Tue, 3 Mar 2026 11:21:01 +0100 Subject: [PATCH 16/16] docs(develop-docs): Add Close(timeout) to telemetry processor minimal API Co-Authored-By: Claude --- .../sdk/foundations/processing/telemetry-processor/index.mdx | 1 + 1 file changed, 1 insertion(+) diff --git a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx index 24ff00a24afa4..08f9051126788 100644 --- a/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx +++ b/develop-docs/sdk/foundations/processing/telemetry-processor/index.mdx @@ -24,6 +24,7 @@ The telemetry processor **SHOULD** expose the following minimal API: - `Add(item)` — Adds a telemetry item to the processor. - `Flush(timeout)` — Flushes all buffered data to the transport within the given timeout. +- `Close(timeout)` — Flushes all buffered data and closes the processor within the given timeout. The telemetry processor **MUST** work with existing transport implementations.