diff --git a/develop-docs/application-architecture/dynamic-sampling/architecture.mdx b/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
index 286c7120b2e56..409d331e3a62c 100644
--- a/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
@@ -4,25 +4,17 @@ sidebar_order: 3
og_image: /og-images/application-architecture-dynamic-sampling-architecture.png
---
-The architecture that powers Dynamic Sampling is composed of several components that work together to get the organization's sample rate closer to the target fidelity.
-
-The two main components of the architecture are [Sentry](https://github.com/getsentry/sentry) and [Relay](https://github.com/getsentry/relay), but there are several other sub-components that are used to achieve the desired result, such as Redis, Celery, PostgreSQL, and Snuba.
+The architecture that powers Dynamic Sampling is composed of several components that work together to achieve the organization's target sample rate. The two main components of the architecture are [Sentry](https://github.com/getsentry/sentry) and [Relay](https://github.com/getsentry/relay).

## Sampling in Relay
-Relay is the first component involved in the Dynamic Sampling pipeline. It is responsible for receiving events from SDKs, sampling them, and forwarding them to the Sentry backend. _In reality, Relay does much more than that. If you want to learn more about it, you can look at Relay docs [here](https://docs.sentry.io/product/relay/)._
-
-In order for Relay to perform sampling, it needs to be able to **compute the sample rate** for each incoming event. The configuration of sampling can be done via a **rule-based system** that enables the definition of complex sampling behaviors by combining simple rules. These rules are embedded into the **project configuration**, which is computed and cached in Sentry. This configuration contains a series of fields that Relay uses to perform many of its tasks, including sampling.
-
-### Trace and Transaction Sampling
-
-Sentry supports **two fundamentally different types of sampling**, that are described in more detail [here](/dynamic-sampling/fidelity-and-biases/#trace-and-transaction-sampling).
+Relay is responsible for receiving events from SDKs, sampling them, and forwarding them to the Sentry ingestion pipeline. In order for Relay to perform sampling, it needs to be able to **compute the sample rate** for each incoming event. Sample rates are calculated using a **rule-based system** that enables the definition of complex sampling behaviors by combining simple rules. These rules are embedded into the **project configuration**, which is computed and cached in Sentry, and fetched by Relay when needed.
### Sampling Configuration
-Inside the project configuration there is a field dedicated to sampling, named `dynamicSampling`. This field contains a list of **sampling rules** that are used to calculate the sample rate for each incoming event. The rules will be defined in the `rulesV2` field, inside the `dynamicSampling` object.
+The project configuration has a `dynamicSampling` field for sampling, which holds a list of **sampling rules** used to calculate the sample rate for each incoming event. These rules are defined in the `rulesV2` field within the `dynamicSampling` object.
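To illustrate the nesting described above, here is a minimal sketch of the relevant part of a project configuration (the rule contents are illustrative placeholders; real rules are described in the next section):

```json
{
  "dynamicSampling": {
    "rulesV2": [
      { "...": "sampling rule 1" },
      { "...": "sampling rule 2" }
    ]
  }
}
```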
#### The Rule Definition
@@ -67,27 +59,16 @@ Dynamic sampling rules must always include a `condition` field, otherwise the en
#### Fetching the Sampling Configuration
-The sampling configuration is fetched by Relay from Sentry in a pull fashion. This is done by sending a request to the `/api/0/relays/projectconfigs/` endpoint periodically (defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py#L34-L34)).
-
-On the Sentry side, the configuration will be computed in case of a cache miss and then cached in Redis. The cache is invalidated every time the configuration changes, but more details on that will be provided later.
+Relay fetches the sampling configuration from Sentry by periodically sending a request to the `/api/0/relays/projectconfigs/` endpoint (defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py#L32-L32)). When this endpoint is called, the Sentry backend attempts to retrieve the configuration from the cache; if it is not found, the configuration is computed and then cached in Redis.
### Sampling Decision
-A sampling decision involves:
-
-1. Matching the incoming event and/or DSC against the configuration.
-2. Deriving a sample rate from the combination of `factor` and `sampleRate` rules.
-3. Making the sampling decision using a random number generator.
-
-In case no match is found, or if there are problems during matching, Relay will accept the event under the assumption that it's preferable to oversample rather than drop potentially important events.
+In order to arrive at a sampling decision, Relay matches the incoming event and/or DSC against the configuration, derives a sample rate from the combination of `factor` and `sampleRate` rules, and uses a random number generator to make the decision. If no match is found, or if there are problems during matching, Relay accepts the event under the assumption that it's preferable to oversample rather than drop potentially important events.
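The decision procedure above can be sketched in a few lines. This is a simplified illustration of the documented behavior, not Relay's code (Relay is written in Rust); the helper names are ours, and the rule shape mirrors the `samplingValue` field described earlier.

```python
import random

def compute_sample_rate(matching_rules: list):
    """Accumulate `factor` rules multiplicatively until a `sampleRate` rule
    terminates matching; return None if no `sampleRate` rule matches."""
    factor = 1.0  # identity for multiplication
    for rule in matching_rules:
        sv = rule["samplingValue"]
        if sv["type"] == "factor":
            factor *= sv["value"]
        elif sv["type"] == "sampleRate":
            return min(factor * sv["value"], 1.0)
    return None

def should_keep(sample_rate) -> bool:
    if sample_rate is None:
        # No match (or matching failed): keep the event rather than risk
        # dropping something important.
        return True
    return random.random() < sample_rate
```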
-Relay samples using two [SamplingConfig](https://getsentry.github.io/relay/relay_sampling/config/struct.SamplingConfig.html) instances: the **non-root sampling configuration** and the **root sampling configuration**. The non-root config belongs to the project of the incoming event, while the root config belongs to the project of the head transaction of the trace. If no root sampling configuration is available, only transaction sampling will be performed.
-
-With both configurations ready, Relay will attempt to match the transaction rules of the non-root config and then the trace rules of the root config. If both root and non-root configs are the same, Relay will perform the matching in the same manner.
-
-The payloads inspected for matching vary based on the type of rule being matched:
-- `transaction`: a transaction rule will match against the [Event](https://getsentry.github.io/relay/relay_event_schema/protocol/struct.Event.html) payload itself.
+To make sampling decisions, Relay uses a [SamplingConfig](https://getsentry.github.io/relay/relay_sampling/config/struct.SamplingConfig.html) that belongs to the project of the trace's head transaction.
+The payloads inspected for matching vary based on the type of rule being matched:
- `trace`: a trace rule will match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html), which remains consistent across all transactions of the trace.
+- `project`: a project rule will also match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html).
The matching that Relay performs is based on the `samplingValue` of the encountered rules. As specified earlier, depending on the type of `samplingValue`, Relay will either immediately return a result or continue matching other rules. More details about the matching algorithm can be found in the implementation [here](https://getsentry.github.io/relay/relay_sampling/evaluation/struct.SamplingEvaluator.html#method.match_rules).
@@ -108,14 +89,14 @@ Suppose Relay receives an incoming transaction with the following data:
}
```
-And suppose this is the merged configuration from the non-root and root project:
+And suppose this is the configuration:
```json
{
"rules": [
{
"id": 1,
- "type": "transaction",
+ "type": "trace",
"samplingValue": {
"type": "factor",
"value": 2.0
@@ -143,54 +124,24 @@ And suppose this is the merged configuration from the non-root and root project:
In this case, the matching will happen from **top to bottom** and the following will occur:
-1. Rule `1` is matched against the event payload, since it is of type `transaction`. The `samplingValue` is a `factor`, thus the accumulated factors will now be `2.0 * 1.0`, where `1.0` is the identity for the multiplication.
-2. Because rule `1` was a factor rule, the matching continues and rule `2` will be matched against the DSC, since it is of type `trace`. The `samplingValue` is a `sampleRate`, thus the matching will stop and the sample rate will be computed as `2.0 * 0.5 = 1.0`, where `2.0` is the factor accumulated from the previous rule and `0.5` is the sample rate of the current rule.
-
-
-
-It is important to note that a `sampleRate` rule must match in order for a sampling decision to be made; in case this condition is not met, the event will be kept. In practice, each project will have a uniform trace rule which will always match and contain the base sample rate of the organization.
-
-
+1. Rule `1` is matched against the DSC, since it is of type `trace`. The `samplingValue` is a `factor` with value `2.0`.
+2. Because rule `1` was a factor rule, the matching continues and rule `2` will again be matched against the DSC, since it is of type `trace`. The `samplingValue` is a `sampleRate`, thus the matching will stop and the sample rate will be computed as `2.0 * 0.5 = 1.0`, where `2.0` is the factor accumulated from the previous rule and `0.5` is the sample rate of the current rule.
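The walkthrough above reduces to a short calculation:

```python
factor = 1.0                 # identity for multiplication
factor *= 2.0                # rule 1: a `factor` rule, matching continues
sample_rate = factor * 0.5   # rule 2: a `sampleRate` rule, matching stops
assert sample_rate == 1.0    # every event matching these rules is retained
```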
## Rules Generation in Sentry
-Sentry is the second component involved in the Dynamic Sampling pipeline. It is responsible for generating the rules used by Relay to perform sampling.
-
-Generating rules is the most complicated step of the pipeline, since rules and their sampling values directly impact how far off the system is from the target fidelity rate.
+Sentry is responsible for generating the rules used by Relay to perform sampling.
### Generation of the Rules
The generation of rules is performed as part of the **project configuration recomputation**, which happens:
-1. Whenever Relay requests the configuration and it is not cached in Redis;
-2. Whenever the configuration is manually invalidated.
-
-The invalidation of the configuration can happen under **several circumstances**, by calling the [following function](https://github.com/getsentry/sentry/blob/master/src/sentry/tasks/relay.py#L244-L244). Some examples of cases when the recomputation happens are when a new release is detected, when some project settings change, or when the Celery tasks for computing the variable sample rates are finished executing.
-
-The generation of the rules that will be part of the project configuration recalculation is defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/base.py#L72-L72) and works by performing the following steps:
-
-1. Fetching the list of active biases, since some of them can be enabled or disabled by the user in the Sentry UI;
-2. Determining the base sample rate specific for each project.
-3. Computing the rules for each bias by using all the information available at the time (e.g., in memory, fetched from Redis);
-4. Packing the rules into the `dynamicSampling.rulesV2` field of the project configuration;
-5. Returning all the set of rules that will be stored in the project configuration.
-
-#### Redis for Shared State
-
-Certain biases require data that **must be computed from other parts of the system** (e.g., when a new release is created) or by background tasks that are run asynchronously.
-
-For such use cases, we decided to use a separate Redis instance, which is used to store many kinds of information, such as sample rates and boosted releases. During rule generation, we connect to this Redis instance and fetch the necessary data to compute the rules.
-
-An illustration on how Redis is used for tracking boosted releases (which are used by the [boost new releases bias](/dynamic-sampling/fidelity-and-biases/#boost-new-releases)):
-
-
-
-#### Celery Tasks for Asynchronous Processing
-
-Certain biases require data that **must be computed asynchronously by background tasks** due to the complexity of the computation (e.g., running cross-org queries on Snuba). This is the case for the [boost low-volume projects bias](/dynamic-sampling/fidelity-and-biases/#boost-low-volume-projects) and [boost low-volume transactions bias](/dynamic-sampling/fidelity-and-biases/#boost-low-volume-transactions), which need expensive queries to obtain all the necessary data to run the rebalancing algorithms. These tasks are handled by Celery workers, that are scheduled automatically by cron jobs configured in Sentry.
+1. When Relay requests the configuration and it is not cached in Redis.
+2. When the configuration is invalidated on demand by calling [this function](https://github.com/getsentry/sentry/blob/master/src/sentry/tasks/relay.py#L244-L244). This happens, for example, when a new release is detected, when certain project settings change, or when the dynamic sampling tasks for computing sample rates finish executing.
-_The data computed by these tasks could theoretically be computed sequentially during rules generation but for scalability reasons we opted to use Celery tasks instead. This way, the computation of the data can be parallelized and the rules generation can be performed faster._
+The rules are generated [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/base.py#L126-L143) by performing the following steps:
-An illustration on how multiple Celery tasks are scheduled for computing data required for rule generation:
+1. Fetch the list of active biases (some of them can be enabled or disabled by the user in the Sentry UI).
+2. Determine the base sample rate for each project.
+3. Compute the rules for each bias.
-
+The data underlying the rules is computed asynchronously for scalability reasons. Several biases depend on the organization's incoming volume, so their data is computed by background tasks executed by Celery, which write their results to Redis for rule generation to read.
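The split between background computation and rule generation can be sketched as below. This is an illustrative sketch only: the store, function names, and the rebalancing formula are ours, a dictionary stands in for Redis, and `rebalance_projects` stands in for a scheduled Celery task.

```python
# Stand-in for the shared Redis instance that tasks write to.
shared_store: dict = {}

def rebalance_projects(org_volumes: dict, target_rate: float) -> None:
    """Background task: compute per-project sample rates from incoming
    volume, boosting low-volume projects (capped at 1.0)."""
    total = sum(org_volumes.values())
    for project, volume in org_volumes.items():
        share = volume / total
        shared_store[project] = min(target_rate / (share * len(org_volumes)), 1.0)

def generate_rules(project: str) -> list:
    """Rule generation: a cheap read of precomputed state, no heavy queries."""
    rate = shared_store.get(project, 1.0)  # default: keep everything
    return [{"type": "trace", "samplingValue": {"type": "sampleRate", "value": rate}}]
```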
\ No newline at end of file
diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/biases.mdx
similarity index 78%
rename from develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx
rename to develop-docs/application-architecture/dynamic-sampling/biases.mdx
index 2a3b5f8e5e9a8..e9d29aaff8370 100644
--- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/biases.mdx
@@ -1,20 +1,14 @@
---
-title: Fidelity and Biases
+title: Biases
sidebar_order: 2
-og_image: /og-images/application-architecture-dynamic-sampling-fidelity-and-biases.png
+og_image: /og-images/application-architecture-dynamic-sampling-biases.png
---
-Dynamic Sampling allows Sentry to automatically adjust the amount of data retained based on how valuable the data is to the user. This is technically achieved by applying a **sample rate** to every event, which is determined by a **set of rules** that are evaluated for each event.
+For a concrete walkthrough of how Dynamic Sampling rules combine to produce final sample rates, see [Dynamic Sampling by Example](/dynamic-sampling/by-example/).
-
-
-A sample rate is a number in the interval `[0.0, 1.0]` that will determine the likelihood of a transaction to be retained. For example, a sample rate of `0.5` means that the transaction will be retained on average 50% of the time.
-
-
-
-## The Concept of Fidelity
+### Target Sample Rate
+Dynamic Sampling defines a **target sample rate** for each project, a number between 0 and 1 that specifies the amount of data to be retained. This target sample rate is used to calculate the sample rate for each event. Notably, the calculation is based not on the project and transaction of the event itself, but on the project and transaction from which the trace was started. The reason is that the system works on the trace level rather than the event level: to retain an entire trace, decisions can only be made from information that is available at the start of the trace.
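The trace-level principle above can be illustrated as follows. The field names loosely follow the Dynamic Sampling Context and are ours; the point is that the sampling inputs come from the head of the trace, so every event in a trace sees the same inputs.

```python
def sampling_inputs(event: dict) -> tuple:
    # The DSC is set at the head of the trace and propagated unchanged
    # to every downstream event.
    dsc = event["dsc"]
    return (dsc["project"], dsc["transaction"])

head = {"project": "frontend", "transaction": "/checkout"}
backend_event = {"project": "backend", "transaction": "/api/charge", "dsc": head}

# Although the event belongs to "backend", the sampling decision is keyed
# on the head of the trace, matching every other event in the same trace.
assert sampling_inputs(backend_event) == ("frontend", "/checkout")
```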
-At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all spans and transactions of an organization.
### Dynamic Sampling Modes
There are two available modes to govern the target sample rates for Dynamic Sampling. The definition of both the mode and the target sample rates are implemented using the organization options `sentry:sampling_mode` and `sentry:target_sample_rate` as well as the project option `sentry:target_sample_rate`.
@@ -28,29 +22,13 @@ When the user switches between modes, target sample rates are transferred unless
The [sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event.
-
-
For orgs under AM2, Dynamic sampling uses a [sliding window function](https://github.com/getsentry/sentry/blob/cc8cc38c8a108719d068e5622b24a8d0c744e84c/src/sentry/dynamic_sampling/tasks/sliding_window_org.py#L37-L61) over the incoming volume to calculate the target sample rate.
-### Approximate Fidelity
-
-It is important to note that fidelity only determines an **approximate target sample rate**, so there is flexibility in creating exact sample rates. The ingestion pipeline, composed of [Relay](https://docs.sentry.io/product/relay/) and other components, does not have the infrastructure to track volume, so it cannot create an actual weighted distribution within the target sample rate.
-
-Instead, the Sentry backend **computes a set of rules** whose goal is to cooperatively achieve the target sample rate. Determining when and how to set these rules is part of the Dynamic Sampling infrastructure.
-
-
-
-The effectively applied sample rate, in the end, depends on how much data matches each of the bias overrides.
-
-
-
-## Trace and Transaction Sampling
+The purpose of Dynamic Sampling is to achieve the target sample rate across all projects and transactions while weighting the data by how valuable it is to the user. These weights are applied on top of the base sample rate for each project.
-Sentry supports **two fundamentally different types of sampling**. While this is completely opaque to the user, these rule types provide the basic building blocks for every dynamic sampling functionality and bias.
-
-### Trace Sampling
+### How Traces Are Sampled
A trace is a **collection of events that are related to each other**. For example, a trace could contain events started from your frontend that are then generating events in your backend.
@@ -66,10 +44,6 @@ In order to achieve full trace sampling, the random number generator used by Rel
-### Transaction Sampling
-
-Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces.
-
## Biases for Sampling
Dynamic Sampling uses biases to adjust how many events matching certain conditions are sampled. These biases are defined as a set of rules that Relay checks for each event. Learn more about these rules [on the architecture page](/dynamic-sampling/architecture/).
@@ -114,7 +88,7 @@ The list of development environments is available [here](https://github.com/gets
This bias is only active in Automatic Mode (and not in Manual Mode). It applies to any incoming trace and is defined on a per-project basis.
-This bias uses an algorithm to increase the sample rate for low-volume projects, which might otherwise be overshadowed by high-volume projects. It calculates a new sample rate for each project based on the organization’s overall sample rate and the number of transactions each project receives, aiming for a more balanced distribution. The algorithm dynamically adjusts these rates by measuring _the volume of incoming transactions over a sliding time window_ (also known as the target fidelity rate). At regular intervals, the system calls the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)) to determine the appropriate sample rate for each project.
+This bias uses an algorithm to increase the sample rate for low-volume projects, which might otherwise be overshadowed by high-volume projects. It calculates a new sample rate for each project based on the organization’s overall sample rate and the number of transactions each project receives, aiming for a more balanced distribution. The algorithm dynamically adjusts these rates by measuring _the volume of incoming transactions over a sliding time window_. At regular intervals, the system calls the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)) to determine the appropriate sample rate for each project.
@@ -156,3 +130,4 @@ For deprioritizing health checks, we compute a new sample rate by dividing the b
If you want to learn more about the architecture behind Dynamic Sampling, continue to the [next page](/dynamic-sampling/architecture/).
+
diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx
index 210bf0089b4e6..b233164a87586 100644
--- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx
@@ -96,4 +96,4 @@ When you go into the trace explorer or Discover, you might want to now split the
-If you want to learn more about Dynamic Sampling, continue to the [next page](/dynamic-sampling/fidelity-and-biases/).
+If you want to learn more about Dynamic Sampling, continue to the [next page](/dynamic-sampling/biases/).