chore(dynamic-sampling): rename dynamic sampling status enum#115360
chore(dynamic-sampling): rename dynamic sampling status enum#115360shellmayr wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit caabae7. Configure here.
| except SnubaRPCError as exc: | ||
| sentry_sdk.capture_exception(exc) | ||
| emit_status(status_metric, DynamicSamplingStatus.SNUBA_TIMEOUT) | ||
| raise |
There was a problem hiding this comment.
Misleading metric label for non-timeout Snuba errors
Medium Severity
The new except SnubaRPCError handler catches all subclasses of SnubaRPCError, including SnubaRPCRateLimitExceeded and SnubaRPCTooManySimultaneous, but labels all of them with DynamicSamplingStatus.SNUBA_TIMEOUT ("snuba_timeout"). Rate limit errors and concurrent query limit errors are not timeouts. This produces misleading metrics that could cause incorrect conclusions during incident investigation.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit caabae7. Configure here.
| except SnubaRPCError as exc: | ||
| sentry_sdk.capture_exception(exc) | ||
| emit_status(status_metric, DynamicSamplingStatus.SNUBA_TIMEOUT) | ||
| raise |
There was a problem hiding this comment.
Bug: The except SnubaRPCError block is too broad and incorrectly classifies various Snuba errors, like rate limits, as SNUBA_TIMEOUT.
Severity: MEDIUM
Suggested Fix
Refine the exception handling to be more specific. Catch urllib3.exceptions.ReadTimeoutError or a more specific timeout-related exception first and handle it by emitting DynamicSamplingStatus.SNUBA_TIMEOUT. Then, either re-raise other SnubaRPCError types or handle them separately with different metrics to ensure accurate error reporting.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/dynamic_sampling/per_org/tasks/telemetry.py#L117-L120
Potential issue: The `except SnubaRPCError` handler is intended to catch Snuba timeouts
but is overly broad. It will catch all `SnubaRPCError` subclasses, including
`SnubaRPCRateLimitExceeded` and `SnubaRPCTooManySimultaneous`. As a result, various
distinct Snuba issues such as rate limits, concurrent query limits, and other
server-side errors will be incorrectly reported with the
`DynamicSamplingStatus.SNUBA_TIMEOUT` status. This misclassification will lead to
inaccurate metrics, making it difficult to distinguish actual timeouts from other
critical Snuba failures in production monitoring.
Did we get this right? 👍 / 👎 to inform future reviews.


This name makes more sense - it's the status of the dynamic sampling job, so let's not make this more complicated than necessary