feat(taskbroker): Implement retry support for raw topics#630
Draft
untitaker wants to merge 2 commits into
Draft
Conversation
Add retry support for raw/passthrough topics (e.g. `ingest-events`) where tasks don't have retry_state embedded in the message. Changes: - Config: Add `kafka_retry_topic` option for dedicated retry topic - Store: Add `update_retry_state` method to update activation's retry_state - gRPC: Handle `max_retries` in SetTaskStatus, call store.update_retry_state - Upkeep: Route retries to dedicated retry topic when configured - Consumer: Subscribe to both main and retry topics - Deserializer: Topic-aware routing (retry topic always uses activation deserializer) - Python client: Extract max_retries from Retry config, send in SetTaskStatusRequest When a worker reports RETRY status with max_retries, the broker updates the activation's retry_state and routes the retry to the dedicated retry topic. This prevents retries from polluting the main topic where other consumers (like SBC) can't parse activations. See https://www.notion.so/3448b10e4b5d80e7a1efee6145d504c2 → Stage 4 Depends on: getsentry/sentry-protos#251 ref STREAM-981 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| // This allows workers to communicate retry policy for tasks from raw topics. | ||
| if status == InflightActivationStatus::Retry { | ||
| if let Some(max_retries) = request.get_ref().max_retries { | ||
| if let Err(e) = self.store.update_retry_state(&id, max_retries).await { |
Member
There was a problem hiding this comment.
How often will this happen? If it happens on every request, throughput will drop significantly. If it happens only occasionally, it should be fine.
Member
Author
There was a problem hiding this comment.
on every retry, we need to update the task activation. but i think i'll redesign this so that the task activation is updated in one go (max_retries and retry_count are updated using the same update stmt)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implement two independent features in this PR:
kafka_retry_topicso that one can send activations from retried tasks into a separate topic. That topic never contains raw messages.max_retriesfrom the worker. Previously the producer would configuremax_retriesand send them as part of the activation, but that cannot work with raw topics, so an architecture change is needed.See Architecture doc → Stage 4 for full context.
Dependencies
Depends on: getsentry/sentry-protos#251 (adds
max_retriesfield toSetTaskStatusRequest)ref STREAM-981