Improve HTTP response handling and retry logic #520
MichaelGHSeg wants to merge 3 commits into master
Conversation
- Add Authorization header with Basic auth (base64-encoded write key; see the sketch after this message)
- Add X-Retry-Count header on all requests (starts at 0)
- Implement Retry-After header support (capped at 300s)
- Retry-After attempts don't count against the backoff retry budget
- Add granular status code classification:
  - Retryable 4xx: 408, 410, 429, 460
  - Non-retryable 4xx: 400, 401, 403, 404, 413, 422
  - Retryable 5xx: all except 501, 505
  - Non-retryable 5xx: 501, 505
- Replace backoff decorator with custom retry loop
- Exponential backoff with jitter (0.5s base, 60s cap)
- Clear OAuth token on 511 Network Authentication Required
- 413 Payload Too Large is non-retryable
- Add 30 new comprehensive tests (106 total tests)

Aligns with analytics-java and analytics-next retry behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
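As a rough illustration of the Authorization and X-Retry-Count headers described above, the minimal sketch below shows one way they could be built. `build_headers` is a hypothetical helper, not the PR's actual code, and the `writeKey:` (empty-password) Basic-auth encoding is an assumption based on typical Segment client behavior.

```python
import base64

def build_headers(write_key, attempt=0, oauth_token=None):
    """Hypothetical helper: construct the auth and retry-count headers."""
    if oauth_token is not None:
        # OAuth flow: send the access token as a Bearer credential.
        authorization = "Bearer " + oauth_token
    else:
        # Basic auth: base64-encode "writeKey:" (write key as username, empty password).
        encoded = base64.b64encode((write_key + ":").encode("utf-8")).decode("ascii")
        authorization = "Basic " + encoded
    return {
        "Authorization": authorization,
        "X-Retry-Count": str(attempt),  # 0 on the first request, incremented on each retry
    }
```

A consumer would merge these headers into every request and bump `attempt` before each retry.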
Aligns with the analytics-java change to accommodate shorter backoff periods (0.5s base, 60s cap). With faster retries, a higher retry limit provides better resilience during extended outages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR improves HTTP response handling and retry logic in the analytics-python client, aligning with changes from analytics-java and analytics-next. The implementation replaces the backoff decorator with a custom retry loop that provides fine-grained control over retry behavior, distinguishing between Retry-After attempts and exponential backoff attempts.
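To make that distinction concrete, here is a condensed, illustrative sketch of such a loop. Everything in it (`send_with_retries`, the stubbed `APIError`, `is_retryable`) is a placeholder rather than the PR's actual code; the status-code sets and the 0.5s/60s/300s constants are taken from the PR description.

```python
import random
import time

# Classification as described in the PR: a short allow-list of retryable 4xx codes,
# and all 5xx codes except 501/505.
RETRYABLE_4XX = {408, 410, 429, 460}
NON_RETRYABLE_5XX = {501, 505}

def is_retryable(status):
    if 500 <= status < 600:
        return status not in NON_RETRYABLE_5XX
    return status in RETRYABLE_4XX

class APIError(Exception):
    """Minimal stand-in for the client's APIError: carries status and response."""
    def __init__(self, status, response=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.response = response

def send_with_retries(post_batch, batch, max_backoff_attempts=1000):
    """Retry-After waits are honored but never consume the exponential-backoff budget."""
    backoff_attempts = 0
    while True:
        try:
            return post_batch(batch)
        except APIError as e:
            if not is_retryable(e.status):
                raise
            headers = getattr(e.response, "headers", None) or {}
            retry_after = headers.get("Retry-After")
            if retry_after is not None and str(retry_after).isdigit():
                time.sleep(min(int(retry_after), 300))  # server-directed wait, capped at 300s
                continue                                # not counted against the retry budget
            backoff_attempts += 1
            if backoff_attempts >= max_backoff_attempts:
                raise
            base_delay = 0.5 * (2 ** (backoff_attempts - 1))
            time.sleep(min(base_delay + random.uniform(0, 0.1 * base_delay), 60))
```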
Changes:
- Replaced backoff decorator with custom retry loop implementing exponential backoff with jitter (0.5s base, 60s cap)
- Added Authorization header support (Basic auth with write key and OAuth Bearer token)
- Added X-Retry-Count header to track attempt numbers
- Implemented Retry-After header support (capped at 300s, doesn't count against retry budget)
- Granular status code classification distinguishing retryable from non-retryable errors
- Increased default max_retries from 10 to 1000 to accommodate faster retry cadence
- OAuth token clearing expanded to include 511 status code
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| segment/analytics/request.py | Added Authorization header (Basic/Bearer), X-Retry-Count header, parse_retry_after() function, and response object to APIError |
| segment/analytics/consumer.py | Replaced backoff decorator with custom retry loop implementing exponential backoff, Retry-After support, and granular status code classification |
| segment/analytics/client.py | Increased default max_retries from 10 to 1000 |
| segment/analytics/test/test_request.py | Added 17 comprehensive tests for authorization headers, X-Retry-Count, Retry-After parsing, and OAuth token clearing |
| segment/analytics/test/test_consumer.py | Added 13 comprehensive tests for retry logic, status code classification, backoff behavior, and Retry-After support |
```python
import json
import base64
from dateutil.tz import tzutc
from requests.auth import HTTPBasicAuth
```
The HTTPBasicAuth import from requests.auth is no longer used since Basic authentication is now manually implemented using base64 encoding. Consider removing this unused import.
```diff
-from requests.auth import HTTPBasicAuth
```
```python
    try:
        # Try parsing as integer (delay in seconds)
        delay = int(retry_after)
        return min(delay, MAX_RETRY_AFTER_SECONDS)
```
The parse_retry_after function doesn't validate that the parsed delay is non-negative. If a server sends a negative Retry-After value (e.g., "Retry-After: -5"), this will cause time.sleep() to raise a ValueError. Consider adding validation to ensure the delay is non-negative: return min(max(delay, 0), MAX_RETRY_AFTER_SECONDS) or return None for negative values.
```diff
-        return min(delay, MAX_RETRY_AFTER_SECONDS)
+        # Ensure delay is non-negative before applying upper bound
+        return min(max(delay, 0), MAX_RETRY_AFTER_SECONDS)
```
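Beyond that clamp, a fuller version of the parser would also handle the HTTP-date form of Retry-After. The sketch below is hypothetical (only `MAX_RETRY_AFTER_SECONDS` and the 300s cap come from the PR); it returns a bounded, non-negative delay in seconds, or None if the header is missing or unusable, assuming a requests-style response object.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_RETRY_AFTER_SECONDS = 300  # same cap as described in the PR

def parse_retry_after(response):
    """Return a bounded, non-negative delay in seconds, or None if unusable."""
    retry_after = response.headers.get("Retry-After")
    if not retry_after:
        return None
    try:
        # Integer form: delay in seconds.
        delay = int(retry_after)
    except ValueError:
        # HTTP-date form: compute the remaining time until that date.
        try:
            when = parsedate_to_datetime(retry_after)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        delay = (when - datetime.now(timezone.utc)).total_seconds()
    # Clamp: never negative, never above the cap.
    return min(max(delay, 0), MAX_RETRY_AFTER_SECONDS)
```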
```python
                # Calculate exponential backoff delay with jitter
                base_delay = 0.5 * (2 ** (backoff_attempts - 1))
                jitter = random.uniform(0, 0.1 * base_delay)
                delay = min(base_delay + jitter, 60)  # Cap at 60 seconds

                self.log.debug(
                    f"Retry attempt {backoff_attempts}/{self.retries} (total attempts: {total_attempts}) "
                    f"after {delay:.2f}s for status {e.status}"
                )
                time.sleep(delay)

            except Exception as e:
                # Network errors or other exceptions - retry with backoff
                total_attempts += 1
                backoff_attempts += 1

                if backoff_attempts >= max_backoff_attempts:
                    self.log.error(
                        f"All {self.retries} retries exhausted after {total_attempts} total attempts. Final error: {e}"
                    )
                    raise

                # Calculate exponential backoff delay with jitter
                base_delay = 0.5 * (2 ** (backoff_attempts - 1))
                jitter = random.uniform(0, 0.1 * base_delay)
                delay = min(base_delay + jitter, 60)  # Cap at 60 seconds
```
The exponential backoff calculation is duplicated between the APIError handler (lines 196-199) and the generic Exception handler (lines 218-221). Consider extracting this logic into a helper function to reduce code duplication and improve maintainability.
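One way to do that (the helper name `_compute_backoff_delay` is illustrative, not part of the PR) is a small pure function that both exception handlers call:

```python
import random

def _compute_backoff_delay(backoff_attempts, base=0.5, cap=60):
    """Exponential backoff with proportional jitter: base doubles per attempt, capped."""
    base_delay = base * (2 ** (backoff_attempts - 1))
    jitter = random.uniform(0, 0.1 * base_delay)
    return min(base_delay + jitter, cap)
```

Both the APIError handler and the generic exception handler could then call `delay = _compute_backoff_delay(backoff_attempts)`, keeping the base and cap values in one place.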
```python
                if should_use_retry_after(e.status) and e.response:
                    retry_after = parse_retry_after(e.response)
                    if retry_after:
                        self.log.debug(
                            f"Retry-After header present: waiting {retry_after}s (attempt {total_attempts})"
                        )
                        time.sleep(retry_after)
                        continue  # Does not count against backoff budget
```
The retry logic does not impose an upper bound on Retry-After attempts. If the server continuously responds with Retry-After headers (for status codes 408, 429, or 503), the client could retry indefinitely. Consider adding a maximum total attempt limit to prevent infinite retry loops, even when Retry-After is present. For example, you could add a check like if total_attempts >= some_large_limit: raise before processing Retry-After.
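A possible shape for that guard, expressed against the snippet above (the `MAX_TOTAL_ATTEMPTS` constant and its value are illustrative, not part of the PR):

```diff
                 if should_use_retry_after(e.status) and e.response:
                     retry_after = parse_retry_after(e.response)
                     if retry_after:
+                        if total_attempts >= MAX_TOTAL_ATTEMPTS:  # hypothetical hard ceiling on all attempts
+                            raise
                         self.log.debug(
                             f"Retry-After header present: waiting {retry_after}s (attempt {total_attempts})"
                         )
                         time.sleep(retry_after)
                         continue  # Does not count against backoff budget
```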
```python
        try:
            consumer.request([track])
        except FatalError:
            pass
```
'except' clause does nothing but pass and there is no explanatory comment.
```diff
-        try:
-            consumer.request([track])
-        except FatalError:
-            pass
+        with self.assertRaises(FatalError):
+            consumer.request([track])
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Implements improved HTTP response handling and retry logic to align with changes in analytics-java and analytics-next:
- `Authorization` header (Basic auth with write key, OAuth Bearer token)
- `X-Retry-Count` header on all requests (starts at 0)
- `Retry-After` header support (capped at 300s, doesn't count against retry budget)
- Replaced `backoff` decorator with custom retry loop for better control

Status Code Handling
Retryable 4xx: 408, 410, 429, 460
Non-retryable 4xx: 400, 401, 403, 404, 413, 422, and all other 4xx
Retryable 5xx: All except 501, 505
Non-retryable 5xx: 501, 505
Retry-After Behavior
Test Coverage
Changes
Modified Files
- `segment/analytics/consumer.py`: Custom retry loop, status code classification, Retry-After support
- `segment/analytics/request.py`: Authorization header, X-Retry-Count header, `parse_retry_after()`
- `segment/analytics/client.py`: Increased default max_retries to 1000
- `segment/analytics/test/test_consumer.py`: 13 new test cases
- `segment/analytics/test/test_request.py`: 17 new test cases

Alignment
This implementation aligns with the corresponding retry-behavior changes in analytics-java and analytics-next.
🤖 Generated with Claude Code