feat(transport): add HTTP retry with exponential backoff #1520
Conversation
Force-pushed b083a57 to a264f66
@sentry review
@cursor review
@sentry review
@cursor review
@cursor review
@cursor review
Force-pushed 243c880 to d6aa792
@cursor review
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```c
sentry__retry_trigger(sentry_retry_t *retry)
{
    sentry__bgworker_submit(retry->bgworker, retry_trigger_task, NULL, retry);
}
```
Trigger bypasses backoff and rapidly exhausts retry attempts
Medium Severity
retry_trigger_task passes UINT64_MAX as before, bypassing all backoff checks. On network failure, it immediately re-submits itself via sentry__retry_trigger. Since sentry__retry_send breaks after the first failure and bumps that file's count, and the trigger immediately re-queues, all retry attempts for every pending envelope can be exhausted in rapid succession (milliseconds) rather than over the intended hours of exponential backoff. A single mistimed call to sentry_transport_retry while offline permanently burns through all retries, causing data loss.
```c
#endif

#ifdef SENTRY_TRANSPORT_COMPRESSION
static bool
```
can_retry returns true even when retry disabled
Low Severity
sentry__transport_can_retry checks for the presence of retry_func, which is unconditionally set for all HTTP transports in sentry__http_transport_new. This means it returns true even when http_retry is false. In sentry_core.c, this causes sentry__cleanup_cache to run unnecessarily for every HTTP transport regardless of the retry option. While currently harmless (cleanup is a no-op with default cache limits), the function's semantics are misleading and could cause subtle issues if future code relies on it to indicate actual retry capability.
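A minimal sketch of the distinction the bot is pointing at; the struct and field names below (`retry_func`, `http_retry_enabled`) are illustrative stand-ins, not the PR's actual definitions:

```c
#include <stdbool.h>
#include <stddef.h>

// Illustrative transport shape; field names are assumptions.
typedef struct {
    void (*retry_func)(void *state); // installed for every HTTP transport
    bool http_retry_enabled;         // mirrors the `http_retry` option
} transport_t;

// Checking only the callback reports "can retry" for every HTTP transport,
// because the callback is unconditionally installed:
static bool
can_retry_by_callback(const transport_t *t)
{
    return t->retry_func != NULL;
}

// Also consulting the option makes the answer match the function's name:
static bool
can_retry_by_option(const transport_t *t)
{
    return t->retry_func != NULL && t->http_retry_enabled;
}
```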
The deferred startup retry scan (100ms delay) could pick up files written by the current session. Filter by startup_time so only previous-session files are processed. Also ensure the cache directory exists when cache_keep is enabled, since sentry__process_old_runs only creates it conditionally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Monotonic time is process-relative and doesn't work across restarts. Retry envelope timestamps need to persist across sessions, so use time() (seconds since epoch) for file timestamps, startup_time, and backoff comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
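A small illustration of why wall-clock seconds are needed here; the 15-minute threshold is only an example value:

```c
#include <stdio.h>
#include <time.h>

int
main(void)
{
    // time() counts seconds since the Unix epoch, so a value written into a
    // file name by one process is still meaningful after a restart, unlike a
    // monotonic clock reading.
    time_t written_at = time(NULL);

    // ... process exits, restarts later ...

    time_t now = time(NULL);
    long waited = (long)difftime(now, written_at);

    // Example backoff check: eligible once at least 15 minutes have passed.
    const long backoff_s = 15 * 60;
    printf("eligible: %s\n", waited >= backoff_s ? "yes" : "no");
    return 0;
}
```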
Rename SENTRY_RETRY_BACKOFF_BASE_MS to SENTRY_RETRY_BACKOFF_BASE_S and sentry__retry_backoff_ms to sentry__retry_backoff, since file timestamps are now in seconds. The bgworker delay sites multiply by 1000 to convert to the milliseconds it expects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
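A sketch of a seconds-based backoff computation with the millisecond conversion at the submit site, as this commit describes. Only the constant's name comes from the commit; the doubling shown here is illustrative, and the PR's published schedule (15min, 30min, 1h, 2h, 8h) deviates from pure doubling on the last step:

```c
#include <stdint.h>
#include <stdio.h>

// 15-minute base from the PR description; name from the commit message.
#define SENTRY_RETRY_BACKOFF_BASE_S (15 * 60)

// Backoff before the n-th retry, in seconds: base doubled per prior attempt.
static uint64_t
retry_backoff(uint32_t attempt)
{
    return (uint64_t)SENTRY_RETRY_BACKOFF_BASE_S << attempt;
}

int
main(void)
{
    for (uint32_t n = 0; n < 5; n++) {
        uint64_t delay_s = retry_backoff(n);
        // A worker API that expects milliseconds would receive delay_s * 1000
        // at the call site, as the commit message notes.
        printf("attempt %u: %llu s (%llu ms)\n", n,
            (unsigned long long)delay_s,
            (unsigned long long)(delay_s * 1000));
    }
    return 0;
}
```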
Make sentry__retry_flush block until the flush task completes by adding a bgworker_flush call, and subtract the elapsed time from the shutdown timeout. This ensures retries are actually sent before the worker stops. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Break out of the send loop on the first network error to avoid wasting time on a dead connection. Remaining envelopes stay untouched for the next retry poll. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
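A self-contained sketch of the break-on-first-failure loop this commit describes; the types and the stubbed send function are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical pending-envelope record.
typedef struct { const char *path; } pending_envelope_t;

// Stub standing in for the real network send; returns false on network error.
static bool
send_one(const pending_envelope_t *e)
{
    (void)e;
    return false;
}

// Stop at the first network error: the connection is likely dead, and the
// untouched remainder stays on disk for the next retry poll.
static size_t
send_pending(const pending_envelope_t *items, size_t len)
{
    size_t sent = 0;
    for (size_t i = 0; i < len; i++) {
        if (!send_one(&items[i])) {
            break; // leave items[i..] for the next poll
        }
        sent++;
    }
    return sent;
}

int
main(void)
{
    pending_envelope_t items[] = { { "a" }, { "b" }, { "c" } };
    printf("sent %zu of 3\n", send_pending(items, 3));
    return 0;
}
```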
When bgworker shutdown times out, persist any remaining queued envelopes to the retry directory so they are not lost. The retry module provides sentry__retry_dump_queue to keep retry internals out of the transport. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After shutdown timeout, the bgworker thread is detached but may still be executing an http_send_task. Since dump_queue already saves that task's envelope to the retry dir, the worker's subsequent call to retry_enqueue would create a duplicate file. Seal the retry module after dumping so that any late enqueue calls are silently skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
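A minimal sketch of the "seal, then silently skip late enqueues" idea; the flag and function names are illustrative, and the PR keeps the equivalent state inside its retry module:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

// Illustrative module-level flag.
static atomic_int g_sealed = 0;

// Called by the main thread after it has dumped the queue to disk.
static void
retry_seal(void)
{
    atomic_store(&g_sealed, 1);
}

// Called by the (possibly detached) worker thread; once sealed, the request
// is silently dropped instead of writing a duplicate retry file.
static bool
retry_enqueue(const char *envelope_name)
{
    if (atomic_load(&g_sealed)) {
        return false;
    }
    printf("would persist %s\n", envelope_name);
    return true;
}

int
main(void)
{
    retry_enqueue("a.envelope"); // persisted
    retry_seal();
    retry_enqueue("b.envelope"); // skipped: module is sealed
    return 0;
}
```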
… logic Remove count_eligible_files helper that duplicated filtering logic from sentry__retry_send. The retry_backoff test now exercises the actual send path for both backoff and startup modes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store parsed fields (ts, count, uuid) alongside the path during the filter phase so handle_result and future debug logging can use them without re-parsing. Also improves sort performance by comparing numeric fields before falling back to string comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
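A sketch of the parse-once record and numeric-first comparator this commit describes; the struct layout is an assumption for illustration:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Parsed once during the filter phase, reused by handle_result and logging.
typedef struct {
    uint64_t ts;     // epoch seconds from the file name
    uint32_t count;  // retry count from the file name
    char uuid[37];   // envelope UUID from the file name
    char *path;      // full path, for sending / moving the file
} retry_file_t;

// Sort oldest-first; compare the parsed numbers before falling back to the
// UUID string so the common case avoids string comparison entirely.
static int
retry_file_cmp(const void *a, const void *b)
{
    const retry_file_t *fa = a;
    const retry_file_t *fb = b;
    if (fa->ts != fb->ts) {
        return fa->ts < fb->ts ? -1 : 1;
    }
    if (fa->count != fb->count) {
        return fa->count < fb->count ? -1 : 1;
    }
    return strcmp(fa->uuid, fb->uuid);
}

// usage: qsort(files, n, sizeof(retry_file_t), retry_file_cmp);
```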
Log retry attempts at DEBUG level and max-retries-reached at WARN level to make retry behavior observable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…writes Three places independently constructed <database>/cache and wrote envelopes there. Add cache_path to sentry_run_t and introduce sentry__run_write_cache() and sentry__run_move_cache() to centralize the cache directory creation and file operations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CURLOPT_TIMEOUT_MS is a total transfer timeout that could cut off large envelopes. Use CURLOPT_CONNECTTIMEOUT_MS instead so only connection establishment is bounded. For winhttp, limit resolve and connect to 15s but leave send/receive at their defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
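For the libcurl side, a sketch of the distinction between the two options; the exact millisecond value used by the PR changed across commits (15s for winhttp here, 30000ms for curl in a later commit), so the number below is only illustrative:

```c
#include <curl/curl.h>

// Bound only connection establishment, not the whole transfer, so large
// envelopes on slow links are not cut off mid-upload.
static void
configure_timeouts(CURL *curl)
{
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT_MS, 30000L);
    // Deliberately NOT setting CURLOPT_TIMEOUT_MS: that is a total transfer
    // timeout and would abort long-running uploads.
}
```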
Without this, sentry__retry_send overcounts remaining files, causing an unnecessary extra poll cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restructure handle_result so "max retries reached" warnings only fire on actual network failures, not on successful delivery at the last attempt. Separate the warning logic from the cache/discard actions and put the re-enqueue branch first for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the `can_retry` bool on the transport with a `retry_func` callback, and expose `sentry_transport_retry()` as an experimental public API for explicitly retrying all pending envelopes, e.g. when coming back online. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
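A hypothetical usage sketch for the experimental API this commit exposes; `sentry_transport_retry()` is named in the commit, but its exact signature is assumed to be parameterless here, and the connectivity callback is not part of the SDK:

```c
#include <sentry.h>

// App-level hook fired by some platform connectivity monitor (hypothetical).
static void
on_network_restored(void)
{
    // Ask the transport to retry all pending envelopes now instead of
    // waiting for the next backoff window. Check the header added by the
    // PR for the exact signature.
    sentry_transport_retry();
}
```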
Move retry envelopes from a separate retry/ directory into cache/ so that sentry__cleanup_cache() enforces disk limits for both file formats out of the box. The two formats are distinguishable by length: retry files use <ts>-<count>-<uuid>.envelope (49+ chars) while cache files use <uuid>.envelope (45 chars). Default http_retries to 0 (opt-in). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
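A sketch of the length-based classification this commit relies on, assuming the name already ends in `.envelope`; helper names are illustrative:

```c
#include <stdbool.h>
#include <string.h>

// Cache files are exactly `<uuid>.envelope`: 36-char UUID + ".envelope" = 45.
// Retry files are `<ts>-<n>-<uuid>.envelope`, at least 49 chars, so plain
// length is enough to tell the two formats apart.
static bool
is_retry_file(const char *filename)
{
    return strlen(filename) >= 49;
}

static bool
is_cache_file(const char *filename)
{
    return strlen(filename) == 45;
}
```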
When bgworker is detached during shutdown timeout, retry_poll_task can access retry->run->cache_path after sentry_options_free frees the run. Clone the path so it outlives options and is freed with the bgworker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The bgworker_flush in sentry__retry_flush would delay its flush_task by min(delayed_task_time, timeout) when a 15-minute delayed retry_poll_task existed. This consumed the entire shutdown timeout, leaving 0ms for bgworker_shutdown, which then detached the worker thread. On Windows, winhttp_client_shutdown would close handles still in use by the detached thread, causing a crash. The flush is unnecessary because retry_flush_task is an immediate task and bgworker_shutdown already processes all immediate tasks before the shutdown_task runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit removed bgworker_flush from retry_flush, which caused a race between WinHTTP connect timeout (~2s) and bgworker shutdown (2s). Restore the flush and pass the full timeout to both flush and shutdown — after flush drains in-flight work, shutdown completes near-instantly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make retry count an internal constant (SENTRY_RETRY_ATTEMPTS = 5) and expose only a boolean toggle. Enabled by default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0 means infinite, not default. Pass 30000ms to match WinHTTP defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use a 'scheduled' flag with atomic compare-and-swap to ensure at most one retry_poll_task is queued at a time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
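A minimal sketch of the compare-and-swap guard this commit describes, using C11 atomics for illustration rather than the SDK's own atomic helpers:

```c
#include <stdatomic.h>
#include <stdbool.h>

// Illustrative flag; the PR keeps an equivalent inside its retry state.
static atomic_int g_poll_scheduled = 0;

// Returns true only for the caller that wins the race and should actually
// submit the delayed poll task; everyone else sees the flag set and skips.
static bool
try_schedule_poll(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&g_poll_scheduled, &expected, 1);
}

// The poll task clears the flag when it runs, allowing the next attempt.
static void
poll_task_done(void)
{
    atomic_store(&g_poll_scheduled, 0);
}
```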
Move `sealed = 1` before `foreach_matching` in `retry_dump_queue` to prevent the detached worker from writing duplicate envelopes via `retry_enqueue` while the main thread is dumping the queue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop the delayed retry_poll_task before bgworker_flush to prevent it from delaying the flush_task by min(retry_interval, timeout). Subtract elapsed flush time from the shutdown timeout so the total is bounded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the bgworker is detached after shutdown timeout, retry_dump_queue writes retry files and sets sealed=1. The detached thread could then run retry_flush_task and re-send those files, causing duplicates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The retry system writes cache files directly via its own paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed d6aa792 to fbbffb2


Add HTTP retry with exponential backoff for network failures, modeled after Crashpad's upload retry behavior.
Failed envelopes are stored as `<db>/cache/<ts>-<n>-<uuid>.envelope` and retried on startup after a 100ms throttle, and then with exponential backoff (15min, 30min, 1h, 2h, 8h). When retries are exhausted and offline caching is enabled, envelopes are stored as `<db>/cache/<uuid>.envelope` instead of being discarded.

```mermaid
flowchart TD
    startup --> R{retry?}
    R -->|yes| throttle
    R -->|no| C{cache?}
    throttle -. 100ms .-> resend
    resend -->|success| C
    resend -->|fail| C2[<db>/cache/<br/><ts>-<n>-<uuid>.envelope]
    C2 --> backoff
    backoff -. 2ⁿ×15min .-> resend
    C -->|yes| CACHE[<db>/cache/<br/><uuid>.envelope]
    C -->|no| discard
```

See also: https://develop.sentry.dev/sdk/expected-features/#buffer-to-disk
Depends on:
See also: