Skip to content

feat: add async retry and bearer-auth pipeline steps#157

Merged
OmarAlJarrah merged 5 commits into
mainfrom
feat/async-pipeline-steps
Jun 24, 2026
Merged

feat: add async retry and bearer-auth pipeline steps#157
OmarAlJarrah merged 5 commits into
mainfrom
feat/async-pipeline-steps

Conversation

@OmarAlJarrah

Copy link
Copy Markdown
Member

Summary

Adds the async (AsyncHttpStep) counterparts for the RETRY and AUTH pillar
stages, so calls run through the async pipeline get the same retry and authentication
behaviour the synchronous pipeline already has.

Async retry (DefaultAsyncRetryStep, AsyncRetryStep)

Mirrors DefaultRetryStep's policy exactly — it reuses HttpRetryOptions, the shared
BackoffCalculator, RetryAfterParser, the same Retry-After header set, and the same
idempotency-aware re-sendability gating (idempotent method or replayable body; a bare
non-idempotent POST or a non-replayable body is not retried).

  • Non-blocking delays: backoff is scheduled on a ScheduledExecutorService via
    Futures.delay — no Thread.sleep, no Timer. While a delay is pending the
    dispatching thread is free.
  • Stack-safe loop: the retry loop is driven by an iterative trampoline rather than a
    per-attempt thenCompose chain, so a long (even zero-delay) retry sequence does not
    grow the call stack or the future continuation graph. A 5,000-attempt test proves this.
  • Failure handling: CompletionException wrappers are unwrapped before classification;
    Error is never retried; prior attempts are attached as suppressed on the terminal
    exception.

Async bearer auth (AsyncBearerTokenAuthStep, AsyncAuthStep)

Adds a non-blocking BearerTokenProvider.fetchAsync seam (default forwards to the blocking
fetch; adapter modules override to dispatch off-thread). The step:

  • Background refresh: a token that is still valid but inside the refresh margin is
    returned and stamped immediately while the refresh runs off-thread — the in-flight
    request never waits on the token endpoint.
  • Single-flight: concurrent requests that all see an expiring or missing token share
    one in-flight provider call, so they don't stampede the token endpoint.
  • Keeps the HTTPS-only credential guard, cross-origin credential suppression, and
    401 + WWW-Authenticate: Bearer token eviction + single-retry behaviour from the
    synchronous BearerTokenAuthStep.

Per-cloud token providers (GCP/Azure/Kubernetes) and OAuth token-exchange specifics are
deliberately kept out of sdk-core — they belong in adapter modules and override
fetchAsync.

Testing

A new ManualScheduler test fixture drives the scheduled delays deterministically, so the
async retry tests assert the full backoff/Retry-After schedule with no real sleeps. The
async auth tests use deferred CompletableFutures to prove the valid-but-expiring token is
stamped without blocking and that concurrent fetches coalesce.

Closes #31
Closes #32

Gated build (module-scoped, run locally)

./gradlew :sdk-core:apiDump --no-daemon                # regenerated api/sdk-core.api committed
./gradlew :sdk-core:test --no-daemon                   # full suite green
./gradlew :sdk-core:ktlintCheck :sdk-core:detekt :sdk-core:apiCheck --no-daemon   # green

All passed. apiDump was run for the intentional public-API additions and the regenerated
api/sdk-core.api is committed.

Add the AsyncHttpStep counterparts for the RETRY and AUTH pillar stages so
async calls get the same retry and authentication behaviour as the synchronous
pipeline.

DefaultAsyncRetryStep mirrors DefaultRetryStep's policy exactly — it reuses
HttpRetryOptions, the shared BackoffCalculator, RetryAfterParser, the same
Retry-After header set, and the same idempotency-aware re-sendability gating.
Backoff delays are scheduled on a ScheduledExecutorService via Futures.delay
rather than blocking a thread, and the retry loop is driven by an iterative
trampoline (no per-attempt thenCompose recursion), so a long zero-delay retry
sequence stays stack-safe.

AsyncBearerTokenAuthStep stamps Authorization: Bearer via a new non-blocking
BearerTokenProvider.fetchAsync seam. A token that is still valid but inside the
refresh margin is returned and stamped immediately while a refresh runs
off-thread; concurrent requests that observe an expiring or missing token share
a single in-flight fetch (single-flight) so they don't stampede the token
endpoint. The HTTPS guard, cross-origin credential suppression, and
401-challenge token eviction match the synchronous BearerTokenAuthStep.

A ManualScheduler test fixture drives the scheduled delays deterministically so
the retry tests run without real sleeps.
DefaultAsyncRetryStep drove its retry loop through an attempt.whenComplete
callback while completing the caller-facing future by hand inside
onSuccess/onFailure. A throw escaping a whenComplete action is swallowed —
it only completes the discarded callback future — so any exception raised
while deciding to retry, computing the backoff, or logging left the returned
future forever incomplete and hung the caller's join().

The common trigger was self-suppression: when a transport re-throws a single
exception instance across attempts, the terminal step attached that instance
under itself via addSuppressed, throwing IllegalArgumentException
("Self-suppression not permitted").

DefaultAsyncRetryStep:
- Wrap the retry decision, delay computation and logging in onSuccess/onFailure
  so any throw completes the future exceptionally and closes an open retryable
  response, instead of being swallowed by the completion callback.
- Skip self-references when attaching the prior-attempt trail as suppressed.
- Intercept InterruptedIOException/InterruptedException before classification:
  restore the interrupt flag, normalise to InterruptedIOException, and surface
  it without retrying, matching the synchronous DefaultRetryStep and the SDK
  cancellation convention.
- Guard scheduleNext against a synchronous scheduler rejection.
- Guard the retry trampoline's re-arm state with a lock so the pump-exit is
  visible to a drive() resuming on a scheduler thread and the exit-check is
  atomic with a concurrent re-arm; the previous plain fields could strand the
  future on a real multi-threaded ScheduledExecutorService.
- Close the response on the no-retry success path when the caller has already
  completed or cancelled the returned future.
- Swallow any non-fatal close failure (not only IOException) on a discarded
  retryable response.

Also simplify AsyncBearerTokenAuthStep.validateFresh to accept a nullable
token directly, dropping a redundant cast and two suppressions.

Tests: FailNTimesClient now throws a distinct exception per attempt; add
regression coverage for a throwing retry/exception predicate, a synchronous
scheduler rejection, interrupt handling, self-suppression with a reused
instance, response close-on-cancel, a throwing response close, and retries
driven on a real multi-threaded scheduler.
The synchronous DefaultRetryStep exposes computeResponseDelay,
computeExceptionDelay, and retryAfterFromHeaders as protected open
extension points so a subclass can apply request-specific pacing (for
example a server-specific Retry-After variant). DefaultAsyncRetryStep
carried the same logic but buried it in a private per-call driver, so
there was no way to customise delay resolution from a subclass.

Hoist the three methods onto DefaultAsyncRetryStep with the same
signatures as the synchronous step; the per-call driver now builds the
HttpRetryCondition and dispatches through them, so an override is
honored. Delay resolution order and values are unchanged.

Also correct ManualScheduler.recordedDelays to report run-then-pending
so it matches its documented submission order, and add async coverage
for retry-after-ms parsing, delayFromCondition precedence, and a
subclass delay override.
AsyncBearerTokenAuthStep returns a still-valid token immediately while
refreshing it off-thread once it enters the refresh margin. Every
request in that window called startBackgroundRefresh, and each attached
its own failure-logging callback to the shared single-flight fetch, so a
single failed refresh emitted one warning per concurrent request instead
of one per fetch.

Route the failure log through a new onLaunch hook on sharedFetch that
runs only for the caller that actually starts the fetch; joiners and
cache hits no longer attach a callback. Single-flight coalescing, cache
bookkeeping, and the in-flight handoff are unchanged.
@OmarAlJarrah OmarAlJarrah merged commit b8a26f6 into main Jun 24, 2026
1 check passed
@OmarAlJarrah OmarAlJarrah deleted the feat/async-pipeline-steps branch June 24, 2026 04:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an async bearer-auth step with background token refresh Add an async retry step at the RETRY stage

1 participant