feat: add async retry and bearer-auth pipeline steps#157
Merged
Conversation
Add the AsyncHttpStep counterparts for the RETRY and AUTH pillar stages so async calls get the same retry and authentication behaviour as the synchronous pipeline. DefaultAsyncRetryStep mirrors DefaultRetryStep's policy exactly — it reuses HttpRetryOptions, the shared BackoffCalculator, RetryAfterParser, the same Retry-After header set, and the same idempotency-aware re-sendability gating. Backoff delays are scheduled on a ScheduledExecutorService via Futures.delay rather than blocking a thread, and the retry loop is driven by an iterative trampoline (no per-attempt thenCompose recursion), so a long zero-delay retry sequence stays stack-safe. AsyncBearerTokenAuthStep stamps Authorization: Bearer via a new non-blocking BearerTokenProvider.fetchAsync seam. A token that is still valid but inside the refresh margin is returned and stamped immediately while a refresh runs off-thread; concurrent requests that observe an expiring or missing token share a single in-flight fetch (single-flight) so they don't stampede the token endpoint. The HTTPS guard, cross-origin credential suppression, and 401-challenge token eviction match the synchronous BearerTokenAuthStep. A ManualScheduler test fixture drives the scheduled delays deterministically so the retry tests run without real sleeps.
DefaultAsyncRetryStep drove its retry loop through an attempt.whenComplete
callback while completing the caller-facing future by hand inside
onSuccess/onFailure. A throw escaping a whenComplete action is swallowed —
it only completes the discarded callback future — so any exception raised
while deciding to retry, computing the backoff, or logging left the returned
future forever incomplete and hung the caller's join().
The common trigger was self-suppression: when a transport re-throws a single
exception instance across attempts, the terminal step attached that instance
under itself via addSuppressed, throwing IllegalArgumentException
("Self-suppression not permitted").
DefaultAsyncRetryStep:
- Wrap the retry decision, delay computation and logging in onSuccess/onFailure
so any throw completes the future exceptionally and closes an open retryable
response, instead of being swallowed by the completion callback.
- Skip self-references when attaching the prior-attempt trail as suppressed.
- Intercept InterruptedIOException/InterruptedException before classification:
restore the interrupt flag, normalise to InterruptedIOException, and surface
it without retrying, matching the synchronous DefaultRetryStep and the SDK
cancellation convention.
- Guard scheduleNext against a synchronous scheduler rejection.
- Guard the retry trampoline's re-arm state with a lock so the pump-exit is
visible to a drive() resuming on a scheduler thread and the exit-check is
atomic with a concurrent re-arm; the previous plain fields could strand the
future on a real multi-threaded ScheduledExecutorService.
- Close the response on the no-retry success path when the caller has already
completed or cancelled the returned future.
- Swallow any non-fatal close failure (not only IOException) on a discarded
retryable response.
Also simplify AsyncBearerTokenAuthStep.validateFresh to accept a nullable
token directly, dropping a redundant cast and two suppressions.
Tests: FailNTimesClient now throws a distinct exception per attempt; add
regression coverage for a throwing retry/exception predicate, a synchronous
scheduler rejection, interrupt handling, self-suppression with a reused
instance, response close-on-cancel, a throwing response close, and retries
driven on a real multi-threaded scheduler.
The synchronous DefaultRetryStep exposes computeResponseDelay, computeExceptionDelay, and retryAfterFromHeaders as protected open extension points so a subclass can apply request-specific pacing (for example a server-specific Retry-After variant). DefaultAsyncRetryStep carried the same logic but buried it in a private per-call driver, so there was no way to customise delay resolution from a subclass. Hoist the three methods onto DefaultAsyncRetryStep with the same signatures as the synchronous step; the per-call driver now builds the HttpRetryCondition and dispatches through them, so an override is honored. Delay resolution order and values are unchanged. Also correct ManualScheduler.recordedDelays to report run-then-pending so it matches its documented submission order, and add async coverage for retry-after-ms parsing, delayFromCondition precedence, and a subclass delay override.
AsyncBearerTokenAuthStep returns a still-valid token immediately while refreshing it off-thread once it enters the refresh margin. Every request in that window called startBackgroundRefresh, and each attached its own failure-logging callback to the shared single-flight fetch, so a single failed refresh emitted one warning per concurrent request instead of one per fetch. Route the failure log through a new onLaunch hook on sharedFetch that runs only for the caller that actually starts the fetch; joiners and cache hits no longer attach a callback. Single-flight coalescing, cache bookkeeping, and the in-flight handoff are unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the async (
AsyncHttpStep) counterparts for the RETRY and AUTH pillarstages, so calls run through the async pipeline get the same retry and authentication
behaviour the synchronous pipeline already has.
Async retry (
DefaultAsyncRetryStep,AsyncRetryStep)Mirrors
DefaultRetryStep's policy exactly — it reusesHttpRetryOptions, the sharedBackoffCalculator,RetryAfterParser, the sameRetry-Afterheader set, and the sameidempotency-aware re-sendability gating (idempotent method or replayable body; a bare
non-idempotent POST or a non-replayable body is not retried).
ScheduledExecutorServiceviaFutures.delay— noThread.sleep, noTimer. While a delay is pending thedispatching thread is free.
per-attempt
thenComposechain, so a long (even zero-delay) retry sequence does notgrow the call stack or the future continuation graph. A 5,000-attempt test proves this.
CompletionExceptionwrappers are unwrapped before classification;Erroris never retried; prior attempts are attached as suppressed on the terminalexception.
Async bearer auth (
AsyncBearerTokenAuthStep,AsyncAuthStep)Adds a non-blocking
BearerTokenProvider.fetchAsyncseam (default forwards to the blockingfetch; adapter modules override to dispatch off-thread). The step:returned and stamped immediately while the refresh runs off-thread — the in-flight
request never waits on the token endpoint.
one in-flight provider call, so they don't stampede the token endpoint.
401 +
WWW-Authenticate: Bearertoken eviction + single-retry behaviour from thesynchronous
BearerTokenAuthStep.Per-cloud token providers (GCP/Azure/Kubernetes) and OAuth token-exchange specifics are
deliberately kept out of
sdk-core— they belong in adapter modules and overridefetchAsync.Testing
A new
ManualSchedulertest fixture drives the scheduled delays deterministically, so theasync retry tests assert the full backoff/Retry-After schedule with no real sleeps. The
async auth tests use deferred
CompletableFutures to prove the valid-but-expiring token isstamped without blocking and that concurrent fetches coalesce.
Closes #31
Closes #32
Gated build (module-scoped, run locally)
All passed.
apiDumpwas run for the intentional public-API additions and the regeneratedapi/sdk-core.apiis committed.