Skip to content

STF-322: Bounded transport-failure retry in WebServiceClient#693

Open
oschwald wants to merge 5 commits intomainfrom
greg/stf-322
Open

STF-322: Bounded transport-failure retry in WebServiceClient#693
oschwald wants to merge 5 commits intomainfrom
greg/stf-322

Conversation

@oschwald
Copy link
Copy Markdown
Member

@oschwald oschwald commented May 2, 2026

Summary

  • Adds WebServiceClient.Builder.maxRetries(int) (defaults to 1) so a single transient transport failure (Connection reset, Broken pipe, EOF, closed channel, etc.) is retried transparently.
  • Predicate is permissive by exclusion: any IOException from httpClient.send() is retried EXCEPT HttpTimeoutException (which covers both request-phase and connect-phase timeouts, since HttpConnectTimeoutException extends it) and InterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend; InterruptedIOException is a user-cancellation signal we honor. HTTP 4xx/5xx come back as HttpResponse objects (not IOExceptions) and are surfaced through the existing exception hierarchy, so the predicate is structurally unable to retry them.
  • Restores the interrupt flag in the InterruptedException rewrap path. Updates README and CHANGELOG; opt out via .maxRetries(0).
  • Adds a mise.toml / mise.lock to pin the Java + Maven versions used locally, mirroring minfraud-api-java.

Fixes STF-322. Parallel PR for minfraud-api-java uses a byte-identical predicate (maxmind/minfraud-api-java#619).

Test plan

  • mvn checkstyle:check clean
  • mvn verify -Dgpg.skip=true — 102/102 pass (28 existing + 9 new WebServiceClientTest cases)
  • Reviewer confirms README guidance points customers at jdk.httpclient.keepalive.timeout as the complementary workaround

🤖 Generated with Claude Code

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient, defaulting to one retry for issues like connection resets or broken pipes. The implementation includes updates to the Builder for maxRetries configuration, detailed documentation in the README, and comprehensive test coverage for various failure scenarios. Feedback suggests refactoring the retry loop to prevent potential integer overflow and adopting case-insensitive string matching for exception messages to improve robustness across different environments.

Comment thread src/main/java/com/maxmind/geoip2/WebServiceClient.java
Comment thread src/main/java/com/maxmind/geoip2/WebServiceClient.java Outdated
@oschwald
Copy link
Copy Markdown
Member Author

oschwald commented May 6, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient. It adds a maxRetries option to the builder (defaulting to 1), implements logic to retry on specific exceptions like connection resets and connect timeouts, and includes comprehensive tests to verify the behavior. Additionally, it adds tool configuration files for mise and updates the documentation and changelog. I have no feedback to provide as the changes are well-implemented and the existing review comments were purely evaluative.

@oschwald
Copy link
Copy Markdown
Member Author

oschwald commented May 6, 2026

@gemini-code-assist review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for transient errors like connection resets or broken pipes. It includes updates to the Builder for configuring maxRetries, expanded documentation in the README and CHANGELOG, and new test cases. Review feedback identifies a discrepancy in the retry logic regarding HttpConnectTimeoutException, which is currently excluded because it inherits from HttpTimeoutException. Additionally, a test case for connection timeouts was found to be misnamed and lacks verification that retries are actually being attempted.

I am having trouble creating individual review comments. Click here to see my feedback.

src/main/java/com/maxmind/geoip2/WebServiceClient.java (446-448)

medium

There is a discrepancy between the code and the PR summary regarding HttpConnectTimeoutException. The summary states that connect timeouts are retried, but since HttpConnectTimeoutException extends HttpTimeoutException, it is currently excluded by this check. Additionally, the summary describes a "narrow" predicate that "walks the cause chain," whereas the implementation is broad, retrying all IOExceptions except for timeouts and interruptions. If the intention is to exclude all timeouts (as suggested by the README and CHANGELOG), the PR summary should be corrected. If connect timeouts should be retried, this logic needs to be adjusted to specifically allow them while still excluding request-phase timeouts.

src/test/java/com/maxmind/geoip2/WebServiceClientTest.java (590-608)

medium

The test testRetriesOnConnectTimeout is misnamed and lacks verification of the retry logic. It triggers a ConnectException (connection refused) by connecting to a closed port, which is distinct from a "Connect Timeout" (HttpConnectTimeoutException). Furthermore, the test does not verify that a retry actually occurred (e.g., by checking the number of attempts). Consider renaming the test to reflect that it tests connection refusal and adding a verification step (such as using a mock or checking WireMock logs if applicable) to ensure the retry mechanism is triggered as expected.

oschwald and others added 5 commits May 6, 2026 20:08
When the JDK HttpClient pool reuses an idle connection that an intermediary
(load balancer, proxy, NAT) has silently closed, the next send() fails with
"Connection reset", "Broken pipe", or related transport errors. A single
retry recovers transparently without exposing this race to callers. The
default JDK keep-alive timeout exceeds many intermediaries' idle timeout,
making this mismatch the common case.

The retry predicate is permissive by exclusion: any IOException from
httpClient.send() is retried EXCEPT HttpTimeoutException (covering both
request-phase and connect-phase timeouts, since HttpConnectTimeoutException
is a subclass) and InterruptedIOException. Both timeouts are customer-set
budgets that retrying would silently extend; InterruptedIOException is a
user-cancellation signal. HTTP 4xx and 5xx responses are surfaced as
HttpException (and subclasses) from a separate code path -- they come back
as HttpResponse objects rather than IOExceptions, so the predicate is
structurally unable to retry them.

Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total
attempts). The interrupt flag is restored before rewrapping
InterruptedException, and a pre-set interrupt short-circuits the predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing catch (InterruptedException) block in responseFor() rewraps
into GeoIp2Exception without restoring the thread's interrupt status,
silently swallowing the cancellation signal. Per Java's interruption
protocol, code that catches InterruptedException without rethrowing it
should re-set the flag so callers up the stack can observe the
cancellation.

This is an independent bug fix bundled into the STF-322 retry work because
the retry feature exposes the path more often. Per project commit hygiene
it lands as a separate commit so it can be cherry-picked or reverted on
its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover all 9 scenarios: connection-reset retry on country, city, and insights
endpoints, no retry on HttpTimeoutException, retry on connect timeout
(deterministic via a closed local ServerSocket), no retry on 4xx/5xx,
.maxRetries(0) opt-out, and pre-interrupt short-circuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors minfraud-api-java's mise.toml and mise.lock so Java and Maven
versions are pinned and auto-installed on directory entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant