Fix ConnectionErrorRetryTimeout to act as a hard deadline during connection attempts#711
Closed
rhysparry wants to merge 2 commits into
Closed
Conversation
…ection attempts Previously the timeout was only checked at the start of each retry loop iteration, so a blocked SSL handshake could exceed it by up to one full per-attempt timeout. Now a linked CancellationToken fires at the deadline, interrupting AcquireConnectionAsync mid-attempt. Deadline cancellations are surfaced as HalibutClientException rather than OperationCanceledException. Includes a regression test using a paused port forwarder to demonstrate the before/after: the overall wait is now bounded by ConnectionErrorRetryTimeout (~3s) rather than the per-attempt stream timeout (~8s). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… deadline AuthenticateAsClientAsync on .NET 4.8 does not accept a CancellationToken, so the ConnectionErrorRetryTimeout deadline had no effect during the TLS handshake — the attempt ran until the socket ReceiveTimeout instead. Register a callback on the cancellation token that closes the underlying TcpClient, which unblocks AuthenticateAsClientAsync immediately. The resulting exception is re-thrown as OperationCanceledException so the existing handling in SecureListeningClient routes it correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Author
|
We want to avoid this behavioral change. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ConnectionErrorRetryTimeoutwas only checked at the start of each retry loop iteration inSecureListeningClient, so a single blocked connection attempt (slow TCP connect or frozen SSL handshake) could exceed the deadline by up to one full per-attempt timeout before the loop could exit.CancellationTokenSourcearmed atConnectionErrorRetryTimeoutwhose token is passed intoAcquireConnectionAsync, propagating through bothConnectWithTimeoutAsyncandAuthenticateAsClientEnforcingTimeout. This makes the deadline a true hard ceiling on each individual attempt.OperationCanceledExceptionis converted toTimeoutExceptionin the catch block soHandleErrorsurfaces it asHalibutClientExceptionrather than re-throwing asOperationCanceledException.ConnectionErrorRetryTimeout=3sand per-attempt stream timeout of8s, the test verifies the call fails in under6s.Implications for existing users
Production defaults (
ConnectionErrorRetryTimeout=5min,TcpClientConnectTimeout=60s,TcpClientTimeout=10min): no practical change. Retries complete (or exhaustRetryCountLimit) long before the 5-minute deadline fires.Short
ConnectionErrorRetryTimeoutwith long per-attempt timeouts: behaviour visibly changes. Previously a single stuck TCP connect or SSL handshake could outlast the deadline by up toTcpClientConnectTimeoutorTcpClientTimeout.receive. Now the attempt is interrupted at the deadline. This is the intended semantics of the setting.Exception type: the deadline case previously accidentally produced
OperationCanceledException(viaHandleError's switch case). It now producesHalibutClientException, which is the correct type for RPC timeout errors in Halibut.Background TCP socket: when the deadline fires inside
ConnectWithTimeoutAsync, the underlyingConnectAsynctask is orphaned and the socket is force-closed viaDisposeClient()— the same behaviour as whenTcpClientConnectTimeoutfires normally.net48 SSL handshake:
AuthenticateAsClientAsyncon .NET Framework ignores cancellation tokens (noted inTcpConnectionFactory.cswith a// TODO: ASYNC ME UP!comment). The deadline CTS can still interrupt the TCP connect phase, but has no effect inside the SSL handshake on net48; the stream'sReceiveTimeoutstill controls that phase.Test plan
ConnectionErrorRetryTimeout_IsAHardDeadlineEvenWhenIndividualAttemptsBlockLongerconfirmed failing before fix (~10s, wrong exception type) and passing after fix (~3.7s,HalibutClientException)ListeningConnectRetryFixturetests pass (including existing timeout and retry-count tests)🤖 Generated with Claude Code