Skip to content

feat: Introduce tiered timeout system with per-endpoint configuration#653

Draft
vdusek wants to merge 3 commits intomasterfrom
worktree-fix-timeouts
Draft

feat: Introduce tiered timeout system with per-endpoint configuration#653
vdusek wants to merge 3 commits intomasterfrom
worktree-fix-timeouts

Conversation

@vdusek
Copy link
Contributor

@vdusek vdusek commented Mar 2, 2026

Summary

  • Introduce three (four with max) configurable timeout tiers:
    • short (5s by default),
    • medium (30s by default),
    • long (300s by default),
    • and max (600s cap by default).
  • Introduce Timeout type alias: timedelta | Literal['no_timeout', 'short', 'medium', 'long']
  • Keep the exponential timeout growth per retry attempt (doubles each retry, capped at timeout_max).
  • Assign appropriate default timeout tiers to each resource client method (short for CRUD, medium for list/create, long for heavy ops).
  • Reduce default max retries from 8 to 4 (5 attempts by default).
  • Rename _internal_models.py to _types.py and consolidate type aliases (Timeout, JsonSerializable) into it.
  • Remove old timeout constants (FAST_OPERATION_TIMEOUT, STANDARD_OPERATION_TIMEOUT).

Timeout tiers

Tier Default Usage
short 5s Fast CRUD on single resources (get, update, delete)
medium 30s List, create, get-or-create, data transfer (push_items, get_items, set_record, add_request, batch operations), actor start/call/abort, run resurrect/reboot/metamorph
long 300s Data transfer and heavy operations — dataset get_items/push_items/iterate_items, KVS get_record/set_record/get_record_as_bytes/stream_record, request queue batch_add_requests / batch_delete_requests, log get / get_as_bytes / stream, actor build, run get_dataset_items
no_timeout It is not used by default
max 600s Cap for exponential timeout growth across retries

Let's discuss the best defaults.

What this solves

  • Before this change, timeout handling across the client was fragmented and inconsistent in three major ways:
    • Some resource clients had hardcoded internal timeouts — most had none at all.
    • Users had no way to control per-request timeouts.
    • The timeout parameter name was ambiguous on Actor/Task run methods.

Test plan

  • CI passes

@vdusek vdusek added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Mar 2, 2026
@vdusek vdusek self-assigned this Mar 2, 2026
@github-actions github-actions bot added this to the 135th sprint - Tooling team milestone Mar 2, 2026
@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.67%. Comparing base (841d760) to head (0ad2c46).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #653      +/-   ##
==========================================
+ Coverage   96.57%   96.67%   +0.10%     
==========================================
  Files          45       45              
  Lines        4318     4336      +18     
==========================================
+ Hits         4170     4192      +22     
+ Misses        148      144       -4     
Flag Coverage Δ
integration 94.74% <99.34%> (+0.02%) ⬆️
unit 78.27% <69.05%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vdusek vdusek force-pushed the worktree-fix-timeouts branch from 9653d95 to 634577c Compare March 2, 2026 13:05
- Add `Timeout` type alias (`timedelta | Literal['no_timeout'] | None`)
  to `_consts.py` and export it from the public API
- Accept `None` on abstract `HttpClient.call()` / `HttpClientAsync.call()`
  so the HTTP client resolves its own default timeout internally
- Move `_calculate_timeout` into the impit HTTP client implementation
  since exponential timeout growth on retries is client-specific logic
- Add `timeout` parameter to all public resource client methods for
  user-controllable per-request HTTP timeouts
- Add timeout parameter to base methods `_list()`, `_create()`, and
  `_get_or_create()` which previously lacked it
- Rename domain-specific Actor/Task run timeout from `timeout` to
  `run_timeout` to avoid ambiguity with the HTTP request timeout
- Update docstrings to document three-state timeout semantics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vdusek vdusek requested a review from Pijukatel March 2, 2026 13:22
@vdusek vdusek marked this pull request as ready for review March 2, 2026 13:23
@vdusek vdusek changed the title refactor: Standardize timeout handling across all resource clients feat: Standardize timeout handling with per-endpoint Timeout type alias Mar 2, 2026
@vdusek vdusek changed the title feat: Standardize timeout handling with per-endpoint Timeout type alias feat: Add per-endpoint timeout configuration for all resource clients Mar 2, 2026
@vdusek vdusek marked this pull request as draft March 2, 2026 14:39
Introduce `timeout_max` parameter to cap exponential timeout growth
independently from the initial timeout. Lower defaults to
`DEFAULT_REQUEST_TIMEOUT=10s`, `DEFAULT_REQUEST_TIMEOUT_MAX=600s`,
and `DEFAULT_MAX_RETRIES=4`, producing the sequence [10, 20, 40, 80, 160].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vdusek vdusek force-pushed the worktree-fix-timeouts branch from 634577c to ac20367 Compare March 2, 2026 14:57
Copy link
Contributor

@Pijukatel Pijukatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be very careful about changing the actual timeouts. Can we please split this into 2 PRs:

  • Refactoring (majority)
  • Assigning different timeouts to API operations


@property
def timeout_short(self) -> timedelta:
"""Timeout for fast CRUD operations (e.g., get, update, delete)."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we can do such a blanket description. Some endpoints will have get/update/post even with the long timeout. For example, dealing with kvs value that can be very large? Might be other endpoints as well

Also, the timeout_short and timeout (medium) should probably be used only in idempotent requests, to avoid doing some non-idempotent operation twice just because of a short timeout.

"""Compute timeout for a request attempt with exponential increase, bounded by timeout_max.

For `'no_timeout'`, returns 0 (impit interprets this as no timeout). For `None`, uses the client-level timeout.
For `None` (resolved from `'no_timeout'`), returns `None` — impit uses client-level default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets me confused about timeouts. Sometimes None means no timeout, sometimes default.

)

async def get(self, *, timeout: Timeout = None) -> Build | None:
async def get(self, *, timeout: Timeout = 'short') -> Build | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an example operation that changed from long (default) to short (see my comment about 2 PRs)

@vdusek vdusek force-pushed the worktree-fix-timeouts branch from 5071549 to 3d9f230 Compare March 3, 2026 15:04
@vdusek vdusek changed the title feat: Add per-endpoint timeout configuration for all resource clients feat: Introduce tiered timeout system with per-endpoint configuration Mar 3, 2026
@vdusek vdusek force-pushed the worktree-fix-timeouts branch from 3d9f230 to 0ad2c46 Compare March 3, 2026 15:21
@vdusek vdusek marked this pull request as ready for review March 3, 2026 15:21
@vdusek vdusek marked this pull request as draft March 3, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants