Skip to content

feat(query): 429 retry + truncation auto-follow (#688)#111

Merged
zfarrell merged 5 commits into
mainfrom
feat/688-query-contract
Jun 15, 2026
Merged

feat(query): 429 retry + truncation auto-follow (#688)#111
zfarrell merged 5 commits into
mainfrom
feat/688-query-contract

Conversation

@zfarrell

@zfarrell zfarrell commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

SDK support for the #640 bounded-memory query contract (#688): an enhanced hotdata.query.QueryApi that transparently retries HTTP 429 (OVERLOADED) and auto-follows truncated results, made the default client so the obvious import path gets the safe behavior.

Included

  • 429 retry honoring Retry-After (uncapped — the server value is authoritative), else exponential backoff + jitter under a deadline budget; ServerOverloadedError on exhaustion.
  • Truncation auto-follow — poll the out-of-band result to ready, materialize the full rows, guarded by max_auto_rows (1M) and max_auto_bytes (64 MiB). Readiness poll uses limit=0 (status only) so it never pulls the result into RAM before the guards run.
  • Failed results (delivered as HTTP 409) translated to ResultFailedError; shared ResultError base for all lifecycle errors.
  • Default exportfrom hotdata import QueryApi / ResultsApi now resolve to the enhanced clients (raw generated ones remain at hotdata.api.*), kept across regen by scripts/patch_query_exports.py.
  • CI — new unit job runs all non-integration tests on every PR (they previously never ran in CI).

Notes

  • Coverage is unit-level (23 query tests); the truncation/429 integration scenarios land once the backend (#685/#686) deploys.
  • Lazy-streaming result iterator deferred to #689 (this PR keeps eager materialization with guards).

Closes #688.

@zfarrell zfarrell force-pushed the feat/688-query-contract branch from 75a78d2 to 9f01c24 Compare June 15, 2026 20:44
@zfarrell zfarrell marked this pull request as ready for review June 15, 2026 20:45
Comment thread hotdata/query.py Outdated
Comment on lines +362 to +369
result = api.get_result(result_id)
last_status = result.status
if last_status == "ready":
return result
if last_status == "failed":
raise ResultFailedError(
result_id=result_id, error_message=result.error_message
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: the failed branch is dead code against the real backend, so a failed result won't surface as ResultFailedError.

Per the server contract (and results_api.py's response map), a failed result is returned as HTTP 409, and ApiClient.response_deserialize (api_client.py:326-331) raises ApiException for any non-2xx status. So api.get_result(result_id) raises on a failed result — it never returns a GetResultResponse with status == "failed". This branch can only be reached if the server ever returns 200/202 with a failed body, which the contract says it doesn't.

The result: a real failed result propagates a raw ApiException (status 409) out of wait_for_result / query(), instead of the documented ResultFailedError. test_autofollow_failed_result_raises masks this because its stub returns a status="failed" object rather than raising a 409 ApiException like the generated client does.

Catch the 409 and translate it (the deserialized body is on exc.data):

        while True:
            try:
                result = api.get_result(result_id)
            except ApiException as exc:
                if exc.status == 409:
                    data = getattr(exc, "data", None)
                    raise ResultFailedError(
                        result_id=result_id,
                        error_message=getattr(data, "error_message", None),
                    ) from exc
                raise
            last_status = result.status
            if last_status == "ready":
                return result

and update the test stub to raise a 409 ApiException so it reflects real behavior.

Comment thread hotdata/query.py Outdated
delay = policy.base_backoff_s
last_status = "pending"
while True:
result = api.get_result(result_id)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: the readiness poll undermines the bounded-memory guard it exists to enforce.

get_result(result_id) is called here with no limit, and per the contract limit is unbounded by default — so for a ready result this materializes the entire row set into the GetResultResponse on every poll. The max_auto_rows / max_auto_bytes guards in _materialize_full / _fetch_all_rows only run after wait_for_result returns, so an oversized result is fully pulled into client memory before the guard can trip. That defeats the module's stated guarantee that "an unbounded result is never silently pulled into client memory," and it also re-fetches everything redundantly (_fetch_all_rows pages it again from offset 0).

Poll with limit=0 to fetch only the status (0 is valid — both must be non-negative):

            result = api.get_result(result_id, limit=0)

_authoritative_total already takes the count from the preview / query-run record, so the poll doesn't need any rows.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Blocking Issues

  1. wait_for_result failed-result handling is dead against the real backend (hotdata/query.py:366-369). A failed result is returned as HTTP 409, and the generated client raises ApiException on any non-2xx (api_client.py:326). So get_result never returns a status == "failed" object — a real failed result escapes as a raw 409 ApiException instead of the documented ResultFailedError. The test passes only because its stub returns a failed object rather than raising 409.

  2. Readiness poll undermines the bounded-memory guard (hotdata/query.py:362). The poll calls get_result with no limit, which is unbounded by default, so a ready result is fully materialized into memory on every poll — before max_auto_rows / max_auto_bytes can trip. The guards run only after wait_for_result returns, so an oversized result is pulled into client RAM (and then re-fetched page by page). Poll with limit=0.

Action Required

  • Catch ApiException with status 409 in wait_for_result and translate it to ResultFailedError (error message is on exc.data.error_message); update test_autofollow_failed_result_raises to raise a 409 ApiException so it reflects the generated client's actual behavior.
  • Pass limit=0 on the readiness poll so it fetches status only, not the full result set.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both prior blocking issues are resolved:

  1. 409 → ResultFailedError (query.py:366-380): the failed-result branch now catches the HTTP 409 ApiException the generated client actually raises and translates it, pulling error_message off exc.data. Verified results_api.py maps 409 → GetResultResponse and api_client.py deserializes the body into exc.data, so this works against the real backend, not just the stub. The test stub was updated to raise a real 409.

  2. limit=0 readiness poll (query.py:368): the poll now fetches status only, so a ready oversized result is no longer pulled into memory before the size guards run. Confirmed limit is ge=0-validated and serialized when non-None.

No new blocking issues.

@zfarrell zfarrell merged commit b37b379 into main Jun 15, 2026
5 checks passed
@zfarrell zfarrell deleted the feat/688-query-contract branch June 15, 2026 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant