Skip to content

Add on_poll callback for real-time query monitoring (#723)#724

Merged
laughingman7743 merged 3 commits into
masterfrom
feature/on-poll-callback-723
Jun 9, 2026
Merged

Add on_poll callback for real-time query monitoring (#723)#724
laughingman7743 merged 3 commits into
masterfrom
feature/on-poll-callback-723

Conversation

@laughingman7743

@laughingman7743 laughingman7743 commented Jun 8, 2026

Copy link
Copy Markdown
Member

WHAT

Add an optional on_poll callback that is invoked once per poll iteration with the current execution object, enabling real-time query monitoring (e.g. live progress in Jupyter). Closes #723.

  • New on_poll parameter on BaseCursor.__init__ and Connection / connect(...) (and connection.cursor(...)). Default None → no behavior/perf change when unused.
  • Wired at the cursor/connection level only. It rides the existing **kwargs chain into BaseCursor.__init__, so no per-cursor execute() changes are needed — which also avoids adding to the execute() kwargs duplication tracked in Refactor shared execute() kwargs handling across cursor implementations #691.
  • Invoked right after each get_query_execution / get_calculation_execution_status, before the terminal-state check, so the final state is delivered too.
  • Covers all poll loops, not just BaseCursor.__poll:
    • BaseCursor.__poll — sync + thread-pool async cursors (default/dict/pandas/arrow/polars/s3fs)
    • SparkBaseCursor.__poll — Spark calculations
    • aio native-async loops (AioBaseCursor, AioSparkCursor)
  • For Spark the callback receives the per-poll calculation status, so the signature is Callable[[AthenaQueryExecution | AthenaCalculationExecutionStatus], None] (exposed as pyathena.common.OnPollCallback).

Also: centralize on_start_query_execution storage

While adding on_poll to BaseCursor, the sibling on_start_query_execution field was still declared/stored in each of the five synchronous cursors (Cursor, PandasCursor, ArrowCursor, PolarsCursor, S3FSCursor). To keep the two callbacks symmetric, its storage is moved into BaseCursor.__init__ (mirroring on_poll). The per-execute() override and its invocation stay in each synchronous cursor — the broader execute() kwargs consolidation remains #691's scope. Async/aio/Spark cursors inherit the field but do not invoke it (they return the query id immediately); a comment documents this. Behavior is unchanged.

Usage:

def track(execution):
    print(f"State: {execution.state}")

# connection-level default
conn = connect(on_poll=track)
# or per-cursor
cursor = conn.cursor(AsyncPandasCursor, on_poll=track)

WHY

There is currently no public hook into the polling loop, so interactive environments cannot observe query state, elapsed time, or data scanned while a query runs. Requested in #723.

Tests

  • Unit (no AWS, deterministic): on_poll fires once per iteration in order incl. the terminal state; None is a no-op; Spark calculation poll variant.
  • Integration (cheap SELECT 1): connection-level, cursor-level, and async (AsyncCursor, matching the issue example).
  • Existing on_start_query_execution callback tests pass unchanged after the storage move.

make lint (ruff + mypy) and the callback/on_poll tests pass locally.

Refs #691.

laughingman7743 and others added 3 commits June 8, 2026 23:25
Add an optional on_poll callback to BaseCursor/Connection, invoked once
per poll iteration with the current execution object. Wired at the
cursor/connection level so it propagates via the existing **kwargs chain
(no execute() changes; avoids adding to the execute() duplication tracked
in #691).

Covers all poll loops: BaseCursor.__poll (sync + thread-pool async),
SparkBaseCursor.__poll, and the native-async aio loops. For Spark the
callback receives the per-poll calculation status, so the signature is
Callable[[AthenaQueryExecution | AthenaCalculationExecutionStatus], None].

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a "Query polling callback" section to docs/usage.md covering on_poll:
connection- and cursor-level configuration, the synchronous callback
contract, per-iteration invocation including the terminal state, async
cursor usage, and the Spark calculation-status payload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the on_start_query_execution field from the five synchronous cursors
(Cursor, PandasCursor, ArrowCursor, PolarsCursor, S3FSCursor) into
BaseCursor.__init__, mirroring on_poll, so both connection-level callbacks
live in one place. The per-execute() override and its invocation stay in
each synchronous cursor (the broader execute() kwargs consolidation is
tracked in #691).

Async/aio/Spark cursors inherit the field but do not invoke it, as they
return the query id immediately through their execution model. Behavior is
unchanged; a clarifying comment documents this.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@laughingman7743 laughingman7743 marked this pull request as ready for review June 9, 2026 12:29
@laughingman7743 laughingman7743 merged commit 1bf6e39 into master Jun 9, 2026
17 checks passed
@laughingman7743 laughingman7743 deleted the feature/on-poll-callback-723 branch June 9, 2026 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Add on_poll callback to polling loop for real-time query monitoring

1 participant