Add on_poll callback for real-time query monitoring (#723) by laughingman7743 · Pull Request #724 · pyathena-dev/PyAthena

laughingman7743 · 2026-06-08T14:25:57Z

WHAT

Add an optional on_poll callback that is invoked once per poll iteration with the current execution object, enabling real-time query monitoring (e.g. live progress in Jupyter). Closes #723.

- New on_poll parameter on BaseCursor.__init__ and Connection / connect(...) (and connection.cursor(...)). The default is None, so there is no behavior or performance change when unused.
- Wired at the cursor/connection level only. It rides the existing **kwargs chain into BaseCursor.__init__, so no per-cursor execute() changes are needed. This also avoids adding to the execute() kwargs duplication tracked in #691 (Refactor shared execute() kwargs handling across cursor implementations).
- Invoked right after each get_query_execution / get_calculation_execution_status, before the terminal-state check, so the final state is delivered too.
- Covers all poll loops, not just BaseCursor.__poll:
  - BaseCursor.__poll: sync + thread-pool async cursors (default/dict/pandas/arrow/polars/s3fs)
  - SparkBaseCursor.__poll: Spark calculations
  - aio native-async loops (AioBaseCursor, AioSparkCursor)
- For Spark the callback receives the per-poll calculation status, so the signature is Callable[[AthenaQueryExecution | AthenaCalculationExecutionStatus], None] (exposed as pyathena.common.OnPollCallback).
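
The invocation order described above (call the callback right after fetching the status, then check for a terminal state) can be sketched roughly as follows. This is a simplified illustration, not PyAthena's actual poll loop: `FakeExecution`, `poll`, `fetch`, and the `TERMINAL_STATES` set are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class FakeExecution:
    """Hypothetical stand-in for the execution object; only `state` is modeled."""

    state: str


OnPoll = Callable[[FakeExecution], None]

# Assumed terminal states for this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def poll(
    fetch: Callable[[], FakeExecution],
    on_poll: Optional[OnPoll] = None,
) -> FakeExecution:
    """Fetch the status repeatedly until a terminal state is reached."""
    while True:
        execution = fetch()
        # The callback runs before the terminal-state check,
        # so the final state is delivered to it as well.
        if on_poll is not None:
            on_poll(execution)
        if execution.state in TERMINAL_STATES:
            return execution
        # A real loop would sleep for the poll interval here.


states: Iterator[str] = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
seen: List[str] = []
result = poll(
    lambda: FakeExecution(next(states)),
    on_poll=lambda e: seen.append(e.state),
)
```

With this ordering, `seen` ends up holding every observed state, including the terminal one. Passing `on_poll=None` skips the callback entirely.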

Also: centralize `on_start_query_execution` storage

When on_poll was added to BaseCursor, the sibling on_start_query_execution field was still declared and stored in each of the five synchronous cursors (Cursor, PandasCursor, ArrowCursor, PolarsCursor, S3FSCursor). To keep the two callbacks symmetric, its storage is moved into BaseCursor.__init__ (mirroring on_poll). The per-execute() override and its invocation stay in each synchronous cursor; the broader execute() kwargs consolidation remains #691's scope. Async/aio/Spark cursors inherit the field but do not invoke it (they return the query id immediately); a comment documents this. Behavior is unchanged.
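
As a rough illustration of the storage move and the **kwargs chain, the sketch below keeps both callbacks in one base-class `__init__` and lets a connection forward its defaults to cursors. All class names here (`SketchBaseCursor`, `SketchCursor`, `SketchConnection`) are hypothetical and are not PyAthena's classes. The rule that per-cursor arguments take precedence over connection-level defaults is an assumption of this sketch.

```python
from typing import Any, Callable, Optional

Callback = Callable[[Any], None]


class SketchBaseCursor:
    """Hypothetical base class: both callbacks are stored once, here."""

    def __init__(
        self,
        on_start_query_execution: Optional[Callback] = None,
        on_poll: Optional[Callback] = None,
        **kwargs: Any,
    ) -> None:
        self.on_start_query_execution = on_start_query_execution
        self.on_poll = on_poll


class SketchCursor(SketchBaseCursor):
    """Subclass declares no callback fields; it only forwards **kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)


class SketchConnection:
    """Holds connection-level defaults and passes them down to cursors."""

    def __init__(self, **defaults: Any) -> None:
        self._defaults = defaults

    def cursor(self, cursor_cls: type = SketchCursor, **overrides: Any) -> SketchBaseCursor:
        # Assumption: per-cursor arguments win over connection-level defaults.
        return cursor_cls(**{**self._defaults, **overrides})


def conn_level(execution: Any) -> None:
    pass


def cursor_level(execution: Any) -> None:
    pass


conn = SketchConnection(on_poll=conn_level)
default_cursor = conn.cursor()
custom_cursor = conn.cursor(on_poll=cursor_level)
```

Here `default_cursor.on_poll` is the connection-level callback and `custom_cursor.on_poll` is the per-cursor one. Neither subclass needs its own field declaration.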

Usage:

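from pyathena import connect
from pyathena.pandas.async_cursor import AsyncPandasCursor

def track(execution):
    # Called once per poll iteration with the current execution object.
    print(f"State: {execution.state}")

# connection-level default
conn = connect(on_poll=track)
# or per-cursor
cursor = conn.cursor(AsyncPandasCursor, on_poll=track)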
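
A slightly richer callback might also report data scanned and elapsed time. The sketch below is illustrative only: the attribute names `data_scanned_in_bytes` and `total_execution_time_in_millis` are assumed to be present on the query execution object, and `getattr` with a default is used so the callback still works for payloads that lack them (for example, a Spark calculation status). `FakeExecution` is a hypothetical stand-in so the example runs without AWS.

```python
from dataclasses import dataclass
from typing import Any, Optional


def format_progress(execution: Any) -> str:
    """Build a one-line status string from whatever fields the payload has."""
    parts = [f"State: {execution.state}"]
    # Attribute names below are assumptions; missing ones are skipped.
    scanned = getattr(execution, "data_scanned_in_bytes", None)
    if scanned is not None:
        parts.append(f"scanned: {scanned / 1_000_000:.1f} MB")
    elapsed = getattr(execution, "total_execution_time_in_millis", None)
    if elapsed is not None:
        parts.append(f"elapsed: {elapsed / 1000:.1f} s")
    return " | ".join(parts)


def track(execution: Any) -> None:
    """Callback suitable for passing as on_poll."""
    print(format_progress(execution))


@dataclass
class FakeExecution:
    """Hypothetical stand-in for the execution object passed to on_poll."""

    state: str
    data_scanned_in_bytes: Optional[int] = None
    total_execution_time_in_millis: Optional[int] = None


line = format_progress(FakeExecution("RUNNING", 2_500_000, 1500))
```

Keeping the formatting in a pure function makes the callback easy to unit test, and `track` itself stays a thin wrapper that can be passed as `on_poll=track`.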

WHY

There is currently no public hook into the polling loop, so interactive environments cannot observe query state, elapsed time, or data scanned while a query runs. Requested in #723.

Tests

Unit (no AWS, deterministic): on_poll fires once per iteration in order incl. the terminal state; None is a no-op; Spark calculation poll variant.
Integration (cheap SELECT 1): connection-level, cursor-level, and async (AsyncCursor, matching the issue example).
Existing on_start_query_execution callback tests pass unchanged after the storage move.

make lint (ruff + mypy) and the callback/on_poll tests pass locally.

Refs #691.

Add an optional on_poll callback to BaseCursor/Connection, invoked once per poll iteration with the current execution object. Wired at the cursor/connection level so it propagates via the existing **kwargs chain (no execute() changes; avoids adding to the execute() duplication tracked in #691). Covers all poll loops: BaseCursor.__poll (sync + thread-pool async), SparkBaseCursor.__poll, and the native-async aio loops. For Spark the callback receives the per-poll calculation status, so the signature is Callable[[AthenaQueryExecution | AthenaCalculationExecutionStatus], None].

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a "Query polling callback" section to docs/usage.md covering on_poll: connection- and cursor-level configuration, the synchronous callback contract, per-iteration invocation including the terminal state, async cursor usage, and the Spark calculation-status payload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Move the on_start_query_execution field from the five synchronous cursors (Cursor, PandasCursor, ArrowCursor, PolarsCursor, S3FSCursor) into BaseCursor.__init__, mirroring on_poll, so both connection-level callbacks live in one place. The per-execute() override and its invocation stay in each synchronous cursor (the broader execute() kwargs consolidation is tracked in #691). Async/aio/Spark cursors inherit the field but do not invoke it, as they return the query id immediately through their execution model. Behavior is unchanged; a clarifying comment documents this.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

laughingman7743 and others added 3 commits June 8, 2026 23:25

laughingman7743 marked this pull request as ready for review June 9, 2026 12:29

laughingman7743 merged commit 1bf6e39 into master Jun 9, 2026
17 checks passed

laughingman7743 deleted the feature/on-poll-callback-723 branch June 9, 2026 12:29

laughingman7743 merged 3 commits into master from feature/on-poll-callback-723
