fix: record TakeExec output and I/O metrics #7228

Open: skyshineb wants to merge 1 commit into lance-format:main from skyshineb:fix/take-exec-metrics-main

skyshineb commented Jun 11, 2026

Problem

Closes #7227. The output and I/O metrics of the Take operator are broken: output_bytes, output_batches, bytes_read, iops, and requests are all reported as zero, even when the take reads data successfully.

Example:
... Take: elapsed=350.781983256s, columns="_distance, _rowid, (vec)", metrics=[output_rows=10.00 K, elapsed_compute=302.52s, output_bytes=0.0 B, output_batches=0, batches_processed=1, bytes_read=0, iops=0, requests=0] ...

Cause

TakeExec was recording output row metrics inside map_batch, before the spawned take work had completed and before the final RecordBatch was emitted through DataFusion's metrics path. As a result, output_batches, output_bytes, and ScanScheduler I/O metrics stayed at zero even when take work read data successfully.
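
The ordering problem described above can be shown with a toy model. This is only a sketch using the standard library, not Lance or DataFusion code: `IoCounter`, `deferred_take`, and `early_snapshot` are hypothetical names, and the 8-bytes-per-row cost is an arbitrary assumption. It shows that a counter snapshotted when deferred work is created reads zero, even though the work later performs I/O.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a scheduler's cumulative bytes-read counter.
#[derive(Default)]
pub struct IoCounter {
    bytes_read: AtomicU64,
}

/// Returns deferred work: nothing is read until the closure is called,
/// loosely mirroring a take task that is spawned first and awaited later.
pub fn deferred_take(io: Arc<IoCounter>, rows: u64) -> impl FnOnce() -> Vec<u64> {
    move || {
        // Assumption for illustration: each row costs 8 bytes of I/O.
        io.bytes_read.fetch_add(rows * 8, Ordering::Relaxed);
        (0..rows).collect()
    }
}

/// Snapshots the counter when the work is created (too early), then runs
/// the work. Returns (recorded snapshot, true counter value afterwards).
pub fn early_snapshot(rows: u64) -> (u64, u64) {
    let io = Arc::new(IoCounter::default());
    let work = deferred_take(Arc::clone(&io), rows);
    // The work has not run yet, so the snapshot is still zero.
    let recorded = io.bytes_read.load(Ordering::Relaxed);
    let batch = work();
    assert_eq!(batch.len() as u64, rows);
    (recorded, io.bytes_read.load(Ordering::Relaxed))
}
```

Calling `early_snapshot(10)` returns a recorded value of 0 next to a true counter of 80, which is the same shape as the `bytes_read=0` line in the example output.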

Solution

This moves metric recording to the result path after try_buffered, records the final RecordBatch with BaselineMetrics::record_poll, and records ScanScheduler I/O metrics after the actual take/read work completes. The stream also finalizes baseline metrics and records one final I/O snapshot when it finishes.
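
The pattern of recording on emission can be sketched with a plain iterator adapter. This is a simplified illustration under stated assumptions, not the code in this PR: it uses a synchronous `Iterator` instead of a DataFusion stream, `Vec<u64>` instead of `RecordBatch`, and the hypothetical types `TakeMetrics` and `MeteredBatches` instead of `BaselineMetrics` and the ScanScheduler counters.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical metric set; field names follow the metrics quoted in this PR.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TakeMetrics {
    pub output_rows: usize,
    pub output_batches: usize,
    pub output_bytes: usize,
    pub bytes_read: u64,
    pub finished: bool,
}

/// Wraps an iterator of completed batches and records metrics when each
/// batch is emitted, rather than when the work was scheduled.
pub struct MeteredBatches<I, F> {
    inner: I,
    io_bytes_read: F, // returns the cumulative bytes read so far
    pub metrics: TakeMetrics,
}

impl<I: Iterator<Item = Vec<u64>>, F: Fn() -> u64> MeteredBatches<I, F> {
    pub fn new(inner: I, io_bytes_read: F) -> Self {
        Self { inner, io_bytes_read, metrics: TakeMetrics::default() }
    }
}

impl<I: Iterator<Item = Vec<u64>>, F: Fn() -> u64> Iterator for MeteredBatches<I, F> {
    type Item = Vec<u64>;

    fn next(&mut self) -> Option<Vec<u64>> {
        let item = self.inner.next();
        if let Some(batch) = &item {
            // The batch is final here, so output metrics are accurate.
            self.metrics.output_rows += batch.len();
            self.metrics.output_batches += 1;
            self.metrics.output_bytes += batch.len() * size_of::<u64>();
        } else {
            // End of stream: mark the metrics as finalized.
            self.metrics.finished = true;
        }
        // I/O snapshot after the work ran, including once at the end.
        self.metrics.bytes_read = (self.io_bytes_read)();
        item
    }
}

/// Drives the adapter over two lazily produced batches (3 and 2 rows).
pub fn run_demo() -> TakeMetrics {
    let io = Rc::new(Cell::new(0u64));
    let io_for_work = Rc::clone(&io);
    // Lazy work: each batch "reads" 8 bytes per row only when pulled.
    let batches = vec![3u64, 2].into_iter().map(move |rows| {
        io_for_work.set(io_for_work.get() + rows * 8);
        (0..rows).collect::<Vec<u64>>()
    });
    let mut stream = MeteredBatches::new(batches, move || io.get());
    while stream.next().is_some() {}
    stream.metrics.clone()
}
```

Because the snapshot is taken after `inner.next()` returns, the recorded I/O always includes the work that produced the emitted batch, and the extra snapshot at end of stream picks up any reads that completed after the last batch.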

Added a regression test covering output rows, output batches, output bytes, bytes read, IOPS, requests, and batches_processed.

github-actions bot added the bug label (Something isn't working) Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

Bug: TakeExec reports zero output and I/O metrics after successful take