Feature Request
Add per-column idle-time metrics to the end-of-run metrics emitted after a create run completes.
Motivation
For model-backed generation columns, it is hard to tell whether total runtime is dominated by model response latency or by time where a column is not actually waiting on an in-flight request. When tuning generation plans, concurrency, scheduling, and model configuration, we should be able to see how much of the overall run was spent with each model-backed column truly idle.
Proposed Behavior
For each generation column that can issue model requests (most likely the columns that use the model mixin / ModelConfig), report metrics such as:
- request_wait_wall_time_s: wall-clock time during the run where the column had at least one in-flight model request and was waiting on a response.
- idle_time_s: wall-clock time during the run where the column was not waiting on a model response.
- idle_pct_of_run: idle_time_s / total_run_wall_time_s.
- Existing or related request metrics, if available, such as request count and summed request latency, should remain separate from wall-clock wait time.
The important distinction is that request_wait_wall_time_s should avoid double-counting overlapping requests for the same column. If a column has multiple concurrent requests in flight, the wall-clock wait metric should represent the union of those waiting intervals, not the sum of every individual request latency. That makes the idle calculation answer: "during how much of the total run was this column not waiting on a model response?"
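The union-of-intervals calculation can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `ColumnIdleMetrics`, `union_length`, and `column_idle_metrics` are hypothetical, and it assumes each request is recorded as a `(start, end)` pair of timestamps in seconds on the same clock as the run start and end.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ColumnIdleMetrics:
    """Hypothetical container for the per-column metrics proposed above."""
    request_wait_wall_time_s: float
    idle_time_s: float
    idle_pct_of_run: float
    requests: int


def union_length(intervals: Iterable[tuple[float, float]]) -> float:
    """Return the total time covered by the intervals, counting overlaps once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged span: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current span: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def column_idle_metrics(
    intervals: Iterable[tuple[float, float]],
    run_start: float,
    run_end: float,
) -> ColumnIdleMetrics:
    """Compute wait and idle metrics for one column over the full run window."""
    intervals = list(intervals)
    run_wall = run_end - run_start
    # Clip each request interval to the run window and drop empty ones.
    clipped = [(max(s, run_start), min(e, run_end)) for s, e in intervals]
    clipped = [(s, e) for s, e in clipped if e > s]
    wait = union_length(clipped)
    idle = run_wall - wait
    pct = idle / run_wall if run_wall > 0 else 0.0
    return ColumnIdleMetrics(wait, idle, pct, len(intervals))
```

For example, three requests spanning `(0, 10)`, `(5, 15)`, and `(20, 30)` in a 100-second run have a summed latency of 30 s, but the wall-clock wait is 25 s, because the first two overlap for 5 s. The idle time is therefore 75 s.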
A table-like end-of-run summary could look like:
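| column | model_config | request_wait_wall_s | idle_s | idle_pct_of_run | requests |
|--------|--------------|---------------------|--------|-----------------|----------|
| prompt | gpt-4.1-mini | 83.2                | 16.8   | 16.8%           | 240      |
| label  | nemotron     | 42.5                | 57.5   | 57.5%           | 120      |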
Acceptance Criteria
- End-of-run metrics after create include per-column idle metrics for columns that issue model requests.
- The metrics are available wherever end-of-run metrics are currently surfaced, not only in logs, if a structured metrics object exists.
- Overlapping in-flight requests for a column are handled as wall-clock intervals rather than double-counted summed latencies.
- Non-model-backed columns are either omitted from this section or reported with an explicit not-applicable state.
- Tests cover at least one mocked model-backed column with controlled request timing and one case with overlapping requests.
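One way the timing tests in the last criterion might look is sketched below. It assumes a hypothetical `WaitTracker` that counts in-flight requests and reads time from an injectable clock, so that a `FakeClock` can control request timing deterministically. Neither class exists in the codebase. They stand in for whatever hook the real model-backed column exposes.

```python
class FakeClock:
    """Manually advanced clock so tests control request timing exactly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class WaitTracker:
    """Hypothetical tracker: accumulates wall time while >= 1 request is in flight."""

    def __init__(self, clock) -> None:
        self._clock = clock
        self._in_flight = 0
        self._busy_since = None
        self.wait_wall_time_s = 0.0
        self.requests = 0

    def request_started(self) -> None:
        if self._in_flight == 0:
            # Transition from idle to waiting: open a new waiting interval.
            self._busy_since = self._clock()
        self._in_flight += 1
        self.requests += 1

    def request_finished(self) -> None:
        self._in_flight -= 1
        if self._in_flight == 0:
            # Last in-flight request returned: close the waiting interval.
            self.wait_wall_time_s += self._clock() - self._busy_since
            self._busy_since = None


def test_single_request() -> None:
    clock = FakeClock()
    tracker = WaitTracker(clock)
    clock.advance(2.0)          # idle before the request
    tracker.request_started()
    clock.advance(3.0)          # waiting on the response
    tracker.request_finished()
    clock.advance(5.0)          # idle after the request
    assert tracker.wait_wall_time_s == 3.0
    assert clock() - tracker.wait_wall_time_s == 7.0  # idle_time_s


def test_overlapping_requests() -> None:
    clock = FakeClock()
    tracker = WaitTracker(clock)
    tracker.request_started()   # request A starts at t=0
    clock.advance(4.0)
    tracker.request_started()   # request B starts at t=4
    clock.advance(2.0)
    tracker.request_finished()  # A ends at t=6 (latency 6 s)
    clock.advance(4.0)
    tracker.request_finished()  # B ends at t=10 (latency 6 s)
    # Union of [0, 6] and [4, 10] is 10 s, not the summed latency of 12 s.
    assert tracker.wait_wall_time_s == 10.0
    assert tracker.requests == 2
```

Counting in-flight requests gives the same result as merging recorded intervals after the fact, without storing every interval. The trade-off is that the hook must observe every start and finish event.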
Open Questions
- Should idle_time_s use the full run wall-clock duration as its denominator, or only the column's active window between first scheduled work and final completed work?
- Should a later enhancement split idle time into dependency wait, scheduler/admission wait, local processing time, and no-work-complete time?
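To make the first question concrete, the sketch below computes idle time for the same column against both candidate windows. The function name `idle_summary` and the numbers are made up for illustration only.

```python
def idle_summary(
    wait_wall_time_s: float, window_start: float, window_end: float
) -> tuple[float, float]:
    """Return (idle_time_s, idle_fraction) for a chosen measurement window."""
    window = window_end - window_start
    idle = window - wait_wall_time_s
    fraction = idle / window if window > 0 else 0.0
    return idle, fraction


# Illustrative numbers: a 100 s run in which the column is only scheduled
# between t=40 and t=60 and spends 15 s of wall-clock time waiting on the model.
wait_s = 15.0

full_run = idle_summary(wait_s, window_start=0.0, window_end=100.0)
active_window = idle_summary(wait_s, window_start=40.0, window_end=60.0)

print(full_run)       # (85.0, 0.85)
print(active_window)  # (5.0, 0.25)
```

With the full run as the window, the column looks 85% idle, mostly because it was not scheduled for most of the run. With the active window, it looks 25% idle. The two definitions answer different tuning questions, so the choice should be stated wherever the metric is reported.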