Commit bc35df3

ezhulenev authored and copybara-github committed
[xla:cpu] Optimize ThunkExecutor::Execute part #1
name                                      old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   889µs ± 1%   740µs ± 3%   -16.70%
BM_SelectAndScatterF32/256/process_time   3.64ms ± 2%  3.00ms ± 1%  -17.64%
BM_SelectAndScatterF32/512/process_time   15.3ms ± 1%  13.1ms ± 3%  -14.61%

PiperOrigin-RevId: 658063846
1 parent 2556f9f commit bc35df3

1 file changed

Lines changed: 6 additions & 0 deletions

xla/service/cpu/runtime/thunk_executor.cc

@@ -162,6 +162,12 @@ tsl::AsyncValueRef<ThunkExecutor::ExecuteEvent> ThunkExecutor::Execute(
   Execute(state.get(), params, ReadyQueue(source_.begin(), source_.end()),
           /*lock=*/params.session.Join());
 
+  // If execution already completed (all kernels executed in the caller thread),
+  // immediately return the result to avoid wasteful reference counting below.
+  if (ABSL_PREDICT_TRUE(state->execute_event.IsAvailable())) {
+    return std::move(state->execute_event);
+  }
+
   // Move execute state to the execute event callback to ensure that it is kept
   // alive while thunk executor has pending tasks.
   auto execute_event = state->execute_event;
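
The early return in this hunk can be illustrated outside of XLA with a minimal sketch. It assumes `std::shared_ptr` as a stand-in for `tsl::AsyncValueRef`, and the names `Event`, `ExecuteState`, and the free function `Execute(bool)` are hypothetical, not part of the XLA code base. The sketch only shows why moving an already-available event out of the state skips the reference-count traffic that the copy on the slow path incurs; the `ABSL_PREDICT_TRUE` branch hint is omitted to keep it dependency-free.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a completion event (not the tsl API).
struct Event {
  bool available = false;
  std::vector<std::function<void()>> waiters;

  // Marks the event complete and runs callbacks registered while pending.
  void SetAvailable() {
    assert(!available);
    available = true;
    auto pending = std::move(waiters);
    waiters.clear();
    for (auto& waiter : pending) waiter();
  }

  // Runs `fn` immediately if complete, otherwise stores it until completion.
  void AndThen(std::function<void()> fn) {
    if (available) {
      fn();
    } else {
      waiters.push_back(std::move(fn));
    }
  }
};

// Hypothetical per-execution state that owns one reference to the event.
struct ExecuteState {
  std::shared_ptr<Event> execute_event = std::make_shared<Event>();
};

// `completes_inline` simulates all work finishing in the caller thread.
std::shared_ptr<Event> Execute(bool completes_inline) {
  auto state = std::make_shared<ExecuteState>();
  if (completes_inline) state->execute_event->SetAvailable();

  // Fast path: the event is already available, so move the handle out of the
  // state. No reference count is incremented and no callback is registered.
  if (state->execute_event->available) {
    return std::move(state->execute_event);
  }

  // Slow path: copy the handle (one extra reference) and keep the state alive
  // inside the completion callback until the event becomes available.
  auto execute_event = state->execute_event;
  execute_event->AndThen([state = std::move(state)] {});
  return execute_event;
}
```

In this sketch, `Execute(true)` returns a handle whose `use_count()` is 1, while `Execute(false)` returns one whose `use_count()` is 2 until `SetAvailable()` runs and the callback that holds the state is released.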