vulkan: when using transfer queue for async copies, sync on event_wait to avoid race by 0cc4m · Pull Request #25229 · ggml-org/llama.cpp

0cc4m · 2026-07-02T09:10:41Z

Overview

When async_use_transfer_queue is set (as it is for AMD RDNA GPUs), the async queue did not yet wait for events. On RADV this didn't cause problems, but it could be the source of the issue reported on AMD Windows devices. I can't reproduce that issue, so this is a guess, but I have verified that the change does not regress performance or cause incoherent output on Linux.
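To make the ordering hazard concrete, here is a minimal toy model of two queues and an event. It is only a sketch: Queue, Event, event_record and run_scenario are hypothetical names invented for illustration, and nothing here is taken from the Vulkan backend or from this PR's diff. It assumes a scheduler that always lets the transfer queue go first, which is the worst case for a copy that does not wait on the event.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Toy model only: none of these types exist in ggml or Vulkan.
struct Queue {
    struct Cmd {
        const Queue * dep_queue;      // queue this command waits on, or nullptr
        uint64_t      dep_value;      // completion count that must be reached first
        std::function<void()> run;    // the work itself
    };
    std::deque<Cmd> pending;
    uint64_t submitted = 0;           // commands submitted so far
    uint64_t completed = 0;           // commands finished so far

    void submit(std::function<void()> fn, const Queue * dep = nullptr, uint64_t value = 0) {
        Cmd c;
        c.dep_queue = dep;
        c.dep_value = value;
        c.run       = std::move(fn);
        pending.push_back(std::move(c));
        ++submitted;
    }

    // Execute the next command if possible; false means empty or blocked.
    bool step() {
        if (pending.empty()) return false;
        Cmd & c = pending.front();
        if (c.dep_queue && c.dep_queue->completed < c.dep_value) return false;
        c.run();
        pending.pop_front();
        ++completed;
        return true;
    }
};

// An event captures "everything submitted to this queue so far".
struct Event {
    const Queue * queue;
    uint64_t      value;
};

Event event_record(const Queue & q) {
    Event e;
    e.queue = &q;
    e.value = q.submitted;
    return e;
}

// Returns the value the compute work read from the shared buffer.
int run_scenario(bool transfer_waits_on_event) {
    Queue compute, transfer;
    int buffer = 1;
    int seen   = 0;

    compute.submit([&] { seen = buffer; });   // compute reads the buffer
    Event ev = event_record(compute);

    if (transfer_waits_on_event) {
        transfer.submit([&] { buffer = 2; }, ev.queue, ev.value);
    } else {
        transfer.submit([&] { buffer = 2; }); // copy ignores the event
    }

    // Adversarial scheduler: the transfer queue always gets the first chance.
    while (!compute.pending.empty() || !transfer.pending.empty()) {
        if (!transfer.step()) compute.step();
    }
    return seen;
}
```

In this model, run_scenario(true) lets the compute read finish before the copy overwrites the buffer, while run_scenario(false) lets the copy run first, so the read observes the new value. Whether the Windows report has this cause is, as stated above, a guess.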

This is an attempt to fix the issue reported in #25195 and, depending on performance, could be an alternative to #25196. @liminfei-amd, please check whether this resolves the race condition. It was the only problem I could find with the transfer queue use.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, Claude wrote the code; I reviewed and tested it.

liminfei-amd · 2026-07-03T12:27:13Z

Thanks @0cc4m, this looks like the right direction! One heads-up: the new sync only fires in event_wait, which needs pipeline parallelism (n_copies > 1), so it won't cover the single-GPU --n-cpu-moe path from #25195. Extending the same submit-level sync to the direct set_tensor_async uploads would close that gap — happy to help!

0cc4m · 2026-07-03T12:40:30Z

Whoever uses the async copy commands needs to use either ggml_backend_synchronize or events to make sure the copies are done by the time the results are used, and also that the read is done before the buffer is written again. That is done for the MoE expert upload as well. I don't see the problem you mean.
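The caller-side contract described in this comment can be sketched with a small stand-in backend. This is an illustrative toy, not the ggml API: ToyBackend, set_async and synchronize are hypothetical names, and the model assumes the worst case in which an async copy moves no bytes until synchronize is called.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a backend with async uploads; not the ggml API.
struct ToyBackend {
    struct Copy {
        void *       dst;
        const void * src;
        size_t       size;
    };
    std::vector<Copy> in_flight;

    // Like an async upload: only records the request, the bytes move later.
    void set_async(void * dst, const void * src, size_t size) {
        in_flight.push_back({dst, src, size});
    }

    // Like a backend-wide synchronize: returns once all recorded copies are done.
    void synchronize() {
        for (const Copy & c : in_flight) {
            std::memcpy(c.dst, c.src, c.size);
        }
        in_flight.clear();
    }
};

// Correct use: synchronize before reusing the staging buffer.
int upload_with_sync() {
    ToyBackend be;
    int staging = 7;
    int device  = 0;
    be.set_async(&device, &staging, sizeof(int));
    be.synchronize();   // the copy has finished reading `staging`
    staging = 99;       // safe to overwrite now
    return device;
}

// Incorrect use: the staging buffer is overwritten while the copy is pending.
int upload_without_sync() {
    ToyBackend be;
    int staging = 7;
    int device  = 0;
    be.set_async(&device, &staging, sizeof(int));
    staging = 99;       // clobbers data the pending copy has not read yet
    be.synchronize();
    return device;
}
```

In this toy, upload_with_sync returns the intended value and upload_without_sync returns the clobbered one. The same reasoning applies when an event is used in place of a full synchronize.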

Commit 940d327: vulkan: when using transfer queue for async copies, sync on event_wait to avoid race

0cc4m requested a review from a team as a code owner July 2, 2026 09:10

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jul 2, 2026

0cc4m wants to merge 1 commit into master from 0cc4m/vulkan-event-async-transfer-queue-sync
