[SYCL] Fix use-after-free releasing UR programs through a freed adapter#22369
[SYCL] Fix use-after-free releasing UR programs through a freed adapter#22369koparasy wants to merge 2 commits into
Conversation
Managed<ur_program/kernel_handle_t> stored a raw adapter_impl* and released its UR resource in ~Managed through that pointer. Adapters were owned by GlobalHandler as raw new/delete and freed in unloadAdapters(). A cached or in-flight Managed (e.g. the program build result kept in a context's KernelProgramCache, or a discarded getBuiltURProgram() retain() temporary) could outlive adapter teardown and call urProgramRelease through a freed adapter -> use-after-free. On Windows/icx the freed adapter's func-ptr table reads back null, so the call jumps to 0x0 (SEH 0xC0000005); on other configs the freed memory stayed usable and the bug was latent. Own adapters with std::shared_ptr in GlobalHandler (the sole owner) and have Managed hold a std::weak_ptr<adapter_impl> (adapter_impl now derives enable_shared_from_this). ~Managed/retain lock the weak_ptr at point of use and skip the release when the adapter is already gone, instead of dereferencing freed memory. A weak_ptr is used rather than an owning shared_ptr so that Managed handles do not extend adapter lifetime, which would perturb the process-shutdown ordering of other runtime globals. Fixes #22367 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The adapter_impl/Managed use-after-free (#22367) is fixed by the preceding commit, so the Windows/icx skips added for it are no longer needed.
efcb140 to
cc7b9b3
Compare
pbalcer
left a comment
There was a problem hiding this comment.
lgtm, but we should check if this doesn't have a negative impact on perf. There must have been a reason why this wasn't a shared_ptr in the first place. Benchmarks CI job is unfortunately broken right now https://github.com/intel/llvm/actions/workflows/sycl-ur-perf-benchmarking.yml :/ I've already pinged maintaining it.
|
@koparasy fyi: #19102, #20733 - @pbalcer is right, we intentionally switched to using raw pointers because: For unittests specifically, the use-after-free might be because we are not re-initializing the global handler object between unit test runs - I encountered a similar issue sometime ago. Could you instead experiment with tracing how the GlobalHandler object is created/destroyed in the failing unittest? |
|
https://intel.github.io/llvm/benchmarks/?runs=Baseline_BMG_L0v2%2CPR_22369_BMG_L0v2®ex=in+order There is a very slight (~0.5%) performance regression visible on charts with # of instructions retired: Time / latency benchmarks are within margin of error. |
|
@pbalcer is this within margins of acceptance? Or should I work on some other solution? |
|
We don't have specific criteria for accepting changes that negatively impact performance. @sergey-semenov what do you think? |
|
I will check @uditagarwal97 suggestion later today. Maybe it is sufficient to modify the test. |
|
@uditagarwal97 @pbalcer: Followed up on the Summary of what I found, plus where I think we should land. I instrumented
Every adapter lookup returns the owning instance (no split state), and when observable the teardown order is correct. All The crash reproduces deterministically only in the uninstrumented build. Every diagnostic I added (handle leak-check layer, adapter-identity registry, GlobalHandler tracing) shifts heap/teardown layout just enough to make it vanish and the ordering look clean. That's the typically a hint of a layout-sensitive use-after-free on the Option A (this PR, Option B (alternative, don't free the process-lifetime adapter). Keep raw pointers exactly as today, but in The objects are reclaimed by the OS at exit (same deliberate leak pattern as LLVM's ManagedStatic). What do you prefer? |
|
@koparasy, could you please check if changing the order of creation of sycl::unittest::UrMock<> Mock; (https://github.com/intel/llvm/pull/22368/changes#diff-994f544bc2bd64bd796cd3972201c4ef6814332f3635d560a63c25a6c44a445dR356) and ProgramManagerExposed PM; (https://github.com/intel/llvm/pull/22368/changes#diff-994f544bc2bd64bd796cd3972201c4ef6814332f3635d560a63c25a6c44a445dR344) helps. |
I am also interested if we have such criteria. @sergey-semenov kindly ping. |
Like Piotr said, we don't have them, it's on a case-by-case basis. Assuming the fix is correct, I think 0.5% instruction count increase with no apparent impact on time would be acceptable. |

Fixes #22367
Worflow run: https://github.com/intel/llvm/actions/runs/27797051328/job/82259006953