Skip to content

[SYCL] Fix use-after-free releasing UR programs through a freed adapter#22369

Open
koparasy wants to merge 2 commits into
syclfrom
fix/adapter-uaf-shared-ptr
Open

[SYCL] Fix use-after-free releasing UR programs through a freed adapter#22369
koparasy wants to merge 2 commits into
syclfrom
fix/adapter-uaf-shared-ptr

Conversation

@koparasy

Copy link
Copy Markdown
Contributor

koparasy and others added 2 commits June 18, 2026 20:25
Managed<ur_program/kernel_handle_t> stored a raw adapter_impl* and released
its UR resource in ~Managed through that pointer. Adapters were owned by
GlobalHandler as raw new/delete and freed in unloadAdapters(). A cached or
in-flight Managed (e.g. the program build result kept in a context's
KernelProgramCache, or a discarded getBuiltURProgram() retain() temporary)
could outlive adapter teardown and call urProgramRelease through a freed
adapter -> use-after-free. On Windows/icx the freed adapter's func-ptr table
reads back null, so the call jumps to 0x0 (SEH 0xC0000005); on other configs
the freed memory stayed usable and the bug was latent.

Own adapters with std::shared_ptr in GlobalHandler (the sole owner) and have
Managed hold a std::weak_ptr<adapter_impl> (adapter_impl now derives
enable_shared_from_this). ~Managed/retain lock the weak_ptr at point of use and
skip the release when the adapter is already gone, instead of dereferencing
freed memory. A weak_ptr is used rather than an owning shared_ptr so that
Managed handles do not extend adapter lifetime, which would perturb the
process-shutdown ordering of other runtime globals.

Fixes #22367

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The adapter_impl/Managed use-after-free (#22367) is fixed by the
preceding commit, so the Windows/icx skips added for it are no longer needed.
@koparasy koparasy force-pushed the fix/adapter-uaf-shared-ptr branch from efcb140 to cc7b9b3 Compare June 19, 2026 03:27

@pbalcer pbalcer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but we should check if this doesn't have a negative impact on perf. There must have been a reason why this wasn't a shared_ptr in the first place. Benchmarks CI job is unfortunately broken right now https://github.com/intel/llvm/actions/workflows/sycl-ur-perf-benchmarking.yml :/ I've already pinged maintaining it.

@uditagarwal97

Copy link
Copy Markdown
Contributor

@koparasy fyi: #19102, #20733 - @pbalcer is right, we intentionally switched to using raw pointers because:
(1) to improve performance
(2) theoretically, GlobalHandler should be the sole owner of adapter handles because it's among the first ones to get initialized when SYCL RT is loaded and the last to get destructed during application shutdown. So, sharing ownership of adapter handles (i.e. shared_ptr) doesn't make much sense here.

For unittests specifically, the use-after-free might be because we are not re-initializing the global handler object between unit test runs - I encountered a similar issue sometime ago. Could you instead experiment with tracing how the GlobalHandler object is created/destroyed in the failing unittest?

@pbalcer

pbalcer commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

https://intel.github.io/llvm/benchmarks/?runs=Baseline_BMG_L0v2%2CPR_22369_BMG_L0v2&regex=in+order

There is a very slight (~0.5%) performance regression visible on charts with # of instructions retired:
SubmitKernel in order, CPU count (5)

Time / latency benchmarks are within margin of error.

@koparasy

Copy link
Copy Markdown
Contributor Author

@pbalcer is this within margins of acceptance? Or should I work on some other solution?

@pbalcer

pbalcer commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

We don't have specific criteria for accepting changes that negatively impact performance. @sergey-semenov what do you think?

@koparasy

Copy link
Copy Markdown
Contributor Author

I will check @uditagarwal97 suggestion later today. Maybe it is sufficient to modify the test.

@koparasy

Copy link
Copy Markdown
Contributor Author

@uditagarwal97 @pbalcer: Followed up on the GlobalHandler suggestion.

Summary of what I found, plus where I think we should land.

I instrumented GlobalHandler construction/destruction and adapter ownership in the failing program_manager unit tests on the icx Windows CI (the only reproducer):

GlobalHandler is not re-initialized or swapped mid-process. There is one GlobalHandler per process, resetGlobalHandler() never fires during the run, exactly one adapter_impl is created and it is owned by that one handler.

Every adapter lookup returns the owning instance (no split state), and when observable the teardown order is correct. All ~Managed releases happen before the adapter is destroyed.
So the "GlobalHandler not reset between tests" path doesn't appear to be the cause here. I also confirmed the crash site directly from the in-process symbolized stack: ~Managed<ur_program> --> adapter_impl::call<urProgramRelease> jumping through a null per-instance func-ptr table (RIP=0, SEH 0xC0000005).

The crash reproduces deterministically only in the uninstrumented build. Every diagnostic I added (handle leak-check layer, adapter-identity registry, GlobalHandler tracing) shifts heap/teardown layout just enough to make it vanish and the ordering look clean. That's the typically a hint of a layout-sensitive use-after-free on the adapter_impl. Consistent with how #22047 flipped this from latent to fatal by only reordering an include graph.

Option A (this PR, weak_ptr). Implemented and looks correct: Managed holds a weak_ptr<adapter_impl>, locks at point of use, and skips the release if the adapter is already gone. It resolves the UAF cleanly on icx (CI green, tests re-enabled). Measured cost is the ~0.5% instructions-retired regression @pbalcer, you flagged on SubmitKernel in order; time/latency are within noise. Given it's a correctness fix for a real UAF, ~0.5% instructions (no measurable latency impact) seems an acceptable trade to me.

Option B (alternative, don't free the process-lifetime adapter). Keep raw pointers exactly as today, but in unloadAdapters() keep release() and drop the delete (track the pointers in a process-lifetime container). Since GlobalHandler is the sole, first-constructed/last-destroyed owner, the adapter legitimately lives for the whole process; not freeing it closes the UAF window with zero cost and no ownership-model change.

The objects are reclaimed by the OS at exit (same deliberate leak pattern as LLVM's ManagedStatic).

What do you prefer?

@KseniyaTikhomirova

KseniyaTikhomirova commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@koparasy, could you please check if changing the order of creation of sycl::unittest::UrMock<> Mock; (https://github.com/intel/llvm/pull/22368/changes#diff-994f544bc2bd64bd796cd3972201c4ef6814332f3635d560a63c25a6c44a445dR356) and ProgramManagerExposed PM; (https://github.com/intel/llvm/pull/22368/changes#diff-994f544bc2bd64bd796cd3972201c4ef6814332f3635d560a63c25a6c44a445dR344) helps.
since they are destroyed in the reverse order of creation, PM objects can refer to the dead ur mock resources that may cause seg fault. It seems to me it still can be a test issue. Could you please confirm?

@KseniyaTikhomirova

Copy link
Copy Markdown
Contributor

We don't have specific criteria for accepting changes that negatively impact performance. @sergey-semenov what do you think?

I am also interested if we have such criteria. @sergey-semenov kindly ping.

@sergey-semenov

sergey-semenov commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

We don't have specific criteria for accepting changes that negatively impact performance. @sergey-semenov what do you think?

I am also interested if we have such criteria. @sergey-semenov kindly ping.

Like Piotr said, we don't have them, it's on a case-by-case basis. Assuming the fix is correct, I think 0.5% instruction count increase with no apparent impact on time would be acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use-after-free: Managed<> releases UR resource through a freed adapter_impl (Windows/icx SEH 0xC0000005)

5 participants