Fix #1641: Handle cleanup when exceptions are thrown by mdboom · Pull Request #1669 · NVIDIA/cuda-python

mdboom · 2026-02-20T18:50:37Z

In driver, runtime, and nvrtc, the generated code follows this pattern:

def func(args):
    $init
    $precall
    $call
    $postcall
    $return

For certain argument types, $precall and $postcall are required to be pairs, such as malloc/free. If an exception is thrown when parsing one of the other arguments in $precall, $postcall will not get called, leaking memory or other resources.

This PR reorganizes the code (whenever $postcall is non-empty) to:

def func(args):
    $init
    try:
        $precall
        $call
    finally:
        $postcall
    $return

This diff is larger than it otherwise would need to be because Cython only allows cdef declarations at the top level of a function body, not inside a try block. So all the cdefs that used to be part of $precall had to move to $init.
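
As a rough illustration of the reorganized pattern, here is a minimal pure-Python sketch (not the generated Cython). The function name `call_api`, the `log` list, and the string "buffers" are hypothetical stand-ins for the real argument conversion and CUDA call; the point is only that cleanup runs on both the success path and the exception path.

```python
def call_api(args, log):
    """Mimic the wrapper layout: $precall and $call in try, $postcall in finally."""
    resources = []  # $init: everything is declared before the try block
    try:
        # $precall: convert arguments one by one, acquiring resources as we go
        for arg in args:
            if arg is None:
                raise TypeError("cannot convert None")
            resources.append(f"buf:{arg}")
            log.append(f"alloc buf:{arg}")
        # $call: stand-in for the underlying CUDA call
        result = len(resources)
    finally:
        # $postcall: releases whatever was acquired, even if $precall raised
        for res in reversed(resources):
            log.append(f"free {res}")
    return result  # $return
```

With the original layout, a failure while converting the second argument would skip the release of the first; here the `finally` block releases it before the exception propagates.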

Performance impact

Cython implements try/finally with gotos. It generates two copies of the finally clause, one for success and one for failure, so there is a cost in overall code size. (This is similar to CPython's bytecode implementation of try/finally.) However, the runtime performance penalty is below the noise threshold in the #659 benchmark. There is a small performance penalty of around 200ns for the error case, but that's to be expected, since the old implementation skipped the cleanup and leaked memory.

Alternatives considered

We could use C++ RAII to automatically free memory. In fact, some of our code in these files already does that through its use of std::vector. However, this is not generally applicable as a solution, since it's not possible to implement a custom RAII struct in Cython that is not also a PyObject. This would make the HelperVoidPtrStruct optimization impossible.

We could require that $init does all validation (but no resource allocation) and that $precall never raises, so $postcall would always be guaranteed to run. It seems like this may have been part of the original design, but it has not been strictly enforced everywhere over time. Even if we could get there, it has two problems. (1) Some exceptions in $precall are unavoidable even if the inputs are valid, such as malloc failing. (2) Doing a separate validation and conversion pass on all the arguments is never going to be as performant as a single pass.

copy-pr-bot · 2026-02-20T18:50:41Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

mdboom · 2026-02-20T19:08:20Z

/ok to test

github-actions · 2026-02-20T19:23:57Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1669/
https://nvidia.github.io/cuda-python/pr-preview/pr-1669/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1669/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1669/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

mdboom · 2026-02-20T20:00:42Z

/ok to test

leofang · 2026-02-21T02:59:43Z

cuda_bindings/cuda/bindings/nvrtc.pyx.in

-    cdef _HelperInputVoidPtrStruct cycallbackHelper
-    cdef void* cycallback = _helper_input_void_ptr(callback, &cycallbackHelper)
    cdef _HelperInputVoidPtrStruct cypayloadHelper
-    cdef void* cypayload = _helper_input_void_ptr(payload, &cypayloadHelper)
-    with nogil:
-        err = cynvrtc.nvrtcSetFlowCallback(cyprog, cycallback, cypayload)
-    _helper_input_void_ptr_free(&cycallbackHelper)
-    _helper_input_void_ptr_free(&cypayloadHelper)


Q: Maybe we should encapsulate _helper_input_void_ptr and _helper_input_void_ptr_free as a C++ class (not Cython cdef class, which was the previous approach IIRC)?

Yeah, that's not a bad idea. We'd get RAII for free.

copy-pr-bot · 2026-03-03T18:08:14Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Andy-Jost · 2026-03-03T18:34:11Z

cuda_bindings/cuda/bindings/driver.pyx.in

+    finally:
+        if err != cydriver.CUDA_SUCCESS:
+            free(cbData)
+        else:
+            m_global._allocated[int(callback)] = cbData
+        _helper_input_void_ptr_free(&cyuserDataHelper)


This cleanup block looks questionable on an early return from line 31021 (malloc failed). What is the value of err if control never reached line 31027? Will a NULL pointer be stored in m_global._allocated?
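
To make the concern concrete, here is a small pure-Python sketch of one possible hardening, not the code in this PR. All names (`register_callback`, `allocate`, `call`, `free`, the error codes) are hypothetical. It shows that an early `return` inside `try` still runs `finally`, so anything the `finally` block reads has to be given a safe value before the `try` begins.

```python
SUCCESS = 0
ERROR_OUT_OF_MEMORY = 2  # hypothetical error codes, for illustration only


def register_callback(key, allocate, call, free, allocated):
    # Give err and cb_data safe values up front so the finally block
    # never reads an unset variable, even on an early return.
    err = ERROR_OUT_OF_MEMORY
    cb_data = None
    try:
        cb_data = allocate()
        if cb_data is None:
            return err  # early return: the finally block still runs
        err = call(cb_data)
    finally:
        if cb_data is not None:
            if err == SUCCESS:
                allocated[key] = cb_data  # ownership moves to the registry
            else:
                free(cb_data)  # call failed or raised: release the allocation
    return err
```

In Cython the equivalent C variables would be uninitialized rather than unbound, so under this sketch's assumptions the analogous fix would be to initialize `err` to a failure code and the pointer to NULL in $init, and to check the pointer in the cleanup block.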

Andy-Jost · 2026-03-03T18:44:47Z

cuda_bindings/cuda/bindings/nvrtc.pyx.in

-    cdef _HelperInputVoidPtrStruct cycallbackHelper
-    cdef void* cycallback = _helper_input_void_ptr(callback, &cycallbackHelper)
    cdef _HelperInputVoidPtrStruct cypayloadHelper
-    cdef void* cypayload = _helper_input_void_ptr(payload, &cypayloadHelper)
-    with nogil:
-        err = cynvrtc.nvrtcSetFlowCallback(cyprog, cycallback, cypayload)
-    _helper_input_void_ptr_free(&cycallbackHelper)
-    _helper_input_void_ptr_free(&cypayloadHelper)


Is it worth considering specialized guards to simplify a few scenarios like malloc/free, buffer release, and the generic _HelperInputVoidPtr?

For example,

struct FreeGuard {
    void* ptr;
    FreeGuard(void* p) : ptr(p) {}
    ~FreeGuard() { std::free(ptr); }
    void* release() { void* p = ptr; ptr = nullptr; return p; }
};

The usage pattern is:

cdef void* cbData = malloc(sizeof(CallbackData))
if cbData == NULL:
    return ERROR
cdef FreeGuard guard = FreeGuard(cbData)
# ... precall that might throw ...
# ... CUDA call ...
m_global._allocated[key] = guard.release()  # transfer ownership

This would avoid the issue I pointed out in another comment. It's also straightforward to build this on top of std::unique_ptr. Unfortunately, Cython does not support C++ lambdas, so each case needs its own class, but there don't seem to be many cases.
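
The release-to-transfer-ownership semantics of such a guard can be sketched in pure Python with a context manager. This is only an analog for illustration: the proposal above is a C++ struct, and the `free` callable, the `register` helper, and the string "pointers" here are hypothetical.

```python
class FreeGuard:
    """Frees the guarded pointer on exit unless ownership was released."""

    def __init__(self, ptr, free):
        self._ptr = ptr
        self._free = free

    def release(self):
        # Hand the pointer to the caller and stop guarding it.
        ptr, self._ptr = self._ptr, None
        return ptr

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._ptr is not None:
            self._free(self._ptr)
        return False  # never swallow the exception


def register(key, ptr, precall, allocated, free):
    with FreeGuard(ptr, free) as guard:
        precall()  # may raise; the guard then frees ptr
        allocated[key] = guard.release()  # success: transfer ownership
```

In the C++ version, the destructor after `release()` calls `std::free(nullptr)`, which is a no-op, so no explicit "was it released" check is needed there.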

Fix NVIDIA#1641: Handle cleanup when exceptions are thrown

6793639

Handle None early

d663f52

leofang linked an issue Feb 21, 2026 that may be closed by this pull request

BUG: API function calls do not cleanup correctly when a Python exception is thrown #1641

Open

leofang reviewed Feb 21, 2026

mdboom marked this pull request as draft March 3, 2026 18:08

Andy-Jost reviewed Mar 3, 2026

mdboom wants to merge 2 commits into NVIDIA:main from mdboom:handle-cleanup
