Skip to content
54 changes: 38 additions & 16 deletions include/pybind11/detail/internals.h
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,19 @@ struct internals {
internals(internals &&other) = delete;
internals &operator=(const internals &other) = delete;
internals &operator=(internals &&other) = delete;
~internals() = default;
~internals() {
// Normally this destructor runs during interpreter finalization and it may DECREF things.
// In odd finalization scenarios it might end up running after the interpreter has
// completely shut down, In that case, we should not decref these objects because pymalloc
// is gone.
if (Py_IsInitialized() != 0) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check defeats the point. Py_IsInitialized begins returning false quite early in Py_Finalize, well before the state dict is cleared, so adding this check will bring your leak back (at least for the main interpreter).

internals can only be freed by a DECREF on the capsule (in which case its destructor runs at a time when it's safe to DECREF things) or by an explicit call to internals_pp_manager::destroy(). Do any of the latter happen at times when it's not safe to DECREF things?

Copy link
Copy Markdown
Collaborator Author

@b-pass b-pass Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If internals is re-created during finalization, then it will be set in the unique_ptr and the current pybind11 finalization code will subsequently call destroy (after PyFinalize has completed) and delete the re-created version.

We could, of course, change the pybind11 finalization process to not behave this way....

Py_XDECREF(static_property_type);
Comment thread
b-pass marked this conversation as resolved.
Outdated
static_property_type = nullptr;

Py_XDECREF(default_metaclass);
default_metaclass = nullptr;
}
}
};

// the internals struct (above) is shared between all the modules. local_internals are only
Expand All @@ -325,6 +337,17 @@ struct local_internals {

std::forward_list<ExceptionTranslator> registered_exception_translators;
PyTypeObject *function_record_py_type = nullptr;

~local_internals() {
// Normally this destructor runs during interpreter finalization and it may DECREF things.
// In odd finalization scenarios it might end up running after the interpreter has
// completely shut down, In that case, we should not decref these objects because pymalloc
// is gone.
if (Py_IsInitialized() != 0) {
Py_XDECREF(function_record_py_type);
function_record_py_type = nullptr;
}
}
};

enum class holder_enum_t : uint8_t {
Expand Down Expand Up @@ -569,7 +592,7 @@ inline object get_python_state_dict() {
// The bool follows std::map::insert convention: true = created, false = existed.
template <typename Payload>
std::pair<Payload *, bool> atomic_get_or_create_in_state_dict(const char *key,
bool clear_destructor = false) {
void (*dtor)(void *) = nullptr) {
error_scope err_scope; // preserve any existing Python error states

auto state_dict = reinterpret_borrow<dict>(get_python_state_dict());
Expand All @@ -595,7 +618,7 @@ std::pair<Payload *, bool> atomic_get_or_create_in_state_dict(const char *key,
// - If our capsule is NOT inserted (another thread inserted first), it will be
// destructed when going out of scope here, so the destructor will be called
// immediately, which will also free the storage.
/*destructor=*/[](void *ptr) -> void { delete static_cast<Payload *>(ptr); });
/*destructor=*/dtor);
Copy link
Copy Markdown
Collaborator

@oremanj oremanj Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a semantics-preserving change.

There are two possibilities for when the capsule gets destroyed:

  • It could be destroyed almost immediately, if another thread inserted the same key before we did. See the comment on line 639. In that case, no one else has seen our newly-allocated payload, so we should delete it.
  • It could be destroyed at interpreter finalization. That's when custom destruction logic might come into play.

If two threads concurrently try to create internals, your use of the final dtor from the start means that one of them will leak its unique_ptr. Not a terrible outcome, but easy enough to avoid.

Suggest you change the dtor parameter to dtor_if_inserted. Create the capsule initially with a destructor that deallocates the payload. If insertion succeeds and dtor_if_inserted is not null, then swap the capsule destructor to be dtor_if_inserted instead. Effectively, the old clear_destructor parameter is a special case of the dtor_if_inserted semantics: dtor_if_inserted is a no-op lambda if clear_destructor is true, or is null (i.e. continue with the original dtor that deletes the payload) if clear_destructor is false.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I believe the latest change addresses this.

// At this point, the capsule object is created successfully.
// Release the unique_ptr and let the capsule object own the storage to avoid double-free.
(void) storage_ptr.release();
Expand All @@ -613,13 +636,6 @@ std::pair<Payload *, bool> atomic_get_or_create_in_state_dict(const char *key,
throw error_already_set();
}
created = (capsule_obj == new_capsule.ptr());
if (clear_destructor && created) {
// Our capsule was inserted.
// Remove the destructor to leak the storage on interpreter shutdown.
if (PyCapsule_SetDestructor(capsule_obj, nullptr) < 0) {
throw error_already_set();
}
}
// - If key already existed, our `new_capsule` is not inserted, it will be destructed when
// going out of scope here, which will also free the storage.
// - Otherwise, our `new_capsule` is now in the dict, and it owns the storage and the state
Expand Down Expand Up @@ -707,14 +723,20 @@ class internals_pp_manager {
internals_pp_manager(char const *id, on_fetch_function *on_fetch)
: holder_id_(id), on_fetch_(on_fetch) {}

static void internals_shutdown(void *vpp) {
auto *pp = static_cast<std::unique_ptr<InternalsType> *>(vpp);
if (pp) {
pp->reset();
}
// Because the unique_ptr is still pointed to by the pp_manager in this and possibly other
// modules, we cannot delete the unique_ptr itself until after the interpreter has shut
// down. If this interpreter was not created/owned by pybind11 then the unique_ptr itself
// (but not its contents) is leaked.
Comment thread
b-pass marked this conversation as resolved.
Outdated
}

std::unique_ptr<InternalsType> *get_or_create_pp_in_state_dict() {
// The `unique_ptr<InternalsType>` output is leaked on interpreter shutdown. Once an
// instance is created, it will never be deleted until the process exits (compare to
// interpreter shutdown in multiple-interpreter scenarios).
// Because we cannot guarantee the order of destruction of capsules in the interpreter
// state dict, leaking avoids potential use-after-free issues during interpreter shutdown.
Comment on lines -711 to -715
Copy link
Copy Markdown
Contributor

@XuehaiPan XuehaiPan Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Freeing the unique_ptr content is causing use-after-free segmentation faults. Because the order of GC is not guaranteed.

https://github.com/metaopt/optree/actions/runs/21203776041/job/60995252956?pr=245#step:18:229

libc++abi: terminating due to uncaught exception of type pybind11::cast_error: Unable to cast Python instance of type <class 'optree.PyTreeSpec'> to C++ type '?'
  #7  0x00007fb9cc4a5a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
  No symbol table info available.
  #8  0x00007fb9cc4bb391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
  No symbol table info available.
  #9  0x00007fb9cc7b3990 in pybind11::detail::load_type<optree::PyTreeSpec, void> (conv=..., 
      handle=...) at /usr/include/c++/13/bits/allocator.h:184
  No locals.
  #10 0x00007fb9cc7cdfca in pybind11::detail::load_type<optree::PyTreeSpec&> (handle=...)
      at /tmp/pip-build-env-5okmu3ac/overlay/lib/python3.13t/site-packages/pybind11/include/pybind11/cast.h:1655
          conv = {<pybind11::detail::type_caster_base<optree::PyTreeSpec>> = {<pybind11::detail::type_caster_generic> = {typeinfo = 0x0, cpptype = 0x7fb9cc83a580 <typeinfo for optree::PyTreeSpec>, 
                value = 0x0}, static name = {text = "%"}}, <No data fields>}
  #11 0x00007fb9cc7cdffb in pybind11::cast<optree::PyTreeSpec&, 0> (handle=...)
      at /tmp/pip-build-env-5okmu3ac/overlay/lib/python3.13t/site-packages/pybind11/include/pybind11/cast.h:1680
          is_enum_cast = false
  #12 0x00007fb9cc7ce04d in thread_safe_cast<optree::PyTreeSpec&> (handle=...)
      at /home/runner/work/optree/optree/include/optree/synchronization.h:182
  No locals.
  #13 0x00007fb9cc7f5876 in optree::PyTreeSpec::PyTpTraverse (self_base=0x20002ce0630, 
      visit=0x7fb9ceba61c2 <visit_decref>, arg=0x0)
      at /home/runner/work/optree/optree/src/treespec/gc.cpp:40
          self = <optimized out>
          __PRETTY_FUNCTION__ = "static int optree::PyTreeSpec::PyTpTraverse(PyObject*, visitproc, void*)"

See also:

https://github.com/pybind/pybind11/blob/master/docs/advanced/classes.rst#custom-type-setup

py::cast<CppClass&> is used in tp_traverse and tp_clear, while py::cast is using internals.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XuehaiPan could you help by extracting a reproducer from your situation? That'll be worth gold. Without reproducers these mind-bending issues will just keep coming back.

Copy link
Copy Markdown
Collaborator Author

@b-pass b-pass Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this same stack with -fno-omit-frame-pointer might also help?

My guess is that the py::cast failed and threw an exception because the type is no longer registered with internals (because internals has already destructed).

This is not a use-after-free segv, this is an unhandled/uncaught exception.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found that a version bump is necessary for the internal version (used in the key in the interpreter state dict).

In [1]: import env

In [2]: from pybind11_tests import custom_type_setup as m

In [3]: import torch

In [4]: obj = m.OwnsPythonObjects()

In [5]: obj.value = obj

In [6]: exit
[1]    85789 segmentation fault  ipython

torch is built against the old Pybind11 with the same internal version.

Copy link
Copy Markdown
Contributor

@XuehaiPan XuehaiPan Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that internals and local_internals are freed by pp->reset() before py::cast is called during garbage collection. The sequence of events is:

  1. On interpreter shutdown, everything is garbage collected, including user objects and the capsule that holds internals. However, the GC order is not guaranteed.
  2. The capsule destructor calls pp->reset(), which destroys the internals/local_internals content (the unique_ptr now holds nullptr).
  3. During GC, tp_traverse/tp_clear is invoked on pybind11-wrapped objects, which calls py::cast.
  4. py::cast calls get_type_info, which calls get_internals().
  5. get_internals() finds that *pp is nullptr and creates a new empty internals (with no type registrations).
  6. Type lookup fails, py::cast throws cast_error, and the program terminates.

Each interpreter must not have more than two internals. Now we have an unexpected extra empty internals.


Here is my patch:

diff --git a/include/pybind11/detail/internals.h b/include/pybind11/detail/internals.h
index e2dee77e..26cc8cb7 100644
--- a/include/pybind11/detail/internals.h
+++ b/include/pybind11/detail/internals.h
@@ -20,6 +20,7 @@
 #include <atomic>
 #include <cstdint>
 #include <exception>
+#include <fstream>
 #include <limits>
 #include <mutex>
 #include <thread>
@@ -294,6 +295,11 @@ struct internals {
     internals()
         : static_property_type(make_static_property_type()),
           default_metaclass(make_default_metaclass()) {
+        // Debug logging
+        {
+            std::ofstream dbg("debug.txt", std::ios::app);
+            dbg << "internals::internals() ctor called, this=" << this << "\n";
+        }
         tstate.set(nullptr); // See PR #5870
         PyThreadState *cur_tstate = PyThreadState_Get();
 
@@ -317,6 +323,11 @@ struct internals {
     internals &operator=(const internals &other) = delete;
     internals &operator=(internals &&other) = delete;
     ~internals() {
+        // Debug logging
+        {
+            std::ofstream dbg("debug.txt", std::ios::app);
+            dbg << "internals::~internals() dtor called, this=" << this << "\n";
+        }
         // Normally this destructor runs during interpreter finalization and it may DECREF things.
         // In odd finalization scenarios it might end up running after the interpreter has
         // completely shut down, In that case, we should not decref these objects because pymalloc

Output:

internals::internals() ctor called, this=0xca4c508c0    # Original created
internals::~internals() dtor called, this=0xca4c508c0   # Destroyed by pp->reset()
internals::internals() ctor called, this=0xca1e3f480    # NEW empty one created!

Copy link
Copy Markdown
Collaborator Author

@b-pass b-pass Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected behavior of this change whenever something accesses internals late during interpreter shutdown.

We could do a PyGC_Collect() before releasing the internals to try to make this a little less likely, but it could certainly still happen.

We could decide to back out some of this change and stay with a slightly more leaky behavior of pybind11 internals. We did not address all of the leaks yet, default_metaclass is still leaking because it doesn't implement tp_traverse.

auto result = atomic_get_or_create_in_state_dict<std::unique_ptr<InternalsType>>(
holder_id_, /*clear_destructor=*/true);
holder_id_, &internals_shutdown);
auto *pp = result.first;
bool created = result.second;
// Only call on_fetch_ when fetching existing internals, not when creating new ones.
Expand Down
4 changes: 3 additions & 1 deletion include/pybind11/gil_safe_call_once.h
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,9 @@ class gil_safe_call_once_and_store {
// Get or create per-storage capsule in the current interpreter's state dict.
// The storage is interpreter-dependent and will not be shared across interpreters.
storage_type *get_or_create_storage_in_state_dict() {
return detail::atomic_get_or_create_in_state_dict<storage_type>(get_storage_key().c_str())
return detail::atomic_get_or_create_in_state_dict<storage_type>(
get_storage_key().c_str(),
[](void *ptr) -> void { delete static_cast<storage_type *>(ptr); })
.first;
}

Expand Down
Loading