Reference Counting (Use-After-Free) Bugs for PyList_SetItem in SparseCSFTensorToNdarray #49915

@wr-web

Description

Remove incorrect Py_XDECREF call for item.

Related Code

RETURN_NOT_OK(TensorToNdarray(sparse_index.indptr()[i], base, &item));
if (PyList_SetItem(indptr.obj(), i, item) < 0) {
  Py_XDECREF(item);

RETURN_NOT_OK(TensorToNdarray(sparse_index.indices()[i], base, &item));
if (PyList_SetItem(indices.obj(), i, item) < 0) {
  Py_XDECREF(item);

PyList_SetItem Source Code

PyList_SetItem Document

PyList_SetItem Python Forum Discussion

Therefore, PyList_SetItem steals the reference to its third argument whether it succeeds or fails. The caller must not call Py_XDECREF on item in the failure branch.

Similar issue/PR:

PaddlePaddle/Paddle#77447

apache/geaflow#725

Reference Counting and Ownership in CPython Native API

Borrowed Reference

A borrowed reference is a reference to an object that you do not own. You don't need to decrement its reference count when you're done with it, but you must ensure the object stays alive while you're using it (e.g., by creating an owned reference with Py_INCREF if necessary). Borrowed references are typically returned by functions like PyList_GetItem(), which returns an item from a list without incrementing its reference count.

New Reference

A new reference (also called an "owned reference") is a reference that you have ownership of. When you receive a new reference from a function (such as PyObject_New() or Py_BuildValue()), you are responsible for calling Py_DECREF() on it when you no longer need it to properly decrement its reference count. Failure to do so causes memory leaks.

Stolen Reference (Stealing)

A reference is stolen when a function takes ownership of a reference you pass to it. After you pass an object reference to a function that "steals" it, you no longer own that reference, and you should not call Py_DECREF() on it afterward. The function assumes full responsibility for releasing that reference.
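The three conventions above can be sketched with a toy reference-counted object. This is a minimal model, not the CPython API: `ToyObj`, `ToyBox`, and every `toy_*` function are hypothetical names invented for illustration, and the box holds a single slot instead of a real list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object: only a reference count, no payload. */
typedef struct { long refcnt; } ToyObj;

/* Returns a NEW reference: the caller owns it and must release it. */
static ToyObj *toy_new(void) {
    ToyObj *o = malloc(sizeof *o);
    if (o != NULL) o->refcnt = 1;
    return o;
}

static void toy_incref(ToyObj *o) { o->refcnt++; }

/* Drops one reference and frees the object when the count reaches zero.
   Returns the remaining count so callers can observe it. */
static long toy_decref(ToyObj *o) {
    long left = --o->refcnt;
    if (left == 0) free(o);
    return left;
}

/* One-slot container that owns a reference to whatever it holds. */
typedef struct { ToyObj *slot; } ToyBox;

/* STEALS the caller's reference: no incref, the box now owns it. */
static void toy_box_put(ToyBox *b, ToyObj *o) {
    if (b->slot != NULL) toy_decref(b->slot);  /* discard the old item */
    b->slot = o;
}

/* Returns a BORROWED reference: no incref, the box still owns it. */
static ToyObj *toy_box_peek(const ToyBox *b) { return b->slot; }

/* Walks through all three conventions and returns the count observed
   after the caller releases its own extra reference. */
long toy_ownership_demo(void) {
    ToyBox box = { NULL };
    ToyObj *o = toy_new();               /* new reference, count == 1 */
    if (o == NULL) return -1;
    toy_box_put(&box, o);                /* stolen: we no longer own `o` */
    ToyObj *b = toy_box_peek(&box);      /* borrowed, count still 1 */
    toy_incref(b);                       /* borrowed -> owned, count == 2 */
    long after_release = toy_decref(b);  /* release ours, count == 1 */
    toy_decref(box.slot);                /* box cleanup frees the object */
    return after_release;
}
```

Calling `toy_ownership_demo()` from a `main` exercises each convention once; the only releases the caller performs are for references it created itself (`toy_incref`), never for the one it handed to `toy_box_put`.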

Example: PyList_SetItem

PyList_SetItem is a classic example of a reference-stealing function. According to the documentation:

"Set the item at index index in list to item. Return 0 on success. If index is out of bounds, return -1 and set an IndexError exception. Note: This function 'steals' a reference to item and discards a reference to an item already in the list at the affected position."

However, the quoted passage does not say whether the reference is still stolen when the function fails. The caller needs to know which of two contracts applies: ownership is transferred unconditionally, or only on success.

Looking at the CPython source code clarifies this behavior:

int
PyList_SetItem(PyObject *op, Py_ssize_t i,
               PyObject *newitem)
{
    if (!PyList_Check(op)) {
        Py_XDECREF(newitem);
        PyErr_BadInternalCall();
        return -1;
    }
    // ...
}

As the excerpt shows, PyList_SetItem calls Py_XDECREF(newitem) before returning -1 when the type check fails. The out-of-bounds branch elided above does the same, so the function consumes the caller's reference to newitem whether or not the item ends up in the list.
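The same contract can be modeled outside CPython to make the counts visible. The sketch below is a toy under stated assumptions: `ToyObj`, `ToyList`, `toy_list_set`, and the helper functions are hypothetical, the list has a fixed size, and the toy decrement never frees memory so the count can be inspected after it reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: only a reference count. */
typedef struct { long refcnt; } ToyObj;

#define TOY_LIST_SIZE 2
typedef struct { ToyObj *items[TOY_LIST_SIZE]; } ToyList;

/* Toy decrement that tolerates NULL and never frees, so the count
   stays observable in this model. */
static void toy_xdecref(ToyObj *o) { if (o != NULL) o->refcnt--; }

/* Steals `item` on every path, mirroring the excerpt above. */
int toy_list_set(ToyList *l, long i, ToyObj *item) {
    if (l == NULL || i < 0 || i >= TOY_LIST_SIZE) {
        toy_xdecref(item);     /* failure: stolen reference released here */
        return -1;
    }
    toy_xdecref(l->items[i]);  /* discard the reference to the old item */
    l->items[i] = item;        /* success: the list owns the reference */
    return 0;
}

/* Count left on the object after a failing (out-of-bounds) call. */
long toy_count_after_failed_set(void) {
    ToyList l = { { NULL, NULL } };
    ToyObj obj = { 1 };                   /* caller owns one reference */
    int rc = toy_list_set(&l, 99, &obj);
    assert(rc == -1);
    (void)rc;
    return obj.refcnt;                    /* caller's reference is gone */
}

/* Count left on the object after a successful call. */
long toy_count_after_ok_set(void) {
    ToyList l = { { NULL, NULL } };
    ToyObj obj = { 1 };
    int rc = toy_list_set(&l, 0, &obj);
    assert(rc == 0);
    (void)rc;
    return obj.refcnt;                    /* now held by the list */
}
```

In both cases the caller ends up owning nothing: after a failure the count has dropped to zero, and after a success the single remaining reference belongs to the list.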

This behavior has serious implications for correct API usage. Consider the following incorrect code:

if (PyList_SetItem(a, b, something) < 0) {
    Py_DECREF(something);  // DANGER: Use-After-Free!
}

This code is defective: PyList_SetItem has already consumed the reference and released it on the failure path with Py_XDECREF, so the extra Py_DECREF(something) in the error-handling block decrements the count a second time. If the first decrement already freed the object, the second one touches freed memory (a use-after-free); otherwise it leaves the count one too low, so the object can be freed while other owners still use it. The result may be an assertion failure in debug builds or memory corruption and crashes in release builds.

The correct pattern is simply:

if (PyList_SetItem(a, b, something) < 0) {
    // Do NOT call Py_DECREF on 'something' - the reference was already stolen
    return NULL;  // or handle error appropriately
}
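To compare the caller-side patterns numerically, here is a small toy model (again not CPython: `ToyObj`, `toy_failing_set`, and the three caller functions are hypothetical, and the toy decrement never frees memory). It also sketches the case where the caller still needs the object after the call: taking an extra reference before handing one over, which in CPython would be a Py_INCREF before the stealing call.

```c
#include <assert.h>

/* Hypothetical toy object: only a reference count. */
typedef struct { long refcnt; } ToyObj;

static void toy_incref(ToyObj *o) { o->refcnt++; }
static void toy_decref(ToyObj *o) { o->refcnt--; }  /* never frees */

/* Hypothetical stealing setter that always fails and, like the real
   function, releases the stolen reference before returning -1. */
static int toy_failing_set(ToyObj *item) {
    toy_decref(item);
    return -1;
}

/* Buggy caller: releases the reference again after the failed call. */
long toy_buggy_caller(void) {
    ToyObj obj = { 1 };
    if (toy_failing_set(&obj) < 0) {
        toy_decref(&obj);            /* BUG: second release */
    }
    return obj.refcnt;               /* count is now below zero */
}

/* Correct caller: nothing to release after the call. */
long toy_correct_caller(void) {
    ToyObj obj = { 1 };
    if (toy_failing_set(&obj) < 0) {
        /* the reference was already consumed by the setter */
    }
    return obj.refcnt;
}

/* Caller that keeps using the object: take an extra reference first,
   so one can be stolen and one remains owned. */
long toy_keep_alive_caller(void) {
    ToyObj obj = { 1 };
    toy_incref(&obj);                /* 2: one to give away, one to keep */
    int rc = toy_failing_set(&obj);  /* 1: the stolen one was released */
    (void)rc;
    long while_in_use = obj.refcnt;  /* object is still safely owned */
    toy_decref(&obj);                /* release the kept reference once */
    assert(obj.refcnt == 0);
    return while_in_use;
}
```

The buggy caller drives the count to -1, which in a real interpreter corresponds to an over-release; the correct caller leaves it at 0, and the keep-alive caller still holds exactly one reference while it uses the object.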

In summary, once you pass a reference to a stealing function in the CPython API, you give up ownership of it and must not decrement it after the call, whatever the return value. When the documentation of a stealing function does not say what happens on failure, check the source code to confirm that the reference is consumed on every path.
