Open
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)
Description
Consider

    class C:
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
            self.c = c

    C(1, 2, 3)

This produces a trace looking something like this:

    ...
    _CHECK_AND_ALLOCATE_OBJECT
    _CREATE_INIT_FRAME
    _PUSH_FRAME
    # Some guards
    _LOAD_FAST_BORROW_1
    _LOAD_FAST_BORROW_0
    # Some more guards
    _STORE_ATTR_INSTANCE_VALUE
    # Some more guards
    _LOAD_FAST_BORROW_2
    _LOAD_FAST_BORROW_0
    # Some more guards
    _STORE_ATTR_INSTANCE_VALUE
    # Some more guards
    _LOAD_FAST_BORROW_3
    _LOAD_FAST_BORROW_0
    # Some more guards
    _STORE_ATTR_INSTANCE_VALUE
    ...

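To relate the trace back to the source, here is a minimal sketch that inspects the bytecode of `__init__` with the standard `dis` module. It only shows the generic, unspecialized opcodes; the uop names in the trace come from later specialization and JIT stages that this snippet does not reproduce. The variable names `store_attr_count` and `stored_names` are illustrative only.

```python
import dis


class C:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


# Walk the generic (unspecialized) bytecode of __init__.
instructions = list(dis.get_instructions(C.__init__))

# Each `self.x = x` assignment compiles to one STORE_ATTR instruction.
stored_names = [ins.argval for ins in instructions if ins.opname == "STORE_ATTR"]
store_attr_count = len(stored_names)

print(store_attr_count, stored_names)
```

Each of the three `STORE_ATTR` instructions corresponds to one `_STORE_ATTR_INSTANCE_VALUE` in the trace above, once the instruction has been specialized for an instance with inline values.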
Each of those _STORE_ATTR_INSTANCE_VALUE uops reads the old value out of memory and then conditionally decrefs it.
But in this case the object has only just been allocated, so we know that the old value was NULL and we can just overwrite it.
So we can replace this:

    PyObject **value_ptr = (PyObject**)(((char *)owner_o) + offset);
    PyObject *old_value = *value_ptr;
    FT_ATOMIC_STORE_PTR_RELEASE(*value_ptr, PyStackRef_AsPyObjectSteal(value));
    if (old_value == NULL) {
        PyDictValues *values = _PyObject_InlineValues(owner_o);
        Py_ssize_t index = value_ptr - values->values;
        _PyDictValues_AddToInsertionOrder(values, index);
    }
    Py_XDECREF(old_value);

with this:

    PyObject **value_ptr = (PyObject**)(((char *)owner_o) + offset);
    FT_ATOMIC_STORE_PTR_RELEASE(*value_ptr, PyStackRef_AsPyObjectSteal(value));
    PyDictValues *values = _PyObject_InlineValues(owner_o);
    Py_ssize_t index = value_ptr - values->values;
    _PyDictValues_AddToInsertionOrder(values, index);

On AArch64, this reduces the number of machine instructions from 48 to 26.
The same reasoning also applies to _STORE_ATTR_SLOT, where it reduces the number of machine instructions from 32 to 14.
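The premise behind both cases, that a freshly allocated object has nothing stored in its attributes yet, can be observed from Python. The sketch below is a Python-level analogy rather than a view of the C-level NULL pointers, and it assumes plain classes that do not override `__new__`. The class names `WithDict` and `WithSlots` are hypothetical.

```python
class WithDict:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class WithSlots:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


# Allocate without running __init__, mirroring the state right after allocation.
fresh = WithDict.__new__(WithDict)
fresh_attrs = dict(vars(fresh))      # no attributes have been stored yet

fresh.__init__(1, 2, 3)
final_attrs = dict(vars(fresh))      # attributes appear in store order

# Slots of a fresh object are unset, so reading them raises AttributeError
# and hasattr() reports False for each of them.
fresh_slots = WithSlots.__new__(WithSlots)
unset_slots = [name for name in WithSlots.__slots__ if not hasattr(fresh_slots, name)]

print(fresh_attrs, final_attrs, unset_slots)
```

Because every first store in `__init__` lands in an empty location, there is no previous value to decref. The attribute order visible in `final_attrs` reflects the insertion-order bookkeeping that the simplified store still performs.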
See also #134584
We can probably remove some of those guards as well, but that's a separate issue.