Skip to content

Commit 10f8708

Browse files
b-passpre-commit-ci[bot]rwgkSkylion007
authored
Change function calls to use vectorcall (#5948)
* Make argument_vector re-usable for other types. * Attempt to collect args into array for vectorcall * Revert "Attempt to collect args into array for vectorcall" This reverts commit 418a034. * Implement vectorcall args collector * pre-commit fixes * Checkpoint in moving to METH_FASTCALL * pre-commit fixes * Use the names tuple directly, cleaner code and less reference counting * Fix unit test, the code now holds more references It cannot re-use the incoming tuple as before, because it is no longer a tuple at all. So a new tuple must be created, which then holds references for each member. * Make clangtidy happy * Oops, _v is C++14 * style: pre-commit fixes * Minor code cleanup * Fix signed conversions * Fix args expansion This would be easier with `if constexpr` * style: pre-commit fixes * Code cleanup * fix(tests): Install multiple-interpreter test modules into wheel The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and `mod_per_interpreter_gil_with_singleton` modules were being built but not installed into the wheel when using scikit-build-core (SKBUILD=true). This caused iOS (and potentially Android) CIBW tests to fail with ModuleNotFoundError. Root cause analysis: - The main test targets have install() commands (line 531) - The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing equivalent install() commands - For regular CMake builds, this wasn't a problem because LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests - For wheel builds, only targets with explicit install() commands are included in the wheel This issue was latent until commit fee2527 changed the test imports from `pytest.importorskip()` (graceful skip) to direct `import` statements (hard failure), which exposed the missing modules. Failing tests: - test_multiple_interpreters.py::test_independent_subinterpreters - test_multiple_interpreters.py::test_dependent_subinterpreters Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil' * tests: Pin numpy 2.4.0 for Python 3.14 CI tests Add numpy==2.4.0 requirement for Python 3.14 (both default and free-threaded builds). NumPy 2.4.0 is the first version to provide official PyPI wheels for Python 3.14: - numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default) - numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded) Previously, CI was skipping all numpy-dependent tests for Python 3.14 because PIP_ONLY_BINARY was set and no wheels were available: SKIPPED [...] test_numpy_array.py:8: could not import 'numpy': No module named 'numpy' With this change, the full numpy test suite will run on Python 3.14, providing better test coverage for the newest Python version. Note: Using exact pin (==2.4.0) rather than compatible release (~=2.4.0) to ensure reproducible CI results with the first known-working version. * tests: Add verbose flag to CIBW pytest command Add `-v` to the pytest command in tests/pyproject.toml to help diagnose hanging tests in CIBW jobs (particularly iOS). This will show each test name as it runs, making it easier to identify which specific test is hanging. * tests: Skip subinterpreter tests on iOS, add pytest timeout - Add `IOS` platform constant to `tests/env.py` for consistency with existing `ANDROID`, `LINUX`, `MACOS`, `WIN`, `FREEBSD` constants. - Skip `test_multiple_interpreters.py` module on iOS. Subinterpreters are not supported in the iOS simulator environment. These tests were previously skipped implicitly because the modules weren't installed in the wheel; now that they are (commit 6ed6d5a), we need an explicit skip. - Change pytest timeout from 0 (disabled) to 120 seconds. This provides a safety net to catch hanging tests before the CI job times out after hours. Normal test runs complete in 33-55 seconds total (~1100 tests), so 120 seconds per test is very generous. - Add `-v` flag for verbose output to help diagnose any future issues. * More cleanups in argument vector, per comments. * Per Cursor, move all versions to Vectorcall since it has been supported since 3.8. This means getting rid of simple_collector, we can do the same with a constexpr if in the unpacking_collector. * Switch to a bool vec for the used_kwargs flag... This makes more sense and saves a sort, and the small_vector implementation means it will actually take less space than a vector of size_t elements. The most common case is that all kwargs are used. * Fix signedness for clang * Another signedness issue * tests: Disable pytest-timeout for Pyodide (no signal.setitimer) Pyodide runs in a WebAssembly sandbox without POSIX signals, so `signal.setitimer` is not available. This causes pytest-timeout to crash with `AttributeError: module 'signal' has no attribute 'setitimer'` when timeout > 0. Override the test-command for Pyodide to keep timeout=0 (disabled). * Combine temp storage and args into one vector It's a good bit faster at the cost of this one scary reinterpret_cast. * Phrasing * Delete incorrect comment At 6, the struct is 144 bytes (not 128 bytes as the comment said). * Fix push_back * Update push_back in argument_vector.h Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> * style: pre-commit fixes * Use real types for these instead of object They can be null if you "steal" a null handle. * refactor: Replace small_vector<object> with ref_small_vector for explicit ownership Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`, replacing the previous `small_vector<object>` approach. Primary goals: 1. **Maintainability**: The previous implementation relied on `sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to pass the object array to vectorcall. This coupling to py::object's internal layout could break if someone adds a debug field or other member to py::handle/py::object in the future. 2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()` API makes reference counting intent explicit at each call site, rather than relying on implicit py::object semantics. 3. **Intuitive code**: Storing `PyObject*` directly and passing it to `_PyObject_Vectorcall` without casts is straightforward and matches what the C API expects. No "scary" reinterpret_cast needed. Additional benefits: - `PyObject*` is trivially copyable, simplifying vector operations - Batch decref in destructor (tight loop vs N individual object destructors) - Self-documenting ownership semantics Design consideration: We considered folding the ref-counting functionality directly into `small_vector` via template specialization for `PyObject*`. We decided against this because: - It would give `small_vector<PyObject*, N>` a different interface than the generic `small_vector<T, N>` (steal/borrow vs push_back) - Someone might want a non-ref-counting `small_vector<PyObject*, N>` - The specialization behavior could surprise users expecting uniform semantics A separate `ref_small_vector` type makes the ref-counting behavior explicit and self-documenting, while keeping `small_vector` generic and predictable. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com> Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
1 parent c761608 commit 10f8708

4 files changed

Lines changed: 345 additions & 222 deletions

File tree

include/pybind11/cast.h

Lines changed: 108 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -2078,8 +2078,7 @@ using is_pos_only = std::is_same<intrinsic_t<T>, pos_only>;
20782078
// forward declaration (definition in attr.h)
20792079
struct function_record;
20802080

2081-
/// (Inline size chosen mostly arbitrarily; 6 should pad function_call out to two cache lines
2082-
/// (16 pointers) in size.)
2081+
/// Inline size chosen mostly arbitrarily.
20832082
constexpr std::size_t arg_vector_small_size = 6;
20842083

20852084
/// Internal data associated with a single function call
@@ -2191,94 +2190,130 @@ class argument_loader {
21912190
std::tuple<make_caster<Args>...> argcasters;
21922191
};
21932192

2194-
/// Helper class which collects only positional arguments for a Python function call.
2195-
/// A fancier version below can collect any argument, but this one is optimal for simple calls.
2193+
// [workaround(intel)] Separate function required here
2194+
// We need to put this into a separate function because the Intel compiler
2195+
// fails to compile enable_if_t<!all_of<is_positional<Args>...>::value>
2196+
// (tested with ICC 2021.1 Beta 20200827).
2197+
template <typename... Args>
2198+
constexpr bool args_has_keyword_or_ds() {
2199+
return any_of<is_keyword_or_ds<Args>...>::value;
2200+
}
2201+
2202+
/// Helper class which collects positional, keyword, * and ** arguments for a Python function call
21962203
template <return_value_policy policy>
2197-
class simple_collector {
2204+
class unpacking_collector {
21982205
public:
21992206
template <typename... Ts>
2200-
explicit simple_collector(Ts &&...values)
2201-
: m_args(pybind11::make_tuple<policy>(std::forward<Ts>(values)...)) {}
2202-
2203-
const tuple &args() const & { return m_args; }
2204-
dict kwargs() const { return {}; }
2207+
explicit unpacking_collector(Ts &&...values)
2208+
: m_names(reinterpret_steal<tuple>(
2209+
handle())) // initialize to null to avoid useless allocation of 0-length tuple
2210+
{
2211+
/*
2212+
Python can sometimes utilize an extra space before the arguments to prepend `self`.
2213+
This is important enough that there is a special flag for it:
2214+
PY_VECTORCALL_ARGUMENTS_OFFSET.
2215+
All we have to do is allocate an extra space at the beginning of this array, and set the
2216+
flag. Note that the extra space is not passed directly in to vectorcall.
2217+
*/
2218+
m_args.reserve(sizeof...(values) + 1);
2219+
m_args.push_back_null();
2220+
2221+
if (args_has_keyword_or_ds<Ts...>()) {
2222+
list names_list;
2223+
2224+
// collect_arguments guarantees this can't be constructed with kwargs before the last
2225+
// positional so we don't need to worry about Ts... being in anything but normal python
2226+
// order.
2227+
using expander = int[];
2228+
(void) expander{0, (process(names_list, std::forward<Ts>(values)), 0)...};
2229+
2230+
m_names = reinterpret_steal<tuple>(PyList_AsTuple(names_list.ptr()));
2231+
} else {
2232+
auto not_used
2233+
= reinterpret_steal<list>(handle()); // initialize as null (to avoid an allocation)
22052234

2206-
tuple args() && { return std::move(m_args); }
2235+
using expander = int[];
2236+
(void) expander{0, (process(not_used, std::forward<Ts>(values)), 0)...};
2237+
}
2238+
}
22072239

22082240
/// Call a Python function and pass the collected arguments
22092241
object call(PyObject *ptr) const {
2210-
PyObject *result = PyObject_CallObject(ptr, m_args.ptr());
2242+
size_t nargs = m_args.size() - 1; // -1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
2243+
if (m_names) {
2244+
nargs -= m_names.size();
2245+
}
2246+
PyObject *result = _PyObject_Vectorcall(
2247+
ptr, m_args.data() + 1, nargs | PY_VECTORCALL_ARGUMENTS_OFFSET, m_names.ptr());
22112248
if (!result) {
22122249
throw error_already_set();
22132250
}
22142251
return reinterpret_steal<object>(result);
22152252
}
22162253

2217-
private:
2218-
tuple m_args;
2219-
};
2220-
2221-
/// Helper class which collects positional, keyword, * and ** arguments for a Python function call
2222-
template <return_value_policy policy>
2223-
class unpacking_collector {
2224-
public:
2225-
template <typename... Ts>
2226-
explicit unpacking_collector(Ts &&...values) {
2227-
// Tuples aren't (easily) resizable so a list is needed for collection,
2228-
// but the actual function call strictly requires a tuple.
2229-
auto args_list = list();
2230-
using expander = int[];
2231-
(void) expander{0, (process(args_list, std::forward<Ts>(values)), 0)...};
2232-
2233-
m_args = std::move(args_list);
2254+
tuple args() const {
2255+
size_t nargs = m_args.size() - 1; // -1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
2256+
if (m_names) {
2257+
nargs -= m_names.size();
2258+
}
2259+
tuple val(nargs);
2260+
for (size_t i = 0; i < nargs; ++i) {
2261+
// +1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
2262+
val[i] = reinterpret_borrow<object>(m_args[i + 1]);
2263+
}
2264+
return val;
22342265
}
22352266

2236-
const tuple &args() const & { return m_args; }
2237-
const dict &kwargs() const & { return m_kwargs; }
2238-
2239-
tuple args() && { return std::move(m_args); }
2240-
dict kwargs() && { return std::move(m_kwargs); }
2241-
2242-
/// Call a Python function and pass the collected arguments
2243-
object call(PyObject *ptr) const {
2244-
PyObject *result = PyObject_Call(ptr, m_args.ptr(), m_kwargs.ptr());
2245-
if (!result) {
2246-
throw error_already_set();
2267+
dict kwargs() const {
2268+
dict val;
2269+
if (m_names) {
2270+
size_t offset = m_args.size() - m_names.size();
2271+
for (size_t i = 0; i < m_names.size(); ++i, ++offset) {
2272+
val[m_names[i]] = reinterpret_borrow<object>(m_args[offset]);
2273+
}
22472274
}
2248-
return reinterpret_steal<object>(result);
2275+
return val;
22492276
}
22502277

22512278
private:
2279+
// normal argument, possibly needing conversion
22522280
template <typename T>
2253-
void process(list &args_list, T &&x) {
2254-
auto o = reinterpret_steal<object>(
2255-
detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
2256-
if (!o) {
2281+
void process(list & /*names_list*/, T &&x) {
2282+
handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
2283+
if (!h) {
22572284
#if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
2258-
throw cast_error_unable_to_convert_call_arg(std::to_string(args_list.size()));
2285+
throw cast_error_unable_to_convert_call_arg(std::to_string(m_args.size() - 1));
22592286
#else
2260-
throw cast_error_unable_to_convert_call_arg(std::to_string(args_list.size()),
2287+
throw cast_error_unable_to_convert_call_arg(std::to_string(m_args.size() - 1),
22612288
type_id<T>());
22622289
#endif
22632290
}
2264-
args_list.append(std::move(o));
2291+
m_args.push_back_steal(h.ptr()); // cast returns a new reference
22652292
}
22662293

2267-
void process(list &args_list, detail::args_proxy ap) {
2294+
// * unpacking
2295+
void process(list & /*names_list*/, detail::args_proxy ap) {
2296+
if (!ap) {
2297+
return;
2298+
}
22682299
for (auto a : ap) {
2269-
args_list.append(a);
2300+
m_args.push_back_borrow(a.ptr());
22702301
}
22712302
}
22722303

2273-
void process(list & /*args_list*/, arg_v a) {
2304+
// named argument
2305+
// NOLINTNEXTLINE(performance-unnecessary-value-param)
2306+
void process(list &names_list, arg_v a) {
2307+
assert(names_list);
22742308
if (!a.name) {
22752309
#if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
22762310
nameless_argument_error();
22772311
#else
22782312
nameless_argument_error(a.type);
22792313
#endif
22802314
}
2281-
if (m_kwargs.contains(a.name)) {
2315+
auto name = str(a.name);
2316+
if (names_list.contains(name)) {
22822317
#if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
22832318
multiple_values_error();
22842319
#else
@@ -2292,22 +2327,27 @@ class unpacking_collector {
22922327
throw cast_error_unable_to_convert_call_arg(a.name, a.type);
22932328
#endif
22942329
}
2295-
m_kwargs[a.name] = std::move(a.value);
2330+
names_list.append(std::move(name));
2331+
m_args.push_back_borrow(a.value.ptr());
22962332
}
22972333

2298-
void process(list & /*args_list*/, detail::kwargs_proxy kp) {
2334+
// ** unpacking
2335+
void process(list &names_list, detail::kwargs_proxy kp) {
22992336
if (!kp) {
23002337
return;
23012338
}
2302-
for (auto k : reinterpret_borrow<dict>(kp)) {
2303-
if (m_kwargs.contains(k.first)) {
2339+
assert(names_list);
2340+
for (auto &&k : reinterpret_borrow<dict>(kp)) {
2341+
auto name = str(k.first);
2342+
if (names_list.contains(name)) {
23042343
#if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
23052344
multiple_values_error();
23062345
#else
2307-
multiple_values_error(str(k.first));
2346+
multiple_values_error(name);
23082347
#endif
23092348
}
2310-
m_kwargs[k.first] = k.second;
2349+
names_list.append(std::move(name));
2350+
m_args.push_back_borrow(k.second.ptr());
23112351
}
23122352
}
23132353

@@ -2333,39 +2373,20 @@ class unpacking_collector {
23332373
}
23342374

23352375
private:
2336-
tuple m_args;
2337-
dict m_kwargs;
2376+
ref_small_vector<arg_vector_small_size> m_args;
2377+
tuple m_names;
23382378
};
23392379

2340-
// [workaround(intel)] Separate function required here
2341-
// We need to put this into a separate function because the Intel compiler
2342-
// fails to compile enable_if_t<!all_of<is_positional<Args>...>::value>
2343-
// (tested with ICC 2021.1 Beta 20200827).
2344-
template <typename... Args>
2345-
constexpr bool args_are_all_positional() {
2346-
return all_of<is_positional<Args>...>::value;
2347-
}
2348-
2349-
/// Collect only positional arguments for a Python function call
2350-
template <return_value_policy policy,
2351-
typename... Args,
2352-
typename = enable_if_t<args_are_all_positional<Args...>()>>
2353-
simple_collector<policy> collect_arguments(Args &&...args) {
2354-
return simple_collector<policy>(std::forward<Args>(args)...);
2355-
}
2356-
2357-
/// Collect all arguments, including keywords and unpacking (only instantiated when needed)
2358-
template <return_value_policy policy,
2359-
typename... Args,
2360-
typename = enable_if_t<!args_are_all_positional<Args...>()>>
2380+
/// Collect all arguments, including keywords and unpacking
2381+
template <return_value_policy policy, typename... Args>
23612382
unpacking_collector<policy> collect_arguments(Args &&...args) {
23622383
// Following argument order rules for generalized unpacking according to PEP 448
2363-
static_assert(constexpr_last<is_positional, Args...>()
2364-
< constexpr_first<is_keyword_or_ds, Args...>()
2365-
&& constexpr_last<is_s_unpacking, Args...>()
2366-
< constexpr_first<is_ds_unpacking, Args...>(),
2367-
"Invalid function call: positional args must precede keywords and ** unpacking; "
2368-
"* unpacking must precede ** unpacking");
2384+
static_assert(
2385+
constexpr_last<is_positional, Args...>() < constexpr_first<is_keyword_or_ds, Args...>(),
2386+
"Invalid function call: positional args must precede keywords and */** unpacking;");
2387+
static_assert(constexpr_last<is_s_unpacking, Args...>()
2388+
< constexpr_first<is_ds_unpacking, Args...>(),
2389+
"Invalid function call: * unpacking must precede ** unpacking");
23692390
return unpacking_collector<policy>(std::forward<Args>(args)...);
23702391
}
23712392

0 commit comments

Comments
 (0)