WIP: Add cuda.core.utils.make_aligned_dtype #1636
Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
        subalignment = subdtype.metadata.get(
            "__cuda_alignment__", subalignment)
    else:
        subdtype = make_aligned_dtype(subdtype, recurse=recurse)
When `recurse=True`, we recurse into nested structured dtypes before honoring existing `__cuda_alignment__` metadata. That can reject already-prepared nested dtypes (those with an intentionally padded itemsize) with the error "Input descriptor had larger itemsize than inferred".
Could we mirror the non-recursive path: first respect `__cuda_alignment__` if present, and recurse only when the metadata is absent?
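To make the suggested ordering concrete, here is a minimal sketch, not the PR's actual code. `resolve_field` and its `align_fn` parameter are hypothetical names introduced for illustration; `align_fn` stands in for `make_aligned_dtype(..., recurse=True)`. It only assumes that NumPy exposes `dtype.metadata` (either `None` or a mapping) and `dtype.names`.

```python
import numpy as np

ALIGN_KEY = "__cuda_alignment__"


def resolve_field(subdtype, recurse, align_fn):
    """Hypothetical helper: return (dtype, alignment or None) for one field.

    Existing metadata wins; recursion happens only when it is absent.
    """
    meta = subdtype.metadata or {}  # dtype.metadata is None when unset
    if ALIGN_KEY in meta:
        # Already prepared: keep the dtype (and its padded itemsize) as-is.
        return subdtype, meta[ALIGN_KEY]
    if recurse and subdtype.names is not None:
        # Nested structured dtype without metadata: align it recursively.
        subdtype = align_fn(subdtype)
        return subdtype, (subdtype.metadata or {}).get(ALIGN_KEY)
    # Scalar (or non-recursive) case: caller infers the alignment.
    return subdtype, None


# A nested dtype that was padded to 16 bytes on purpose and tagged.
prepared = np.dtype(
    {"names": ["x"], "formats": ["f4"], "offsets": [0], "itemsize": 16},
    metadata={ALIGN_KEY: 16},
)
```

With this ordering, `prepared` is returned untouched and `align_fn` is never called for it, so the padded itemsize cannot trip the itemsize check.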
Yeah, I changed this pattern slightly in the cupy PR (I suspect that addresses this).
    min_offset = offset + subdtype.itemsize

    if subdtype.names is None:
`subdtype.names is None` is also true for subarray and unstructured-void fields. `_get_cuda_scalar_alignment()` then infers the alignment from the total itemsize, which can over-align (e.g. `('f4', 3)` has size 12 but element alignment 4).
That can silently change field offsets and the ABI layout. Should we detect subarrays (`subdtype.subdtype`) and align from the base dtype, or explicitly reject subarrays until they are supported?
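For reference, a small NumPy-only sketch of the subarray case described above. `base_scalar_dtype` is a hypothetical helper, not part of the PR; it relies only on the documented `dtype.subdtype` attribute, which is `None` for non-subarray dtypes and an `(item_dtype, shape)` tuple otherwise.

```python
import numpy as np


def base_scalar_dtype(dt):
    """Hypothetical helper: unwrap subarray dtypes down to the element dtype."""
    while dt.subdtype is not None:
        dt = dt.subdtype[0]  # (item_dtype, shape) -> item_dtype
    return dt


field = np.dtype(("f4", 3))

print(field.names)                        # None, just like a plain scalar
print(field.itemsize)                     # 12: total size of the subarray
print(base_scalar_dtype(field).itemsize)  # 4: size of one float32 element
```

A check on `names is None` alone cannot tell `field` apart from a scalar, so any alignment derived from its itemsize sees 12 bytes rather than the 4-byte element.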
I'll fix this in the cupy PR. (I may have forgotten about it, because the rest of CuPy in that PR doesn't support subarrays.)
Description
closes #734.
Credit goes entirely to @seberg (see #734 and cupy/cupy#9650) 🙂
DO NOT REVIEW yet. Need to address some details.
Checklist