WIP: Add cuda.core.utils.make_aligned_dtype#1636

Draft
leofang wants to merge 1 commit into NVIDIA:main from leofang:aligned_dtype

@leofang commented Feb 18, 2026

Description

Closes #734.

Credit goes entirely to @seberg (see #734 and cupy/cupy#9650) 🙂

DO NOT REVIEW yet. Need to address some details.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
@leofang added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Feb 18, 2026
copy-pr-bot bot commented Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Feb 18, 2026
@rparolin rparolin requested a review from cpcloud February 18, 2026 22:04
    subalignment = subdtype.metadata.get(
        "__cuda_alignment__", subalignment)
else:
    subdtype = make_aligned_dtype(subdtype, recurse=recurse)

When recurse=True, we recurse into nested structured dtypes before honoring existing __cuda_alignment__ metadata. That can reject already-prepared nested dtypes (whose itemsize is intentionally padded) with the error "Input descriptor had larger itemsize than inferred".

Could we mirror the non-recursive path and first respect __cuda_alignment__ if present, then recurse only when metadata is absent?
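
The ordering suggested in this comment could look roughly like the sketch below. It is not the code from this PR or from the cupy PR: `resolve_field` and its `align_nested` callback are hypothetical names, and the final fallback to NumPy's own `dtype.alignment` is only a placeholder for whatever scalar rule the real helper applies. Only the `__cuda_alignment__` key comes from the diff.

```python
import numpy as np

ALIGN_KEY = "__cuda_alignment__"  # metadata key quoted in the diff hunk


def resolve_field(subdtype, recurse, align_nested):
    """Return (dtype, alignment) for one struct field.

    `align_nested` is a hypothetical stand-in for the recursive
    make_aligned_dtype call; it must return a dtype that carries
    ALIGN_KEY in its metadata.
    """
    metadata = subdtype.metadata or {}  # .metadata is None when unset
    if ALIGN_KEY in metadata:
        # Already prepared: keep its layout, including any padded itemsize.
        return subdtype, metadata[ALIGN_KEY]
    if recurse and subdtype.names is not None:
        # No metadata yet: only now recurse into the nested struct.
        aligned = align_nested(subdtype)
        return aligned, aligned.metadata[ALIGN_KEY]
    # Placeholder fallback (assumption): NumPy's alignment for this dtype.
    return subdtype, subdtype.alignment


# A nested dtype that was prepared earlier: one f4 padded out to 16 bytes.
prepared = np.dtype(
    {"names": ["x"], "formats": ["f4"], "offsets": [0], "itemsize": 16},
    metadata={ALIGN_KEY: 16},
)
```

With this ordering, a field such as `prepared` is returned untouched even when `recurse=True`, so its padded itemsize is never compared against a freshly inferred one.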


Yeah, I changed this pattern slightly in the cupy PR (I suspect that addresses this).


min_offset = offset + subdtype.itemsize

if subdtype.names is None:

subdtype.names is None also covers subarray / unstructured-void fields. _get_cuda_scalar_alignment() then infers alignment from total itemsize, which can over-align (e.g. ('f4', 3) has size 12 but element alignment 4).

That can silently change field offsets and ABI layout. Should we detect subarrays (subdtype.subdtype) and align from the base dtype (or explicitly reject subarrays until supported)?
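
To make the subarray case concrete, here is a small sketch of the first option (align from the base dtype). `base_of` and `element_alignment` are hypothetical helpers, not functions from this PR, and treating a scalar's alignment as equal to its itemsize is an assumption made only for illustration.

```python
import numpy as np


def base_of(dtype):
    """Strip subarray wrappers and return the element dtype."""
    while dtype.subdtype is not None:
        dtype = dtype.subdtype[0]  # subdtype is (base_dtype, shape)
    return dtype


def element_alignment(dtype):
    """Alignment taken from the element type, not the total size."""
    base = base_of(dtype)
    if base.names is not None:
        # Structured elements would need the full struct logic instead.
        raise NotImplementedError("structured subarray elements")
    # Assumption for this sketch: scalar alignment equals scalar itemsize.
    return base.itemsize


vec3 = np.dtype(("f4", 3))  # the example from the comment above
print(vec3.names, vec3.itemsize, vec3.subdtype)
print(element_alignment(vec3))
```

Here `vec3.names` is None and `vec3.itemsize` is 12, so a check on `names` alone cannot tell it apart from a 12-byte scalar, while `vec3.subdtype` exposes the 4-byte float32 element.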


I'll fix this in the cupy PR. (I may have forgotten about it, because the rest of CuPy in that PR doesn't support subarrays.)

Successfully merging this pull request may close these issues.

Add cuda.core.utils.get_aligned_dtype

4 participants