WIP: Add `cuda.core.utils.make_aligned_dtype` by leofang · Pull Request #1636 · NVIDIA/cuda-python

leofang · 2026-02-18T07:05:17Z

Description

closes #734.

Credit goes entirely to @seberg (see #734 and cupy/cupy#9650) 🙂

DO NOT REVIEW yet. Need to address some details.

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>

copy-pr-bot · 2026-02-18T07:05:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cpcloud · 2026-03-04T15:12:38Z

+                    subalignment = subdtype.metadata.get(
+                        "__cuda_alignment__", subalignment)
+            else:
+                subdtype = make_aligned_dtype(subdtype, recurse=recurse)


When `recurse=True`, we recurse into nested structured dtypes before honoring existing `__cuda_alignment__` metadata. That can reject already-prepared nested dtypes (with intentionally padded itemsize) via `Input descriptor had larger itemsize than inferred`.

Could we mirror the non-recursive path and first respect `__cuda_alignment__` if present, then recurse only when the metadata is absent?

seberg · 2026-03-04

Yeah, I changed this pattern slightly in the cupy PR (I suspect that addresses this).
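For readers following the thread, the ordering suggested above (check metadata first, recurse only when it is absent) can be sketched roughly as follows. This is not the PR's code: `field_alignment` is a hypothetical helper that only returns an alignment, and falling back to NumPy's `dtype.alignment` for scalars is a placeholder assumption standing in for whatever scalar rule the PR actually uses.

```python
import numpy as np

ALIGN_KEY = "__cuda_alignment__"  # metadata key quoted in the review comment


def field_alignment(subdtype, recurse=True):
    """Hypothetical helper: choose an alignment for one struct field."""
    metadata = subdtype.metadata or {}  # dtype.metadata is None when unset
    if ALIGN_KEY in metadata:
        # Already-prepared dtype: trust the recorded alignment and leave
        # its (possibly padded) itemsize untouched.
        return metadata[ALIGN_KEY]
    if recurse and subdtype.names is not None:
        # No metadata: derive the alignment from the nested fields.
        return max(field_alignment(subdtype[name], recurse)
                   for name in subdtype.names)
    # Placeholder (assumption): the PR's real scalar rule would go here.
    return subdtype.alignment


# A nested dtype that was padded to 16 bytes on purpose and tagged.
prepared = np.dtype(
    {"names": ["x", "y"], "formats": ["f4", "f4"],
     "offsets": [0, 4], "itemsize": 16},
    metadata={ALIGN_KEY: 16},
)
# A nested dtype with no metadata, so its fields are inspected instead.
plain = np.dtype([("a", "i2"), ("b", "f4")])

print(field_alignment(prepared))  # 16, taken from the metadata
print(field_alignment(plain))     # 4, the larger of the two field alignments
```

Because the metadata branch returns before any recursion, the padded `itemsize` of `prepared` is never compared against an inferred size, which is the rejection the comment describes.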

cpcloud · 2026-03-04T15:12:46Z

+
+            min_offset = offset + subdtype.itemsize
+
+            if subdtype.names is None:


`subdtype.names is None` also covers subarray / unstructured-void fields. `_get_cuda_scalar_alignment()` then infers alignment from total itemsize, which can over-align (e.g. `('f4', 3)` has size 12 but element alignment 4).

That can silently change field offsets and ABI layout. Should we detect subarrays (`subdtype.subdtype`) and align from the base dtype (or explicitly reject subarrays until supported)?

seberg · 2026-03-04

I'll fix this in the cupy PR. (I may have forgotten about it, because the rest of CuPy in that PR doesn't support subarrays.)
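To make the subarray case concrete, here is a minimal sketch of how such fields can be told apart using only public NumPy dtype attributes. `base_for_alignment` is a hypothetical name, not something from the PR, and rejecting unstructured void is just one of the two options raised in the comment.

```python
import numpy as np


def base_for_alignment(subdtype):
    """Hypothetical helper: return the dtype whose size should drive the
    alignment of a non-struct field."""
    if subdtype.names is not None:
        raise ValueError("structured field: handle on the struct path")
    if subdtype.subdtype is not None:
        # Subarray such as ('f4', 3): align from the element dtype,
        # not from the total itemsize.
        return subdtype.base
    if subdtype.kind == "V":
        # Unstructured void (e.g. 'V12') has no element type to align from.
        raise TypeError("unstructured void fields are not supported")
    return subdtype


sub = np.dtype(("f4", 3))
print(sub.names)                         # None, so it looks like a scalar field
print(sub.itemsize)                      # 12
print(base_for_alignment(sub).itemsize)  # 4
```

The same check applies to a field pulled out of a struct, for example `np.dtype([("v", "f4", 3)])["v"]`, which is also a subarray dtype with `names` equal to `None`.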

leofang · 2026-04-07T19:51:03Z

Pushing this out...

initial check-in

df811b9

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>

leofang added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Feb 18, 2026

Andy-Jost added this to the cuda.core v0.7.0 milestone Feb 18, 2026

rparolin requested a review from cpcloud February 18, 2026 22:04

rparolin assigned leofang Feb 18, 2026

cpcloud reviewed Mar 4, 2026

leofang modified the milestones: cuda.core v0.7.0, cuda.core v1.0.0 Apr 7, 2026

leofang added the P1 (Medium priority - Should do) label and removed the P0 (High priority - Must do!) label May 1, 2026

leofang modified the milestones: cuda.core v1.0.0, cuda.core next May 5, 2026

leofang wants to merge 1 commit into NVIDIA:main from leofang:aligned_dtype

5 participants

