
Collapse #[repr(transparent)] newtypes in SPIR-V type translation#566

Open
novacrazy wants to merge 2 commits into Rust-GPU:main from raygon-renderer:main

Conversation

@novacrazy

Motivation

When a #[repr(transparent)] struct wraps a SPIR-V vector, rust-gpu currently wraps the inner type in an OpTypeStruct, causing downstream usage to scalarize every operation, along with lots of composite extract and construct operations. This happened in a project of mine that works by defining low-level "Register" types (originally for CPU SIMD registers, but now also SPIR-V vectors), then wrapping them in a high-level Vector<R: Register>(Storage<R>) struct to implement generic traits and operator overloads on. My entire library ecosystem depends on #[repr(transparent)] working as intended, and rust-gpu was failing that.
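A minimal standalone sketch of the pattern being described, assuming illustrative names (`F32x4`, `Vector`) rather than the author's actual API: a low-level register type wrapped by a `#[repr(transparent)]` newtype that carries the operator overloads.

```rust
// Hypothetical minimal version of the newtype pattern from this PR's
// motivation. F32x4 stands in for a SPIR-V vector / SIMD register; Vector
// is the high-level transparent wrapper carrying trait impls.

#[derive(Clone, Copy, PartialEq, Debug)]
struct F32x4([f32; 4]);

#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Debug)]
struct Vector<R>(R);

impl std::ops::Add for Vector<F32x4> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let (a, b) = ((self.0).0, (rhs.0).0);
        Vector(F32x4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]))
    }
}

fn main() {
    // repr(transparent) guarantees Vector<F32x4> has exactly the ABI of
    // F32x4 -- the property the PR wants the SPIR-V lowering to respect
    // instead of emitting a wrapping OpTypeStruct.
    assert_eq!(
        std::mem::size_of::<Vector<F32x4>>(),
        std::mem::size_of::<F32x4>()
    );
    let v = Vector(F32x4([1.0, 2.0, 3.0, 4.0])) + Vector(F32x4([4.0, 3.0, 2.0, 1.0]));
    assert_eq!(v, Vector(F32x4([5.0; 4])));
}
```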

Full disclosure, this is a hobby project of mine, and I didn't have time to dig into rust-gpu myself deeply. I used Claude to find the problem area and refine it into what it is now. The rest of this PR describing the changes is written by it. Do note I am not an inept "vibe coder", so if anything does need fixing I can look into it, but this does work for at least my library.

Change

In TyAndLayout::spirv_type, when the outer layout has BackendRepr::Memory and the Rust type is #[repr(transparent)] with exactly one non-ZST field matching the outer's size/align/offset, reuse the inner field's SPIR-V type directly instead of emitting a wrapping OpTypeStruct.

The structural check is the same newtype-unpacking logic already used for BackendRepr::ScalarPair, factored into a shared sole_structural_newtype_field helper. The two call sites differ only in how they prove ABI identity:

  • ScalarPair uses BackendRepr::eq_up_to_validity.
  • Memory relies on #[repr(transparent)], which guarantees full ABI identity by construction. The reprs themselves won't match here (outer is Memory, inner may be anything), so the repr check doesn't apply.
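To illustrate the structural conditions the shared helper checks, here is a simplified, standalone sketch. This is not the actual rust-gpu code (the real helper operates on rustc layout types); it only demonstrates the rule stated above: exactly one non-ZST field, at offset 0, matching the outer size and alignment.

```rust
// Simplified stand-in for layout info; the real code uses rustc's
// TyAndLayout rather than this hypothetical struct.
struct FieldLayout {
    offset: usize,
    size: usize,
    align: usize,
}

// Returns the index of the sole non-ZST field if the wrapper is structurally
// a newtype over it, i.e. the field sits at offset 0 and has the same size
// and alignment as the outer type. Otherwise returns None.
fn sole_structural_newtype_field(
    outer_size: usize,
    outer_align: usize,
    fields: &[FieldLayout],
) -> Option<usize> {
    let mut found = None;
    for (i, f) in fields.iter().enumerate() {
        if f.size == 0 {
            continue; // ZST fields (PhantomData, ()) are ignored
        }
        if found.is_some() {
            return None; // more than one non-ZST field: not a newtype
        }
        found = Some((i, f));
    }
    let (i, f) = found?;
    (f.offset == 0 && f.size == outer_size && f.align == outer_align).then_some(i)
}

fn main() {
    // struct Wrapper(Inner, PhantomData<T>) -> collapses to field 0.
    let newtype = [
        FieldLayout { offset: 0, size: 16, align: 16 },
        FieldLayout { offset: 16, size: 0, align: 1 },
    ];
    assert_eq!(sole_structural_newtype_field(16, 16, &newtype), Some(0));

    // Two non-ZST fields -> no collapse.
    let pair = [
        FieldLayout { offset: 0, size: 8, align: 8 },
        FieldLayout { offset: 8, size: 8, align: 8 },
    ];
    assert_eq!(sole_structural_newtype_field(16, 8, &pair), None);
}
```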

Guardrails

  • Only collapses #[repr(transparent)] types. Opaque single-field structs like struct Viewport { rect: Vec4 } keep their OpTypeStruct identity (important for push-constant Block declarations, interface matching, OpMemberDecorate).
  • Refuses to collapse if the wrapper carries #[spirv(block)], since that decoration needs an OpTypeStruct to land on. block is the only type-level SPIR-V attribute at present (besides intrinsic_type, which short-circuits earlier); if more are added, they'll need adding to this guard.
  • The wrapper's name is preserved via OpName on the collapsed inner id (new name_type_id helper), so disassembly still shows Vector, Position, etc. The existing per-(id, name_key) dedup already supported multiple names per id, so this piggybacks on the same mechanism.
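The guardrails above can be summarized with three shapes. This sketch compiles as plain Rust, outside rust-gpu, so `Vec4` is a stand-in type and the `#[spirv(block)]` attribute is shown only in a comment:

```rust
#[derive(Clone, Copy)]
struct Vec4([f32; 4]); // stand-in for a SPIR-V vector type

// Collapses: transparent newtype over a vector; per the PR it is lowered to
// the inner OpTypeVector, keeping the name "Position" via OpName.
#[repr(transparent)]
struct Position(Vec4);

// Does NOT collapse: opaque single-field struct without repr(transparent);
// it keeps its OpTypeStruct identity for interface matching.
struct Viewport { rect: Vec4 }

// Does NOT collapse: a wrapper carrying #[spirv(block)] (shown as a comment
// here) still needs an OpTypeStruct for the decoration to land on.
// #[spirv(block)]
#[repr(transparent)]
struct PushConstants(Vec4);

fn main() {
    assert_eq!(std::mem::size_of::<Position>(), std::mem::size_of::<Vec4>());
    assert_eq!(std::mem::size_of::<Viewport>(), std::mem::size_of::<Vec4>());
    let _p = Position(Vec4([0.0; 4]));
    let _ = (Viewport { rect: Vec4([0.0; 4]) }, PushConstants(Vec4([0.0; 4])));
}
```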

Incidental

Transparent unions (#[repr(transparent)] union { a: T, b: () }) also collapse correctly, since the new guard sits at the BackendRepr level and doesn't care about FieldsShape. Previously those went through the "largest case" path in trans_aggregate, which happened to produce the same result when alignments matched but wrapped otherwise.

Collaborator

@LegNeato LegNeato left a comment


Thank you! Don't worry about AI use, thank you for calling it out. The compiletests are failing.

@novacrazy
Author

Seems to be a (good) side effect of this change. NeverShortCircuit from core/ops/try_trait.rs is marked as #[repr(transparent)] and is used somewhere in the code for that shader, and now it gets a name as well, offsetting the op numbers. I'll update the .stderr file. Slightly odd that was the only test that showed differences, actually.

I've found in my own testing that for loops are pretty unreliable with SPIR-V. Iterators and Range aren't lowered or unrolled well, producing output similar to opt-level=0, even though everything else around them is fine.

@novacrazy
Author

Tests should pass now.

@LegNeato
Collaborator

LegNeato commented Apr 13, 2026

Yeah the codegen is not sophisticated at all. We've been discussing the best way to make it better.

Member

@Firestar99 Firestar99 left a comment


This is looking good, but I do want to add some disassembly tests to check it actually does the correct thing.

I do not like our codegen type system at all, our own sizeof() function is just heresy waiting to blow up at some point... I would like to burn it all down, whenever I find the time to build a suitable replacement.

Comment on lines +479 to +485
// We omit the `eq_up_to_validity` check on `backend_repr`: the
// outer wrapper always has `BackendRepr::Memory` here while the
// inner field may have a different repr (e.g. a type annotated
// with `#[rust_gpu::spirv(vector)]` goes through the intrinsic
// path above and returns an `OpTypeVector`). The reprs will
// never match, but `#[repr(transparent)]` guarantees full ABI
// identity, so the structural checks in the helper suffice.
Member


This part is AI BS; if you add eq_up_to_validity here you'll get the same result. In general, from what I can tell, this patch just allows #[transparent] to inline glam vecs (or rather #[rust_gpu::vector::v1]-annotated structs).

My entire library ecosystem depends on #[repr(transparent)] working as intended, and rust-gpu was failing that.

Could you explain what exactly fails? Since this doesn't seem to be about optimization, but about something failing compilation in a certain context?

Member


Also #[spirv(Block)] is deprecated and doesn't need to be checked

@Firestar99
Member

Firestar99 commented Apr 14, 2026

I don't seem to have permissions to push to this pull request, so I pushed my changes to https://github.com/Rust-GPU/rust-gpu/tree/collapse-repr-transparent. Also includes a commit that fixes ; escaping in disassembly, since I saw the updated disassembly being weird about Rust array declarations ([Type; N]).

I would still like @eddyb to have another look over this, because screwing up ABI in subtle ways is really bad (we may also want to delay the merge until we release the alpha this week).

@novacrazy
Author

@Firestar99 In my library, I define SPIR-V vectors as my "Registers", then use them in a Vector<R>(Storage<R>) type that wraps all the inner operations in high-level traits and such. Before this, I was sometimes running into rust-gpu scalarizing every vector operation. Unpacking a f32x4 vector, running four scalar ops, then packing it back into the struct. So my math library built on Vector was producing SPIR-V that wasn't using any vector ops, extremely inefficient.

However, not always, and now I'm actually having trouble reproducing it exactly on the same version of rust-gpu I had before. Weird. I remember something about rust-gpu erroring because I wasn't using a direct SPIR-V type for shader inputs, but now that I try to reproduce it, it's not complaining. Something I changed along the way must have accidentally helped. I just know that at the time, the changes to #[repr(transparent)] handling did indeed fix it, even in whatever degenerate case happened then.

This change still reduces extra wrapping and access chains, though, even in the best cases, and fixed the weird codegen I experienced before even if I can't replicate it.

@Firestar99
Member

So my math library built on Vector was producing SPIR-V that wasn't using any vector ops, extremely inefficient.

And then the drivers see a vector op and "scalarize" it again into 4 add ops (V_ADD_F32 on AMD RDNA), and run these 4 add ops on a 32-lane SIMD. Vector inserts and extracts are free, since drivers have to do register allocation, where memory moves fall away as well. So from a performance standpoint, it doesn't matter. (Except maybe on some older ARM GPUs that do run a vector op like SIMD; see this presentation about Bifrost moving away from SIMD cores to scalar ones. AMD abandoned that with the TeraScale -> GCN transition in 2012.)

So vectors actually being vectors is mostly a validation thing, as some intrinsics and built-ins do require actual vectors (or scalars) to operate on. That requirement is also being softened, with ScalarComposite automatically decomposing your types into scalars, so there's not much left that would actually require a vector.
