perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume #152843

Open
TKanX wants to merge 2 commits into rust-lang:main from TKanX:bugfix/152788-codegen-dst-size-nuw-assume

Conversation

@TKanX TKanX commented Feb 19, 2026

Summary:

Problem:

size_of_val(p) == 0 fails to optimize away for DST types that have a statically-known non-zero-sized prefix:

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

pub fn demo(p: &Foo<dyn std::fmt::Debug>) -> bool {
    std::mem::size_of_val(p) == 0  // always false, but LLVM can't prove it
}

Foo has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters because Box<dyn T> drop emits this exact check to guard the deallocation call — for types with a guaranteed non-zero prefix, the branch should vanish but doesn't.

The slice tail variant Foo<[i32]> already optimized correctly; Foo<dyn Trait> and Foo<[u8]> did not.
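The invariant can be checked at runtime with a small sketch that reuses Foo from above. The helper function names are illustrative and are not part of the PR; the assertions only rely on the 12-byte prefix described in the text.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

// Same shape as the example above: a 12-byte sized prefix, then an unsized tail.
pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

/// Size of a `Foo<dyn Debug>` whose erased tail is a `u8` (hypothetical helper).
pub fn size_with_dyn_tail() -> usize {
    let concrete: Foo<u8> = Foo([0; 3], 7);
    let erased: &Foo<dyn Debug> = &concrete; // unsizing coercion to a trait-object tail
    size_of_val(erased)
}

/// Size of a `Foo<[u8]>` with an empty slice tail (hypothetical helper).
pub fn size_with_empty_slice_tail() -> usize {
    let concrete: Foo<[u8; 0]> = Foo([0; 3], []);
    let erased: &Foo<[u8]> = &concrete; // unsizing coercion to a slice tail
    size_of_val(erased)
}
```

Even with an empty tail the prefix alone contributes 12 bytes, so neither helper can return 0.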

Root Cause:

In size_and_align_of_dst (the ADT/Tuple branch), the size computation is:

full_size = (offset + unsized_size + (align-1)) & -align

LLVM cannot prove full_size > 0 because:

  1. offset + unsized_size used plain add — no NUW flag, so LLVM cannot conclude the result is ≥ offset.
  2. (x + addend) & -align — LLVM has no information that alignment rounding never reduces the value below x.
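A minimal model of the formula in plain Rust, using wrapping arithmetic to mimic what LLVM must assume when no flags are present. The function name full_size_model is hypothetical; this is a sketch of the arithmetic, not the compiler's code.

```rust
/// Models `(offset + unsized_size + (align - 1)) & -align` with wrapping
/// arithmetic, i.e. without any no-overflow guarantee.
/// `align` is assumed to be a nonzero power of two.
pub fn full_size_model(offset: usize, unsized_size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    // Plain `add`: as far as the optimizer knows, this may wrap around.
    let unrounded = offset.wrapping_add(unsized_size);
    // Round up to a multiple of `align`; `-align` is the mask `!(align - 1)`.
    unrounded.wrapping_add(align - 1) & align.wrapping_neg()
}
```

With in-range inputs the result is at least the offset, for example full_size_model(12, 1, 4) is 16. With an out-of-range tail size such as usize::MAX - 11 the sum wraps and the result is 0, which is exactly the case LLVM cannot rule out on its own.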

Additionally, the vtable alignment range metadata was [1, u64::MAX] (only non-zero), despite the actual bound being [1, 1 << (ptr_width - 1)] (all alignments are powers of two with a tighter upper bound).
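The tighter bound can be written down directly. The sketch below reads the PR's range as inclusive, which is an assumption, and the names MAX_ALIGN and in_tightened_range are illustrative only.

```rust
/// Largest power of two representable in a pointer-sized integer:
/// `1 << (ptr_width - 1)`.
pub const MAX_ALIGN: usize = 1 << (usize::BITS - 1);

/// Whether `align` lies in the tightened range `[1, 1 << (ptr_width - 1)]`.
pub fn in_tightened_range(align: usize) -> bool {
    (1..=MAX_ALIGN).contains(&align)
}
```

Every power of two that fits in usize passes this check, while values such as usize::MAX, which the old non-zero-only range allowed, do not.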

Solution:

Three minimal additions, each grounded in a precise invariant:

  1. add nuw on offset + unsized_size — sound because both operands are ≤ isize::MAX for any valid Rust object, so unsigned overflow is impossible. Tells LLVM: unrounded_size ≥ offset.

  2. assume(full_size ≥ unrounded_size): round_up(x, a) ≥ x holds for any power-of-two a as long as the rounding addition does not wrap. Tells LLVM: full_size ≥ unrounded_size ≥ offset. If offset > 0, the chain proves full_size > 0.

  3. Tighten vtable alignment range from [1, u64::MAX] to [1, 1 << (ptr_width - 1)] — consistent with Rust's alignment constraints. Applied in both size_of_val.rs and the vtable_align intrinsic in mir/intrinsic.rs.
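The first two additions have user-level analogues in the standard library: usize::unchecked_add carries the same no-unsigned-wrap promise, and std::hint::assert_unchecked plays the role of assume. The sketch below is an analogy under those assumptions, not the code in size_of_val.rs; dst_size_model is a hypothetical name, and a reasonably recent stable toolchain is assumed.

```rust
use std::hint::assert_unchecked;

/// User-level model of the size computation with the two new facts attached.
///
/// # Safety
/// `align` must be a nonzero power of two, and
/// `offset + unsized_size + (align - 1)` must not overflow `usize`.
pub unsafe fn dst_size_model(offset: usize, unsized_size: usize, align: usize) -> usize {
    unsafe {
        // Analogue of `add nuw`: promises the sum does not wrap,
        // so the optimizer may conclude `unrounded >= offset`.
        let unrounded = offset.unchecked_add(unsized_size);
        // Alignment rounding, unchanged from the original formula.
        let full = unrounded.wrapping_add(align - 1) & align.wrapping_neg();
        // Analogue of `assume(full_size >= unrounded_size)`.
        assert_unchecked(full >= unrounded);
        full
    }
}
```

For a caller that passes a nonzero constant offset, the two facts chain into full ≥ unrounded ≥ offset > 0, which is the reasoning the PR wants LLVM to perform.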

LLVM IR Comparison:

Foo<dyn Debug> — before (godbolt):

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  %0 = getelementptr inbounds nuw i8, ptr %p.1, i64 8
  %1 = load i64, ptr %0, align 8, !range !3, !invariant.load !4
  %2 = getelementptr inbounds nuw i8, ptr %p.1, i64 16
  %3 = load i64, ptr %2, align 8, !range !5, !invariant.load !4
  %4 = tail call i64 @llvm.umax.i64(i64 %3, i64 4)
  %5 = add nuw i64 %1, 11
  %6 = add i64 %5, %4
  %7 = sub i64 0, %4
  %8 = and i64 %6, %7
  %_0 = icmp eq i64 %8, 0
  ret i1 %_0
}

Foo<dyn Debug> — after:

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  ret i1 false
}

Foo<[u8]> — before:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  %0 = add i64 %p.1, 15
  %_0 = icmp ult i64 %0, 4
  ret i1 %_0
}

Foo<[u8]> — after:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  ret i1 false
}

Changes:

  • compiler/rustc_codegen_ssa/src/size_of_val.rs: add unchecked_uadd (NUW) on offset + unsized_size; add assume(full_size ≥ unrounded_size); tighten vtable alignment range.
  • compiler/rustc_codegen_ssa/src/mir/intrinsic.rs: tighten alignment range on the vtable_align intrinsic, consistent with the above.
  • tests/codegen-llvm/dst-vtable-align-nonzero.rs: update FileCheck metadata expectation to match the new tighter range.
  • tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifying size_of_val == 0 folds to ret i1 false for Foo<dyn Debug>, Foo<[u8]>, and Foo<[i32]>.

Fixes #152788.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 19, 2026
@TKanX
Author

TKanX commented Feb 20, 2026

@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler

@rustbot rustbot added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Feb 20, 2026
@fmease
Member

fmease commented Feb 21, 2026

r? codegen

@rustbot rustbot assigned dianqk and unassigned fmease Feb 21, 2026
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2026
@rustbot
Collaborator

rustbot commented Feb 22, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@TKanX TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a9ec27f to 8339cfe Compare February 22, 2026 05:32
@rustbot
Collaborator

rustbot commented Feb 22, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@TKanX
Author

TKanX commented Feb 22, 2026

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2026
@TKanX TKanX requested a review from scottmcm February 22, 2026 05:34

Development

Successfully merging this pull request may close these issues.

size_of_val(p) == 0 doesn't optimize out for clearly-not-ZST values

6 participants