Redesign `intrinsic-test` to use simple comparison by sayantn · Pull Request #2063 · rust-lang/stdarch

sayantn · 2026-03-16T00:10:47Z

Currently, intrinsic-test prints the outputs of the C and Rust programs and then compares them manually. This PR uses a different approach: generate C wrappers for the intrinsics, link to them from Rust, and then use simple Rust tests to compare the outputs.

It is much easier to review commit-by-commit

sayantn · 2026-03-16T01:23:31Z

---- test_vdupq_n_f16 stdout ----

thread 'test_vdupq_n_f16' (2187) panicked at mod_0/src/lib.rs:13773:17:
assertion `left == right` failed: 
  left: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0)]
 right: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(1.43e-5), NiceF16(0.0), NiceF16(-50430.0), NiceF16(1.79e-5)]

This seems weird (left is the Rust output, right is the C one, and NiceF16 is a wrapper which implements PartialEq as a == b || (a.is_nan() && b.is_nan())). This looks like an ABI-related issue. For reference, the declaration looks like:

unsafe extern "C" {
    fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}

In fact, most f16 tests fail on ARMv7. @folkertdev can you help?

Edit:

To work around this issue, I have modified the tool to communicate with C via pointers (e.g. the C wrapper for _mm_add_ps looks like void _mm_add_ps_wrapper(__m128 *dst, const __m128 *a, const __m128 *b)). This fixed the AArch64 and ARMv7 problems, but now the AArch64BE tests are failing, apparently because C and Rust have different pointer load semantics for matrix-like vectors (e.g. uint64x2x2_t): https://godbolt.org/z/j1d16z1P9

@Amanieu is this intended behavior or a bug?
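As a rough illustration of the pointer-based convention described in the edit above, here is a minimal sketch. Everything in it is hypothetical: `Vec4` stands in for a vector type such as `__m128`, and `add_wrapper` is written in Rust with the C ABI so the example is self-contained (in the PR, the wrapper is generated C code). The point is only that arguments and the result travel through memory, so the by-value vector ABI of the two compilers never has to agree.

```rust
/// Hypothetical stand-in for a 128-bit vector type such as `__m128`.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec4(pub [f32; 4]);

/// Stand-in for a generated C wrapper (assumption: the real one is C code).
/// Inputs are read through pointers and the result is written through `dst`,
/// so no vector is ever passed or returned by value across the boundary.
///
/// # Safety
/// All three pointers must be valid, aligned, and non-overlapping.
pub unsafe extern "C" fn add_wrapper(dst: *mut Vec4, a: *const Vec4, b: *const Vec4) {
    unsafe {
        let (a, b) = (*a, *b);
        *dst = Vec4(core::array::from_fn(|i| a.0[i] + b.0[i]));
    }
}

/// Safe helper that a generated test could use to call the wrapper.
pub fn call_add(a: Vec4, b: Vec4) -> Vec4 {
    let mut dst = Vec4([0.0; 4]);
    // SAFETY: `dst`, `a`, and `b` are distinct, properly aligned locals.
    unsafe { add_wrapper(&mut dst, &a, &b) };
    dst
}
```

A test would then compare `call_add(a, b)` against the result of the Rust intrinsic for the same inputs.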

sayantn · 2026-03-16T08:46:41Z

Btw, the time gains are significant: it reduces the Arm and AArch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev.

folkertdev · 2026-03-16T19:40:36Z

Great work. Quick sanity check on f16: perhaps we're still using LLVM 21 to compile the C? If LLVM 21 is used (and apparently on Windows even with LLVM 22), then on some targets the ABI is inconsistent.

sayantn · 2026-03-16T20:06:47Z

@folkertdev ooh, that makes sense. I don't particularly care about Windows, but we are using LLVM 20 in the CI. I can change it to use the build from kernel.org

folkertdev · 2026-03-16T20:10:12Z

I'm seeing clang-18 here even https://triage.rust-lang.org/gha-logs/rust-lang/stdarch/67182959392?pr=2063. I'm not sure what the best solution is really. You could ask T-infra if they have ideas.

sayantn · 2026-03-16T20:14:58Z

yeah, but I can use the LLVM github builds or the kernel.org builds

tgross35 · 2026-03-16T20:56:46Z

Can f16 tests just be gated with #[cfg(target_has_reliable_f16)]? That's likely easier than working around the Windows and old LLVM failures.

sayantn · 2026-03-16T20:58:30Z

@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing on all 3 Arm targets, e.g. vzipq. The C version seems to return all zeros.

Edit: sorry, my mistake, the f16 tests are still failing on ARMv7. I will gate them against the flag

folkertdev · 2026-03-16T21:20:26Z

With LLVM 22 f16 should work on armv7 though?

tgross35 · 2026-03-16T21:27:37Z

FTZ/DAZ-related perhaps?

sayantn · 2026-03-17T06:01:39Z

> FTZ/DAZ-related perhaps?

I don't really think so, the outputs seem completely distinct.

I noticed that vzipq etc. were failing, so I looked at the generated assembly.
On aarch64_be-unknown-linux-gnu,

use core::arch::aarch64::*;

#[unsafe(no_mangle)]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn foo(dst: *mut uint8x16x2_t, a: *const uint8x16_t, b: *const uint8x16_t) {
    unsafe {
        *dst = vzipq_u8(*a, *b);
    }
}

produces

foo:
        ld1 { v0.16b }, [x1]
        ld1 { v1.16b }, [x2]
        add x8, x0, #16
        zip1 v2.16b, v0.16b, v1.16b
        zip2 v0.16b, v0.16b, v1.16b
        st1 { v2.16b }, [x0]
        st1 { v0.16b }, [x8]
        ret

But the C code seemingly has different behavior on GCC and clang https://godbolt.org/z/T3YnrejjG

@adamgemmell can you help with this?

adamgemmell · 2026-03-18T14:55:11Z

I'm not sure it will fix your issue but the difference in instructions comes from the fact that in arm_neon.h, they reverse every vector before and after the operation on big endian. It's not always actually necessary so we only do it if it's broken without it - however, the intrinsic test tool doesn't detect the difference in behaviour because both arguments it picks are identical.

e.g.:

| a               | 15 | 14 | 13 | ... | 2  | 1  | 0  |
| b               | 31 | 30 | 29 | ... | 18 | 17 | 16 |
| a = rev(a)      | 0  | 1  | 2  | ... | 13 | 14 | 15 |
| b = rev(b)      | 16 | 17 | 18 | ... | 29 | 30 | 31 |
| ret = zip(a, b) | 0  | 16 | 1  | ... | 30 | 15 | 31 |
| rev(ret)        | 31 | 15 | 30 | ... | 1  | 16 | 0  |
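The table above can be reproduced with a small array model. This is a simplified sketch, not NEON semantics: `zip` interleaves two whole slices and `rev` reverses a whole slice, following the rows of the table, and all function names are made up for illustration. It shows why identical arguments hide the problem: reversing before and after the zip is equivalent to swapping `a` and `b`, which is invisible when `a == b`.

```rust
/// Interleave two equally long slices: [a0, b0, a1, b1, ...].
pub fn zip(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).flat_map(|(&x, &y)| [x, y]).collect()
}

/// Reverse the element order of a slice.
pub fn rev(v: &[u8]) -> Vec<u8> {
    v.iter().rev().copied().collect()
}

/// Model of the big-endian sequence in the table:
/// reverse both inputs, zip, then reverse the result.
pub fn zip_with_reversal(a: &[u8], b: &[u8]) -> Vec<u8> {
    rev(&zip(&rev(a), &rev(b)))
}
```

In this model, `zip_with_reversal(a, b)` equals `zip(b, a)`, so a test only notices the reversal when the two inputs differ.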

adamgemmell · 2026-03-18T15:53:48Z

You can try adding big_endian_inverse: true to the definition in the yaml and regenerating to see if it changes anything. We can probably offset the pointers to the values array for arguments slightly to ensure they're different

Also I don't actually see vzipq_u8 on the latest CI run, why is that?

sayantn · 2026-03-18T16:25:37Z

> Also I don't actually see vzipq_u8 on the latest CI run, why is that?

I have no idea, I can confirm that locally the test is generated and run.

> You can try adding big_endian_inverse: true to the definition in the yaml, regenerating to see if it changes anything. We can probably offset the pointers to the values array for arguments slightly to ensure they're different

I will check. Thanks

Edit: @adamgemmell adding big_endian_inverse: true to the functions seems to work. The only remaining question is whether this is the correct behavior, or whether clang is buggy here

Edit2: sorry, vzipq_u8 is not getting tested even locally. I will look into it

adamgemmell · 2026-03-18T17:36:27Z

None of the unsigned variants of vzipq seem to be seen there, weird.

I'd quite like to know why this patch detects the difference - when I looked locally the codegen of the tests seemed very similar

sayantn · 2026-03-18T18:19:32Z

Yeah, I fixed the test not being included: I used / instead of div_ceil when computing the chunk sizes, so the last module wasn't getting included 😅
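For readers unfamiliar with this class of bug, here is a minimal sketch of how truncating division can drop trailing items when splitting work into a fixed number of modules. The function names and the exact splitting logic are assumptions for illustration; the tool's actual code is not shown in this thread. Both functions assume `modules > 0`.

```rust
/// Buggy variant: the chunk size is rounded down, so `modules` chunks
/// may not cover every item and the remainder is silently dropped.
pub fn split_truncating<T>(items: &[T], modules: usize) -> Vec<&[T]> {
    let chunk_size = (items.len() / modules).max(1);
    items.chunks(chunk_size).take(modules).collect()
}

/// Fixed variant: the chunk size is rounded up with `usize::div_ceil`,
/// so `modules` chunks always cover every item.
pub fn split_div_ceil<T>(items: &[T], modules: usize) -> Vec<&[T]> {
    let chunk_size = items.len().div_ceil(modules).max(1);
    items.chunks(chunk_size).take(modules).collect()
}
```

With 10 items and 4 modules, the truncating variant uses chunks of 2 and covers only 8 items, while the `div_ceil` variant uses chunks of 3 and covers all 10.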

rustbot · 2026-05-11T11:59:59Z

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: @Amanieu, @adamgemmell, @davidtwco, @folkertdev, @sayantn
@Amanieu, @adamgemmell, @davidtwco, @folkertdev, @sayantn expanded to Amanieu, adamgemmell, davidtwco, folkertdev, sayantn
Random selection from Amanieu, adamgemmell, davidtwco, folkertdev

sayantn · 2026-05-11T15:26:09Z

@Amanieu @folkertdev sorry for the inconvenience, I will split up just one more PR (last one I promise). Figured out how to split the vcmla changes

@rustbot author

rustbot · 2026-05-11T15:42:16Z

Error: The feature shortcut is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip.

rustbot · 2026-05-12T01:26:39Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

folkertdev

Some questions/notes but overall this looks good and is very exciting!

folkertdev · 2026-05-12T18:58:47Z

+    )*}
+}
+
+make_nice!(NiceF16(f16), NiceF32(f32), NiceF64(f64));


What is "nice" here? Is it really just a PartialEq wrapper? Maybe there is a better name for that? Or just document that that is the goal.

Yes, it is just wrapping PartialEq to deal with NaNs. I couldn't come up with a nice name, so I just named it Nice lol. Any suggestions for the name?

idk, CustomPartialEqF16 etc?

Decided to go with NanEqF32 and added a small comment
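A minimal sketch of what such a wrapper looks like, using the comparison rule quoted earlier in the thread (`a == b || (a.is_nan() && b.is_nan())`). This is a hand-written illustration for `f32` only; the PR generates the wrappers through a macro, and the derives chosen here are assumptions.

```rust
/// Wrapper whose equality treats any two NaNs as equal, so that vectors
/// containing NaN lanes can be compared with `assert_eq!`.
#[derive(Clone, Copy, Debug)]
pub struct NanEqF32(pub f32);

impl PartialEq for NanEqF32 {
    fn eq(&self, other: &Self) -> bool {
        // Ordinary float equality, plus the extra NaN == NaN case.
        self.0 == other.0 || (self.0.is_nan() && other.0.is_nan())
    }
}
```

Note that this compares values rather than bit patterns, so `0.0` and `-0.0` are still equal and NaN payloads are ignored.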

folkertdev · 2026-05-12T19:00:34Z

            .enumerate()
            .map(|(i, chunk)| {
-                let c_filename = format!("c_programs/wrapper_{i}.cpp");
+                let c_filename = format!("c_programs/wrapper_{i}.c");


Oh, are we just C now (no C++ any more)?

Yeah, C++ was used previously to take advantage of some templates, but C typically has much better compile times, and linking to C++ from Rust is weird af

yes, this is great!

folkertdev · 2026-05-13T14:08:31Z

 # top bits are undefined, unclear how to test these
+_mm256_castph128_ph256
+_mm256_castps128_ps256
+_mm256_castpd128_pd256


And we just didn't notice before?

Actually not sure how, maybe the C compiler was constant-folding some extracts to 0 (it was UB, so the compiler could do anything)

sayantn force-pushed the intrinsic-test branch 4 times, most recently from feb1dcd to 6ef8b8f Compare March 16, 2026 00:59

sayantn force-pushed the intrinsic-test branch 4 times, most recently from e2346ff to db1b2ca Compare March 16, 2026 06:16

sayantn force-pushed the intrinsic-test branch 2 times, most recently from ce53e81 to 76dd339 Compare March 16, 2026 20:48

sayantn force-pushed the intrinsic-test branch from 76dd339 to c4b138f Compare March 17, 2026 06:18

sayantn force-pushed the intrinsic-test branch 2 times, most recently from a057d30 to 2dfa840 Compare April 24, 2026 20:40

sayantn force-pushed the intrinsic-test branch from de4895d to 71ef8b9 Compare May 10, 2026 02:11

sayantn force-pushed the intrinsic-test branch 2 times, most recently from 19cfd95 to 421ca05 Compare May 11, 2026 11:58

sayantn marked this pull request as ready for review May 11, 2026 11:59

rustbot assigned Amanieu May 11, 2026

sayantn force-pushed the intrinsic-test branch from 421ca05 to 3f30528 Compare May 11, 2026 13:45

sayantn force-pushed the intrinsic-test branch 2 times, most recently from 8a85fab to 5a3c59a Compare May 11, 2026 16:17

Remove code for compiling and comparing C and Rust files, made the C files wrappers

bf11ce5

sayantn force-pushed the intrinsic-test branch from 5a3c59a to abcaa7e Compare May 12, 2026 01:26

folkertdev approved these changes May 13, 2026

sayantn force-pushed the intrinsic-test branch from abcaa7e to 6400448 Compare May 13, 2026 18:38

sayantn added 8 commits May 14, 2026 09:54

Generate rust bindings and test code

bdb20fc

Use pointers for the C definitions to resolve ABI inconsistencies

6a931ad

Make floats static, as rounding is not a real concern here anymore

376d6bb

Modify the CI scripts to work with the new design

a6fe5e5

gen-arm: toggle big_endian_inverse where required

0701e9f

Disable some assert_instr tests in big-endian

4fc45fb

Fix _mm_sm3rnds2_epi32

c73c6ef

Disable some tests in x86 due to CI failures

c2845dc

sayantn force-pushed the intrinsic-test branch from 6400448 to c2845dc Compare May 14, 2026 04:25

Amanieu added this pull request to the merge queue May 17, 2026

Merged via the queue into rust-lang:main with commit bb24cbd May 17, 2026
74 checks passed

sayantn deleted the intrinsic-test branch May 27, 2026 13:56
