Redesign intrinsic-test to use simple comparison#2063
Redesign intrinsic-test to use simple comparison#2063sayantn wants to merge 5 commits intorust-lang:mainfrom
intrinsic-test to use simple comparison#2063Conversation
fc52b8d to
feb1dcd
Compare
This seems weird ( unsafe extern "C" {
fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}In fact most Edit: To work around this issue I have modified the tool to communicate with C via pointers (e.g. the C wrapper for @Amanieu is this intended behavior or a bug? |
e2346ff to
db1b2ca
Compare
|
Btw the time gains are significant, it reduces the Arm and aarch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev |
|
Great work. Quick sanity check on |
|
@folkertdev ooh, that makes sense. I don't particularly care about windows, but we are using LLVM20 in the CI. I can change it to use the build from kernel.org |
|
I'm seeing |
|
yeah, but I can use the LLVM github builds or the kernel.org builds |
ce53e81 to
76dd339
Compare
|
Can f16 tests just be gated with |
|
@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing in all 3 arm archs, e.g. edit: sorry, my mistake, they are still failing in ARMv7. I will gate them against the flag |
|
With LLVM 22 |
|
FTZ/DAZ-related perhaps? |
I don't really think so, the outputs seem completely distinct. I noticed that use core::arch::aarch64::*;
#[unsafe(no_mangle)]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn foo(dst: *mut uint8x16x2_t, a: *const uint8x16_t, b: *const uint8x16_t) {
unsafe {
*dst = vzipq_u8(*a, *b);
}
}produces foo:
ld1 { v0.16b }, [x1]
ld1 { v1.16b }, [x2]
add x8, x0, #16
zip1 v2.16b, v0.16b, v1.16b
zip2 v0.16b, v0.16b, v1.16b
st1 { v2.16b }, [x0]
st1 { v0.16b }, [x8]
retBut the C code seemingly has different behavior on GCC and clang https://godbolt.org/z/T3YnrejjG @adamgemmell can you help in this? |
|
I'm not sure it will fix your issue but the difference in instructions comes from the fact that in arm_neon.h, they reverse every vector before and after the operation on big endian. It's not always actually necessary so we only do it if it's broken without it - however, the intrinsic test tool doesn't detect the difference in behaviour because both arguments it picks are identical. e.g.: |
|
You can try adding Also I don't actually see vzipq_u8 on the latest CI run, why is that? |
I have no idea, I can confirm that locally the test is generated and run.
I will check. Thanks Edit: @adamgemmell adding Edit2: sorry, |
|
None of the unsigned variants of vzipq seem to be seen there, weird. I'd quite like to know why this patch detects the difference - when I looked locally the codegen of the tests seemed very similar |
|
Yeah I fixed the test not being included, I used |
Currently
intrinsic-testprints the outputs and then compares the outputs manually. This PR uses a different approach -- generate C wrappers for the intrinsics, link to them from Rust, and then just use simple rust tests to compare outputs