fix avx10 detection under Intel SDE#9135
Open
abadams wants to merge 2 commits into
Open
Conversation
The x86 backend previously assumed AVX-512 microarchitectures form a
strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512
implies AVX512). It then drove `-mcpu` off the highest level in that
chain, picking vendor-specific CPU names like `znver4` or `cannonlake`.
That assumption isn't quite right: vendor-specific mcpu choices enable
vendor-specific features for us (e.g. `-mcpu=znver4` turns on `sse4a`,
which Cannonlake doesn't have). A Halide flag higher in the hierarchy
was therefore not actually a strict superset of one below it -- code
built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake
CPU even though Halide treats Zen4 as "Cannonlake and above".
Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and
have `mattrs()` explicitly enable every feature Halide tracks on top of
that baseline. The Halide feature flags keep their existing meaning,
but a level like `AVX512_Zen4` now produces code that runs on the
intersection of Zen4 and Cannonlake rather than the union, preserving
the "each level is a superset of the level below" invariant the rest of
the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched,
so users who want znver4/znver5-specific scheduling still get it.
Verified with `make correctness_simd_op_check_x86` (passes) and by
running `correctness_vector_reductions` under Intel SDE's
`-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide
targets (all pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host CPU detection (in both `src/Target.cpp` and the runtime helper `src/runtime/x86_cpu_features.cpp`) was checking `CPUID.(EAX=7,ECX=0).EDX[19]` to decide whether AVX10 is present. The AVX10 enumeration bit actually lives at `CPUID.(EAX=7,ECX=1).EDX[19]` -- both files had read both sub-leaves into `info2` and `info3` respectively, so the fix is a one-token change from `info2.edx` to `info3.edx` in each place. The APX detection a few lines below was already reading `info3.edx`, which is what tipped me off that the AVX10 line was a slip rather than an intentional choice. Why this hadn't surfaced on real silicon, despite being shipped since early 2024: the bad path requires both `os_avx512` to be true AND `CPUID.7.0.EDX[19]` to be set on the same CPU. On Intel hardware through Sapphire Rapids, `CPUID.7.0.EDX[19]` is reserved and reads as zero, so the check is harmlessly always false. On hybrid Alder/Raptor Lake parts various hybrid-related bits in `CPUID.7.0.EDX` are set, but those CPUs don't have AVX-512 enabled by the OS, so `os_avx512` is false and the outer guard blocks the path. On real Granite Rapids (the first chip that actually has AVX10) both leaves agree, so the buggy check happens to give the right answer. Intel's SDE is what makes the bug visible: under `sde -spr`, the emulated CPUID sets `CPUID.7.0.EDX = 0xffdc4432` (bit 19 = 1) while correctly leaving `CPUID.7.1.EDX[19]` clear, and reports `os_avx512`. That combination -- impossible on shipped silicon -- triggers host-detection to push `Target::AVX10_1` even though the chip doesn't have AVX10. SDE additionally returns `CPUID.24.0.EBX = 0x00000002` (version=2, but all of the 128/256/512 width bits clear), so `vector_bits` is never set away from its default of 0, and `CodeGen_X86::mattrs()` hits `user_error << "AVX10 only supports 256 or 512 bit variants"`. The runtime version had the same typo with a milder symptom: it doesn't track `vector_bits`, so it would silently advertise `halide_target_feature_avx10_1` as available to anything that calls `halide_can_use_target_features` on a machine matching the SDE-style combination. No shipped Intel part hits this today, but it's the same latent landmine and worth fixing alongside the JIT path. Found while running the correctness suite under SDE for a variety of Intel microarchitectures, where every chip except `-spr` passed cleanly; the 9/10 SPR failures all decoded to this single detection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9135 +/- ##
=======================================
Coverage ? 69.80%
=======================================
Files ? 255
Lines ? 77525
Branches ? 18534
=======================================
Hits ? 54120
Misses ? 17948
Partials ? 5457 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Member
|
This needs to be rebased on main. |
alexreinking
approved these changes
May 13, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Claude says:
Host CPU detection (in both
src/Target.cppand the runtime helpersrc/runtime/x86_cpu_features.cpp) was checkingCPUID.(EAX=7,ECX=0).EDX[19]to decide whether AVX10 is present. TheAVX10 enumeration bit actually lives at
CPUID.(EAX=7,ECX=1).EDX[19]-- both files had read both sub-leaves into
info2andinfo3respectively, so the fix is a one-token change from
info2.edxtoinfo3.edxin each place. The APX detection a few lines below wasalready reading
info3.edx, which is what tipped me off that theAVX10 line was a slip rather than an intentional choice.
Why this hadn't surfaced on real silicon, despite being shipped since
early 2024: the bad path requires both
os_avx512to be true ANDCPUID.7.0.EDX[19]to be set on the same CPU. On Intel hardwarethrough Sapphire Rapids,
CPUID.7.0.EDX[19]is reserved and reads aszero, so the check is harmlessly always false. On hybrid Alder/Raptor
Lake parts various hybrid-related bits in
CPUID.7.0.EDXare set,but those CPUs don't have AVX-512 enabled by the OS, so
os_avx512is false and the outer guard blocks the path. On real Granite Rapids
(the first chip that actually has AVX10) both leaves agree, so the
buggy check happens to give the right answer.
Intel's SDE is what makes the bug visible: under
sde -spr, theemulated CPUID sets
CPUID.7.0.EDX = 0xffdc4432(bit 19 = 1) whilecorrectly leaving
CPUID.7.1.EDX[19]clear, and reportsos_avx512.That combination -- impossible on shipped silicon -- triggers
host-detection to push
Target::AVX10_1even though the chipdoesn't have AVX10. SDE additionally returns
CPUID.24.0.EBX = 0x00000002(version=2, but all of the 128/256/512width bits clear), so
vector_bitsis never set away from itsdefault of 0, and
CodeGen_X86::mattrs()hitsuser_error << "AVX10 only supports 256 or 512 bit variants".The runtime version had the same typo with a milder symptom: it
doesn't track
vector_bits, so it would silently advertisehalide_target_feature_avx10_1as available to anything that callshalide_can_use_target_featureson a machine matching theSDE-style combination. No shipped Intel part hits this today, but
it's the same latent landmine and worth fixing alongside the JIT
path.
Found while running the correctness suite under SDE for a variety
of Intel microarchitectures, where every chip except
-sprpassedcleanly; the 9/10 SPR failures all decoded to this single
detection bug.
Andrew says: I verified this is where LLVM looks for avx10 support in its host detection, so I believe Claude's explanation.