Skip to content

fix avx10 detection under Intel SDE#9135

Open
abadams wants to merge 2 commits into
mainfrom
abadams/fix_avx10_detection
Open

fix avx10 detection under Intel SDE#9135
abadams wants to merge 2 commits into
mainfrom
abadams/fix_avx10_detection

Conversation

@abadams
Copy link
Copy Markdown
Member

@abadams abadams commented May 12, 2026

Claude says:

Host CPU detection (in both src/Target.cpp and the runtime helper
src/runtime/x86_cpu_features.cpp) was checking
CPUID.(EAX=7,ECX=0).EDX[19] to decide whether AVX10 is present. The
AVX10 enumeration bit actually lives at CPUID.(EAX=7,ECX=1).EDX[19]
-- both files had read both sub-leaves into info2 and info3
respectively, so the fix is a one-token change from info2.edx to
info3.edx in each place. The APX detection a few lines below was
already reading info3.edx, which is what tipped me off that the
AVX10 line was a slip rather than an intentional choice.

Why this hadn't surfaced on real silicon, despite being shipped since
early 2024: the bad path requires both os_avx512 to be true AND
CPUID.7.0.EDX[19] to be set on the same CPU. On Intel hardware
through Sapphire Rapids, CPUID.7.0.EDX[19] is reserved and reads as
zero, so the check is harmlessly always false. On hybrid Alder/Raptor
Lake parts various hybrid-related bits in CPUID.7.0.EDX are set,
but those CPUs don't have AVX-512 enabled by the OS, so os_avx512
is false and the outer guard blocks the path. On real Granite Rapids
(the first chip that actually has AVX10) both leaves agree, so the
buggy check happens to give the right answer.

Intel's SDE is what makes the bug visible: under sde -spr, the
emulated CPUID sets CPUID.7.0.EDX = 0xffdc4432 (bit 19 = 1) while
correctly leaving CPUID.7.1.EDX[19] clear, and reports os_avx512.
That combination -- impossible on shipped silicon -- triggers
host-detection to push Target::AVX10_1 even though the chip
doesn't have AVX10. SDE additionally returns
CPUID.24.0.EBX = 0x00000002 (version=2, but all of the 128/256/512
width bits clear), so vector_bits is never set away from its
default of 0, and CodeGen_X86::mattrs() hits
user_error << "AVX10 only supports 256 or 512 bit variants".

The runtime version had the same typo with a milder symptom: it
doesn't track vector_bits, so it would silently advertise
halide_target_feature_avx10_1 as available to anything that calls
halide_can_use_target_features on a machine matching the
SDE-style combination. No shipped Intel part hits this today, but
it's the same latent landmine and worth fixing alongside the JIT
path.

Found while running the correctness suite under SDE for a variety
of Intel microarchitectures, where every chip except -spr passed
cleanly; the 9/10 SPR failures all decoded to this single
detection bug.

Andrew says: I verified this is where LLVM looks for avx10 support in its host detection, so I believe Claude's explanation.

abadams and others added 2 commits May 12, 2026 12:07
The x86 backend previously assumed AVX-512 microarchitectures form a
strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512
implies AVX512). It then drove `-mcpu` off the highest level in that
chain, picking vendor-specific CPU names like `znver4` or `cannonlake`.

That assumption isn't quite right: vendor-specific mcpu choices enable
vendor-specific features for us (e.g. `-mcpu=znver4` turns on `sse4a`,
which Cannonlake doesn't have). A Halide flag higher in the hierarchy
was therefore not actually a strict superset of one below it -- code
built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake
CPU even though Halide treats Zen4 as "Cannonlake and above".

Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and
have `mattrs()` explicitly enable every feature Halide tracks on top of
that baseline. The Halide feature flags keep their existing meaning,
but a level like `AVX512_Zen4` now produces code that runs on the
intersection of Zen4 and Cannonlake rather than the union, preserving
the "each level is a superset of the level below" invariant the rest of
the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched,
so users who want znver4/znver5-specific scheduling still get it.

Verified with `make correctness_simd_op_check_x86` (passes) and by
running `correctness_vector_reductions` under Intel SDE's
`-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide
targets (all pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host CPU detection (in both `src/Target.cpp` and the runtime helper
`src/runtime/x86_cpu_features.cpp`) was checking
`CPUID.(EAX=7,ECX=0).EDX[19]` to decide whether AVX10 is present. The
AVX10 enumeration bit actually lives at `CPUID.(EAX=7,ECX=1).EDX[19]`
-- both files had read both sub-leaves into `info2` and `info3`
respectively, so the fix is a one-token change from `info2.edx` to
`info3.edx` in each place. The APX detection a few lines below was
already reading `info3.edx`, which is what tipped me off that the
AVX10 line was a slip rather than an intentional choice.

Why this hadn't surfaced on real silicon, despite being shipped since
early 2024: the bad path requires both `os_avx512` to be true AND
`CPUID.7.0.EDX[19]` to be set on the same CPU. On Intel hardware
through Sapphire Rapids, `CPUID.7.0.EDX[19]` is reserved and reads as
zero, so the check is harmlessly always false. On hybrid Alder/Raptor
Lake parts various hybrid-related bits in `CPUID.7.0.EDX` are set,
but those CPUs don't have AVX-512 enabled by the OS, so `os_avx512`
is false and the outer guard blocks the path. On real Granite Rapids
(the first chip that actually has AVX10) both leaves agree, so the
buggy check happens to give the right answer.

Intel's SDE is what makes the bug visible: under `sde -spr`, the
emulated CPUID sets `CPUID.7.0.EDX = 0xffdc4432` (bit 19 = 1) while
correctly leaving `CPUID.7.1.EDX[19]` clear, and reports `os_avx512`.
That combination -- impossible on shipped silicon -- triggers
host-detection to push `Target::AVX10_1` even though the chip
doesn't have AVX10. SDE additionally returns
`CPUID.24.0.EBX = 0x00000002` (version=2, but all of the 128/256/512
width bits clear), so `vector_bits` is never set away from its
default of 0, and `CodeGen_X86::mattrs()` hits
`user_error << "AVX10 only supports 256 or 512 bit variants"`.

The runtime version had the same typo with a milder symptom: it
doesn't track `vector_bits`, so it would silently advertise
`halide_target_feature_avx10_1` as available to anything that calls
`halide_can_use_target_features` on a machine matching the
SDE-style combination. No shipped Intel part hits this today, but
it's the same latent landmine and worth fixing alongside the JIT
path.

Found while running the correctness suite under SDE for a variety
of Intel microarchitectures, where every chip except `-spr` passed
cleanly; the 9/10 SPR failures all decoded to this single
detection bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@dd187a2). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/CodeGen_X86.cpp 90.62% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9135   +/-   ##
=======================================
  Coverage        ?   69.80%           
=======================================
  Files           ?      255           
  Lines           ?    77525           
  Branches        ?    18534           
=======================================
  Hits            ?    54120           
  Misses          ?    17948           
  Partials        ?     5457           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@alexreinking
Copy link
Copy Markdown
Member

This needs to be rebased on main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants