fix avx10 detection under Intel SDE by abadams · Pull Request #9135 · halide/Halide

abadams · 2026-05-12T20:45:07Z

Claude says:

Host CPU detection (in both src/Target.cpp and the runtime helper
src/runtime/x86_cpu_features.cpp) was checking
CPUID.(EAX=7,ECX=0).EDX[19] to decide whether AVX10 is present. The
AVX10 enumeration bit actually lives at CPUID.(EAX=7,ECX=1).EDX[19]
-- both files had read both sub-leaves into info2 and info3
respectively, so the fix is a one-token change from info2.edx to
info3.edx in each place. The APX detection a few lines below was
already reading info3.edx, which is what tipped me off that the
AVX10 line was a slip rather than an intentional choice.

Why this hadn't surfaced on real silicon, despite being shipped since
early 2024: the bad path requires both os_avx512 to be true AND
CPUID.7.0.EDX[19] to be set on the same CPU. On Intel hardware
through Sapphire Rapids, CPUID.7.0.EDX[19] is reserved and reads as
zero, so the check is harmlessly always false. On hybrid Alder/Raptor
Lake parts various hybrid-related bits in CPUID.7.0.EDX are set,
but those CPUs don't have AVX-512 enabled by the OS, so os_avx512
is false and the outer guard blocks the path. On real Granite Rapids
(the first chip that actually has AVX10) both leaves agree, so the
buggy check happens to give the right answer.

Intel's SDE is what makes the bug visible: under sde -spr, the
emulated CPUID sets CPUID.7.0.EDX = 0xffdc4432 (bit 19 = 1) while
correctly leaving CPUID.7.1.EDX[19] clear, and reports os_avx512.
That combination -- impossible on shipped silicon -- triggers
host-detection to push Target::AVX10_1 even though the chip
doesn't have AVX10. SDE additionally returns
CPUID.24.0.EBX = 0x00000002 (version=2, but all of the 128/256/512
width bits clear), so vector_bits is never set away from its
default of 0, and CodeGen_X86::mattrs() hits
user_error << "AVX10 only supports 256 or 512 bit variants".

The runtime version had the same typo with a milder symptom: it
doesn't track vector_bits, so it would silently advertise
halide_target_feature_avx10_1 as available to anything that calls
halide_can_use_target_features on a machine matching the
SDE-style combination. No shipped Intel part hits this today, but
it's the same latent landmine and worth fixing alongside the JIT
path.

Found while running the correctness suite under SDE for a variety
of Intel microarchitectures, where every chip except -spr passed
cleanly; the 9/10 SPR failures all decoded to this single
detection bug.

Andrew says: I verified this is where LLVM looks for avx10 support in its host detection, so I believe Claude's explanation.

The x86 backend previously assumed AVX-512 microarchitectures form a strict feature hierarchy (Zen4 implies Cannonlake implies Skylake-AVX512 implies AVX512). It then drove `-mcpu` off the highest level in that chain, picking vendor-specific CPU names like `znver4` or `cannonlake`. That assumption isn't quite right: vendor-specific mcpu choices enable vendor-specific features for us (e.g. `-mcpu=znver4` turns on `sse4a`, which Cannonlake doesn't have). A Halide flag higher in the hierarchy was therefore not actually a strict superset of one below it -- code built for `AVX512_Zen4` could use SSE4a and fail to run on a Cannonlake CPU even though Halide treats Zen4 as "Cannonlake and above". Switch `mcpu_target()` to pick only generic `x86-64-v{N}` levels, and have `mattrs()` explicitly enable every feature Halide tracks on top of that baseline. The Halide feature flags keep their existing meaning, but a level like `AVX512_Zen4` now produces code that runs on the intersection of Zen4 and Cannonlake rather than the union, preserving the "each level is a superset of the level below" invariant the rest of the backend depends on. Per-CPU tuning via `mcpu_tune()` is untouched, so users who want znver4/znver5-specific scheduling still get it. Verified with `make correctness_simd_op_check_x86` (passes) and by running `correctness_vector_reductions` under Intel SDE's `-chip_check_die` for pnr/snb/hsw/skx/cnl/icx/spr with matching Halide targets (all pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Host CPU detection (in both `src/Target.cpp` and the runtime helper `src/runtime/x86_cpu_features.cpp`) was checking `CPUID.(EAX=7,ECX=0).EDX[19]` to decide whether AVX10 is present. The AVX10 enumeration bit actually lives at `CPUID.(EAX=7,ECX=1).EDX[19]` -- both files had read both sub-leaves into `info2` and `info3` respectively, so the fix is a one-token change from `info2.edx` to `info3.edx` in each place. The APX detection a few lines below was already reading `info3.edx`, which is what tipped me off that the AVX10 line was a slip rather than an intentional choice. Why this hadn't surfaced on real silicon, despite being shipped since early 2024: the bad path requires both `os_avx512` to be true AND `CPUID.7.0.EDX[19]` to be set on the same CPU. On Intel hardware through Sapphire Rapids, `CPUID.7.0.EDX[19]` is reserved and reads as zero, so the check is harmlessly always false. On hybrid Alder/Raptor Lake parts various hybrid-related bits in `CPUID.7.0.EDX` are set, but those CPUs don't have AVX-512 enabled by the OS, so `os_avx512` is false and the outer guard blocks the path. On real Granite Rapids (the first chip that actually has AVX10) both leaves agree, so the buggy check happens to give the right answer. Intel's SDE is what makes the bug visible: under `sde -spr`, the emulated CPUID sets `CPUID.7.0.EDX = 0xffdc4432` (bit 19 = 1) while correctly leaving `CPUID.7.1.EDX[19]` clear, and reports `os_avx512`. That combination -- impossible on shipped silicon -- triggers host-detection to push `Target::AVX10_1` even though the chip doesn't have AVX10. SDE additionally returns `CPUID.24.0.EBX = 0x00000002` (version=2, but all of the 128/256/512 width bits clear), so `vector_bits` is never set away from its default of 0, and `CodeGen_X86::mattrs()` hits `user_error << "AVX10 only supports 256 or 512 bit variants"`. The runtime version had the same typo with a milder symptom: it doesn't track `vector_bits`, so it would silently advertise `halide_target_feature_avx10_1` as available to anything that calls `halide_can_use_target_features` on a machine matching the SDE-style combination. No shipped Intel part hits this today, but it's the same latent landmine and worth fixing alongside the JIT path. Found while running the correctness suite under SDE for a variety of Intel microarchitectures, where every chip except `-spr` passed cleanly; the 9/10 SPR failures all decoded to this single detection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-12T22:02:38Z

Codecov Report

❌ Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@dd187a2). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/CodeGen_X86.cpp	90.62%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9135   +/-   ##
=======================================
  Coverage        ?   69.80%           
=======================================
  Files           ?      255           
  Lines           ?    77525           
  Branches        ?    18534           
=======================================
  Hits            ?    54120           
  Misses          ?    17948           
  Partials        ?     5457

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

alexreinking · 2026-05-13T01:35:27Z

This needs to be rebased on main.

abadams and others added 2 commits May 12, 2026 12:07

alexreinking approved these changes May 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix avx10 detection under Intel SDE#9135

fix avx10 detection under Intel SDE#9135
abadams wants to merge 2 commits into
mainfrom
abadams/fix_avx10_detection

abadams commented May 12, 2026

Uh oh!

codecov Bot commented May 12, 2026

Uh oh!

alexreinking commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

abadams commented May 12, 2026

Uh oh!

codecov Bot commented May 12, 2026

Codecov Report

Uh oh!

alexreinking commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants