[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size by wenju-he · Pull Request #22447 · intel/llvm

wenju-he · 2026-06-26T00:01:57Z

The AMDGPU verifier (added in 6794e31) requires the amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr, which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets.

Fixes CodeGenSYCL/reqd-work-group-size.cpp
CMPLRLLVM-76303
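
As a rough illustration of the relationship the verifier checks, the sketch below derives an `amdgpu-flat-work-group-size` value from the three `reqd_work_group_size` dimensions. It assumes the attribute takes the `"min,max"` string form and that a required work-group size pins both bounds to the product of the dimensions. The helper name `flatWorkGroupSizeAttrValue` is hypothetical and is not the code changed in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of clang): builds the string value for the
// "amdgpu-flat-work-group-size" function attribute from a required
// work-group size. Assumption: min and max are both the flattened size,
// i.e. the product X * Y * Z.
std::string flatWorkGroupSizeAttrValue(std::uint32_t x, std::uint32_t y,
                                       std::uint32_t z) {
  // Widen before multiplying so the product cannot overflow 32 bits.
  const std::uint64_t flat = static_cast<std::uint64_t>(x) * y * z;
  return std::to_string(flat) + "," + std::to_string(flat);
}
```

For example, a kernel declared with a required work-group size of 8 x 8 x 4 would, under these assumptions, carry the attribute value `256,256`.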

HostInfoMacOSX's SharedCacheInfo used the dyld process-snapshot introspection SPIs only when <mach-o/dyld_introspection.h> was present, gating the calling code behind a compile-time macro. To avoid bifurcating the behavior based on the SDK, rather than the presence of the symbols, use dlsym to resolve them at runtime. While here, fold the duplicate dlsym of dyld_image_segment_data_ into the new, once-initialized, shared table. Assisted-by: Claude

…#189188) Add a new MachineFunctionPass (HexagonHVXSaveRemark) that emits optimization analysis remarks when HVX vector registers must be saved and restored around function calls. All HVX registers are caller-saved (Section 5.3 of the Hexagon ABI), so any HVX value live across a call requires a save/restore pair on the stack. Each HVX vector is 64 or 128 bytes, making this overhead expensive. The pass exits when remarks are not requested (-Rpass-analysis=hexagon-hvx-save) or when HVX is not enabled. A byte threshold (default 1024, tunable via -hexagon-hvx-save-threshold) filters out functions with only a small number of saves. The remarks help programmers identify call sites where inlining, hoisting, or sinking could reduce the save/restore cost.
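
The byte threshold described above can be pictured with a small sketch. It assumes the pass compares the total number of bytes saved in a function (number of live HVX vectors times the 64- or 128-byte vector length) against the threshold, and that the comparison is inclusive. Both points are assumptions, and `shouldReportHvxSaves` is a hypothetical name, not a function from the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical filter: decide whether a function's HVX save/restore cost is
// large enough to report. Assumptions: cost is counted in bytes of saved
// vectors, and a cost equal to the threshold is still reported.
bool shouldReportHvxSaves(unsigned numSavedVectors, unsigned vectorBytes,
                          std::uint64_t thresholdBytes = 1024) {
  const std::uint64_t totalBytes =
      static_cast<std::uint64_t>(numSavedVectors) * vectorBytes;
  return totalBytes >= thresholdBytes;
}
```

With the default threshold of 1024 bytes, eight 128-byte vectors would be reported under these assumptions, while four would not.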

The GitHub API has recovered, and the previous failure mode has been rectified by ensuring that branches are ready for deletion for seven days rather than 24 hours. Reviewers: cmtice Reviewed By: cmtice Pull Request: llvm/llvm-project#205439

A REDUCTION clause naming a user-defined operator (e.g., reduction(.myop.:x)) crashed in lowering: ReductionProcessor assumed the DefinedOperator clause variant always held an intrinsic operator and called std::get<IntrinsicOperator> unconditionally, which aborts for the DefinedOpName alternative.

Handle DefinedOpName in the reduction clause processor, adding the clause-side counterpart to the directive handling from #190288. For a locally declared user-defined operator reduction, resolve the operator to its reduction symbol and reference the omp.declare_reduction op materialized for the declare reduction directive. The op name is now module-scoped via AbstractConverter::mangleName, on the directive and clause sides in lockstep, so reductions with the same operator spelling in different modules no longer collide.

Cases that are not yet supported (reductions imported by USE association, renamed or merged operators, and declarations with multiple types) now emit a clean "not yet implemented" diagnostic instead of crashing or silently binding the wrong combiner. Support for the USE-associated and cross-module cases is a follow-up that builds on the semantic fix in #200329.

Tests cover the issue's integer case and a derived-type case (both lower, with a module-scoped op name), plus the USE-associated and multiple-type cases (clean TODO).

Fixes #204299

Assisted-by: Claude Opus 4.8, GPT-5.5.

LLVM CMake has the `LLVM_LIBDIR_SUFFIX` option to optionally add a suffix to the install library directory, so setting `-DLLVM_LIBDIR_SUFFIX=64` would result in a `lib64` library directory. We didn't honor the variable correctly in `libdevice`, `xpti`, `xptifw`, the driver, `sycl-jit`, or E2E testing. This option will be set by Linux distros packaging the repo, so we need it to work. After this, further work is required to get Driver `lit` tests to pass with `LLVM_LIBDIR_SUFFIX` set, as the example device libraries we have are always in `lib`. Closes: intel#22355 Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>

This patch was a part of llvm/llvm-project#201170. I split the `icmp ptr` support from the original PR since I am worried it might not land in time for the LLVM 23 release (#201170 is blocked by #200672 for curating mixed provenance tests). I hope we can pick most of the low-hanging fruit exposed by fuzzers before the release. The released version should be able to run csmith-generated tests without obvious false positives or crashes. Note that this patch doesn't respect the exact semantics of `icmp ptr`, i.e., truncating the address to the address width (the naming is a bit confusing). Currently, we don't model external state in non-address bits of a pointer in llubi, so I think it is fine.

…l#22408)

Fixes build breakage caused by 448b725 ("clang/Driver: Use struct type for BoundArch instead of StringRef"), which changed virtual signatures in `ToolChain.h` and related APIs.

Update SYCL/CUDA driver code to use the new `BoundArch` struct type instead of `StringRef`/`const char*`:

- SYCL.h/Cuda.h: update override signatures to match base class (`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const char*` to `BoundArch`; fix all downstream uses including `appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and `BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value references to use `BA` parameter); fix `nullptr`/`StringRef()` in `DeviceDependences::add`, `HostDependence`, and unbundling action `registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef` contexts; fix `doOnEachDependence` lambda param type; fix `getArgsForToolChain` calls with empty arch

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Reverts llvm/llvm-project#173135 and adds two new IR tests to demonstrate the impact of different atomic orderings on Dead Store Elimination (DSE). This reverts commit c8941df. Co-authored-by: Aiden Grossman <aidengrossman@google.com>

A local build on a Linux platform reports a compiler warning:

    llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning: implicit conversion loses integer precision: 'long' to 'int' [-Wshorten-64-to-32]
      546 |     int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
          |         ~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1 warning generated.

Signed-off-by: jinge90 <ge.jin@intel.com>

…el#22409) Test fails after 96eb0cb. Add NativeCPULibclcCall, SYCLGlobalVar, SYCLIntelESimdVectorize, SYCLUsesAspects to undocumented list; remove ReqdWorkGroupSize and WorkGroupSizeHint (now documented); update total 84->86. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Without asserts, we see failures like so:

    /repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error: unused variable 'NextI' [-Werror,-Wunused-variable]
      982 |   MachineBasicBlock::const_instr_iterator NextI = std::next(MI.getIterator());
          |                                           ^~~~~
    1 error generated.

Mark NextI `maybe_unused` to address the issue. Fixes a regression introduced by f8aa5f6.

This PR fixes two related DWARF constant-handling bugs that were blocking each other.

First, LLDB's DWARF expression evaluator in [`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp) handled `DW_OP_constu` and `DW_OP_consts` without going through `to_generic`. Under DWARF, these operators push a generic value: an address-sized integral value with unspecified signedness. That means the result should be truncated to the target address size (via `to_generic`).

Second, LLVM already had a producer-side issue tracked as [#47431](llvm/llvm-project#47431): on 32-bit targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source integer constants wider than the target generic type. If LLDB were fixed alone, those producer-emitted constants would become truncated as DWARF requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.

## What Changed

On the LLDB consumer side:

- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect address-sized generic values.

On the LLVM producer side:

- Wide integer debug-location constants that cannot be represented by the target generic type are emitted as `DW_OP_implicit_value` instead of `DW_OP_const*`.
- This preserves the source value bytes instead of relying on an address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where there are no remaining `DIExpression` operations.

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp, caused by host warning-option output.
No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.
```
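
To make the truncation concrete, here is a minimal standalone sketch of reducing a constant to an address-sized generic value, together with the kind of check a producer could use to decide whether a constant survives that truncation. The function names `truncateToGeneric` and `fitsInGenericUnsigned` are hypothetical and are not LLDB's `to_generic` or LLVM's actual implementation. The sketch treats values as unsigned 64-bit integers and takes the address size in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low addrSizeBytes * 8 bits of a value,
// mimicking truncation to an address-sized generic value.
std::uint64_t truncateToGeneric(std::uint64_t value, unsigned addrSizeBytes) {
  if (addrSizeBytes >= 8)
    return value;  // A 64-bit generic type holds the whole value.
  const std::uint64_t mask =
      (std::uint64_t{1} << (addrSizeBytes * 8)) - 1;
  return value & mask;
}

// Hypothetical producer-side check: an unsigned constant is safe to emit as
// an address-sized constant only if truncation leaves it unchanged.
bool fitsInGenericUnsigned(std::uint64_t value, unsigned addrSizeBytes) {
  return truncateToGeneric(value, addrSizeBytes) == value;
}
```

On a target with 4-byte addresses, the 64-bit constant `0x100000001` would truncate to `1` in this sketch, which is the kind of wide source value the description says is now emitted as `DW_OP_implicit_value` instead.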

…hs (#205492) `performFusion()` and `fuseGuardedLoops()` carried two character-for-character identical tails: header-PHI migration plus latch rewiring, and the SCEV-forget / block-merge / latch-merge finalization. Extract them into `rewireFusedHeaderPHIsAndLatches()` and `finalizeFusedLoop()` and call both from each path.

…on (#202121)

This patch mainly fixes a bug with parsing of unknown doxygen commands in function parameter documentation.

To extract the parameter documentation from the function documentation, the whole function documentation is parsed first. Then the documentation paragraph for the requested parameter is "converted" to a string and stored as the documentation for the parameter. The string is converted by visiting and dumping all chunks of the parsed paragraph.

When unknown doxygen commands are parsed (during the function documentation parsing step), they are registered in a `clang::comments::CommandTraits` object. Visiting an unknown command requires querying the registered commands through the `clang::comments::CommandTraits` object to get the command name.

The bug was that the function documentation parsing step and the visiting step used two different `clang::comments::CommandTraits` objects. Hence the visiting step failed (array access out of bounds) when trying to retrieve the command names for unknown commands.

The patch moves the function documentation parsing step to the construction of the `SymbolDocCommentVisitor`, which is also responsible for converting the parameter documentation paragraph to a string. This way the same `clang::comments::CommandTraits` is used and the query for unknown command names is correct.

Additional fixes:

- correct some whitespace behaviour for doxygen inline commands
- add a new token kind for the clang comment parser to distinguish unknown "backslash" and "at" commands to correctly show them in the clangd hover info

Related issue: clangd/clangd#2671

This adds a `noipa` function attribute to LLVM IR. This new attribute disables any interprocedural analysis that inspects the definition of the function. Setting this attribute is equivalent to moving the function definition to a separate, optimizer-opaque, module. The `noipa` attribute does *not* control inlining or outlining. Add the `noinline` and `nooutline` attributes as well in cases where inlining and outlining should additionally be disabled. Revival of https://reviews.llvm.org/D101011 Discussed in https://discourse.llvm.org/t/noipa-continues/74411 LLVM portion of llvm/llvm-project#40819

…reeze without SPV_KHR_poison_freeze' (intel#22428) Original commit KhronosGroup/SPIRV-LLVM-Translator@a2a2774 This fixes a bug exposed by PyTorch, and I'm trying to add PyTorch testing to our CI here. Closes: intel#22308 Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>

…up_size AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets. Fixes CodeGenSYCL/reqd-work-group-size.cpp CMPLRLLVM-76303 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vporpo and others added 30 commits June 23, 2026 15:53

[SandboxVec][Scheduler] Implement direction (#205193)

43b63b6

DGNode::UnscheduledPreds was added in a previous patch, so this patch makes use of it in the scheduler. Depending on Dir we can now schedule BottomUp or TopDown.

[dsymutil] Fix help message after #200971 (#203337)

bab165e

The default DWARF linker is parallel after #200971. Fix help message which still suggests classic DWARF linker.

[Github] Make prune-unused-branches only delete branches after 7 days

bdc00c2

This should prevent the failure mode that led to the job being disabled, where the GitHub API failed to return results for more than 24 hours. Reviewers: cmtice Pull Request: llvm/llvm-project#205438

[SYCL][NFC] Move target triple name out of loop in KernelCompiler Dev…

45c9268

…iceCompilation.cpp (intel#22344) Signed-off-by: jinge90 <ge.jin@intel.com>

[mlir][gpu] Fix memref.dim folding with negative index (#205338)

4bb31d7

Fixes #205073.

[mlir] Simplify DimOp::fold by using getConstantIndex(NFC) (#205343)

719144a

Refactor `DimOp::fold` in both memref and tensor dialects to use the existing `getConstantIndex()` helper instead of manually extracting the index via `IntegerAttr`.

[clang] Exclude EmptyRecord when calculating larger CXX records (#205…

ca36859

…040) To match with GCC: https://godbolt.org/z/KPKGhhenK Fixes: #203760 Assisted-by: Claude Sonnet 4.6

[Github] Bump release-binaries python version (#179287)

f41a6b7

This makes it more consistent with the rest of the repository.

[AArch64] Add final missing instructions to sForm (#167518)

be3ee6f

Fix missing opcodes in table of flag-setting instructions.

[clang][AST] Refactor EvaluatedStmt accessors in VarDecl (#205033)

9ee7bda

1) Return the evaluated APValue as a const pointer since it may not be modified by callers. 2) Only return a non-nullptr from `getEvaluatedValue()` if the APValue is not absent.

[SYCL] Use ext_vector_type for optimizing marray arithmetic (intel#…

d70125b

…22342) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

[AMDGPU] Reject src1 immediates with dpp when unsupported (#201494)

7570d2d

This fixes an oversight in #164241.

[clang-tidy][NFC] Remove a wrong comment in ProTypeMemberInitCheck (#…

51b0fc4

…205477) `getAsCXXRecordDecl` will return nullptr for any dependent types. The comment was introduced by #192786; see llvm/llvm-project#192786 (comment) in the original PR.

[LV] Accept swapped operands in early-exit condition compare (#199989)

81a8c66

Use m_c_ICmp so the load can be on either side of the icmp.

[llubi] Implement memory manipulation intrinsics (#204932)

a12ce96

Implement the `memset`, `memcpy`, and `memmove` intrinsics and their corresponding inline versions. Note that the `isvolatile` argument is ignored and left for future PRs.

[RISCV] Convert opaque pointers in vp-combine-reverse-load.ll. NFC (#…

82c5bce

…205498)

[AArch64] Run cleanup one final time after peephole (#199711)

448c3d5

It's a lightweight pass. Should always be the last SSA pass since peephole can end up making some instructions dead.

tcottin and others added 9 commits June 25, 2026 14:21

[SYCL] Fix warnings about unused variables (intel#22414)

4765572

This PR fixes warnings about unused variables.

[CI] Use latest clang-format in pr-code-format.yml (intel#22407)

737cc8b

We use 20.1.8 while the clang-format we obtained from building intel/llvm is 23.0.0. Because of this, running clang-format locally produces different formatting than the one proposed by the CI workflow.

[SYCL][NFC] Some minor test cleanups (intel#22389)

8c0c625

PR intel#22357 pointed out that sycl-pch-include.cpp was misspelled and that sycl-pch-use.cpp had redundant comments.

Merge from 'sycl' to 'sycl-web' (4 commits)

dccbead

Merge from 'main' to 'sycl-web' (43 commits)

8c373f2

CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp

wenju-he requested a review from a team as a code owner June 26, 2026 00:01

wenju-he force-pushed the sycl-web branch from 89c9dff to 0c0065b Compare June 26, 2026 06:34

wenju-he requested review from a team, Maetveis, bader and cperkinsintel as code owners June 26, 2026 06:34

wenju-he requested review from kweronsx and mmichel11 and removed request for a team June 26, 2026 06:34

wenju-he wants to merge 520 commits into intel:sycl-web from wenju-he:fix-CodeGenSYCL-reqd-work-group-size.cpp
