[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size#22447

Open
wenju-he wants to merge 520 commits into
intel:sycl-web from
wenju-he:fix-CodeGenSYCL-reqd-work-group-size.cpp

@wenju-he

Contributor

The AMDGPU verifier (added in 6794e31) requires the `amdgpu-flat-work-group-size` function attribute whenever `reqd_work_group_size` metadata is present. `setFunctionDeclAttributes` only handled OpenCL's `ReqdWorkGroupSizeAttr`; SYCL uses `SYCLReqdWorkGroupSizeAttr`, which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets.

Fixes CodeGenSYCL/reqd-work-group-size.cpp
CMPLRLLVM-76303
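
For readers unfamiliar with the attribute, here is a minimal sketch of how a required work-group size maps to an `amdgpu-flat-work-group-size` value. It is not the code in this PR: `flatWorkGroupSizeAttr` is a hypothetical helper, and it assumes the usual convention that the attribute value is written as `"min,max"` and that a required size pins both bounds to the product of the three dimensions.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of clang): build the string value of the
// "amdgpu-flat-work-group-size" attribute from a required work-group size.
// Assumption: the flat size is the product of the X, Y and Z dimensions,
// and a required size fixes both the minimum and the maximum to it.
std::string flatWorkGroupSizeAttr(const std::array<uint32_t, 3> &Dims) {
  uint64_t Flat = 1;
  for (uint32_t D : Dims)
    Flat *= D; // accumulate in 64 bits rather than in the 32-bit input type
  const std::string N = std::to_string(Flat);
  return N + "," + N; // attribute value format: "min,max"
}
```

With this sketch, a kernel that requires an 8x4x2 work-group would get the value `64,64`.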

vporpo and others added 30 commits June 23, 2026 15:53
DGNode::UnscheduledPreds was added in a previous patch, so this patch
makes use of it in the scheduler. Depending on Dir we can now schedule
BottomUp or TopDown.
The default DWARF linker is parallel after #200971. Fix help message
which still suggests classic DWARF linker.
HostInfoMacOSX's SharedCacheInfo used the dyld process-snapshot
introspection SPIs only when <mach-o/dyld_introspection.h> was present,
gating the calling code behind a compile-time macro.

To avoid bifurcating the behavior based on the SDK, rather than the
presence of the symbols, use dlsym to resolve them at runtime.

While here, fold the duplicate dlsym of dyld_image_segment_data_ into
the new, once-initialized, shared table.

Assisted-by: Claude
…#189188)

Add a new MachineFunctionPass (HexagonHVXSaveRemark) that emits
optimization analysis remarks when HVX vector registers must be saved
and restored around function calls. All HVX registers are caller-saved
(Section 5.3 of the Hexagon ABI), so any HVX value live across a call
requires a save/restore pair on the stack. Each HVX vector is 64 or 128
bytes, making this overhead expensive.

The pass exits when remarks are not requested
(-Rpass-analysis=hexagon-hvx-save) or when HVX is not enabled. A byte
threshold (default 1024, tunable via -hexagon-hvx-save-threshold)
filters out functions with only a small number of saves. The remarks
help programmers identify call sites where inlining, hoisting, or
sinking could reduce the save/restore cost.
To hopefully prevent the last failure mode that led to the job being
disabled where the GitHub API failed to return results for >24 hours.

Reviewers: cmtice

Pull Request: llvm/llvm-project#205438
The GitHub API has recovered and the previous failure mode has been
rectified by ensuring that branches are ready for deletion for seven
days rather than 24 hours.

Reviewers: cmtice

Reviewed By: cmtice

Pull Request: llvm/llvm-project#205439
A REDUCTION clause naming a user-defined operator (e.g.,
reduction(.myop.:x)) crashed in lowering: ReductionProcessor assumed the
DefinedOperator clause variant always held an intrinsic operator and called
std::get<IntrinsicOperator> unconditionally, which aborts for the
DefinedOpName alternative.
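
The failure mode described above can be illustrated with a small standalone sketch. The types below are hypothetical stand-ins, not flang's actual parse-tree classes; the point is only that `std::get<T>` on a `std::variant` fails when a different alternative is held, so the held alternative should be checked first (here with `std::get_if`).

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the two alternatives named in the commit message.
enum class IntrinsicOperator { Add, Multiply };
struct DefinedOpName {
  std::string name;
};
using DefinedOperator = std::variant<IntrinsicOperator, DefinedOpName>;

// Describe the operator without assuming which alternative the variant holds.
std::string describe(const DefinedOperator &op) {
  // std::get_if returns nullptr instead of failing when the alternative differs.
  if (const auto *intrinsic = std::get_if<IntrinsicOperator>(&op))
    return *intrinsic == IntrinsicOperator::Add ? "intrinsic add"
                                                : "intrinsic multiply";
  // Only two alternatives exist, so the variant must hold a DefinedOpName here.
  return "user-defined " + std::get<DefinedOpName>(op).name;
}
```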

Handle DefinedOpName in the reduction clause processor, adding the
clause-side counterpart to the directive handling from #190288. For a
locally declared user-defined operator reduction, resolve the operator to
its reduction symbol and reference the omp.declare_reduction op materialized
for the declare reduction directive. The op name is now module-scoped via
AbstractConverter::mangleName, on the directive and clause sides in
lockstep, so reductions with the same operator spelling in different modules
no longer collide.

Cases that are not yet supported (reductions imported by USE association,
renamed or merged operators, and declarations with multiple types) now emit
a clean "not yet implemented" diagnostic instead of crashing or silently
binding the wrong combiner. Support for the USE-associated and cross-module
cases is a follow-up that builds on the semantic fix in #200329.

Tests cover the issue's integer case and a derived-type case (both lower,
with a module-scoped op name), plus the USE-associated and multiple-type
cases (clean TODO).

Fixes #204299

Assisted-by: Claude Opus 4.8, GPT-5.5.
LLVM CMake has the `LLVM_LIBDIR_SUFFIX` option to optionally add a
suffix to the install library directory, so setting
`-DLLVM_LIBDIR_SUFFIX=64` would result in a `lib64` library directory.

We didn't honor the variable correctly in `libdevice`, `xpti`, `xptifw`,
the driver, `sycl-jit` nor E2E testing.

This option will be set by Linux distros packing the repo, so we need it
to work.

After this, further work is required to get Driver `lit` tests to pass
with `LLVM_LIBDIR_SUFFIX` set, as the example device libraries we have
are always in `lib`.

Closes: intel#22355

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
…iceCompilation.cpp (intel#22344)

Signed-off-by: jinge90 <ge.jin@intel.com>
Refactor `DimOp::fold` in both memref and tensor dialects to use the
existing `getConstantIndex()` helper instead of manually extracting the
index via `IntegerAttr`.
This patch was a part of
llvm/llvm-project#201170. I split the `icmp ptr`
support from the original PR since I am worried it might not catch up
for the LLVM 23 release (#201170 is blocked by #200672 for curating
mixed provenance tests). I hope we can pick most of the low-hanging
fruit exposed by fuzzers before the release. The released version should
be able to run csmith-generated tests without obvious false positives or
crashes.

BTW, this patch doesn't respect the exact semantics of `icmp ptr` (i.e.,
truncating the address to the address width. The naming is a bit
confusing...). Currently, we don't model external state in non-address
bits of a pointer in llubi. So I think it is fine.
…l#22408)

Fixes build breakage caused by 448b725
("clang/Driver: Use struct type for BoundArch instead of StringRef"),
which changed virtual signatures in `ToolChain.h` and related APIs.

Update SYCL/CUDA driver code to use the new `BoundArch` struct type
instead of `StringRef`/`const char*`:
- SYCL.h/Cuda.h: update override signatures to match base class
(`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const
char*` to `BoundArch`; fix all downstream uses including
`appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and
`BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value
references to use `BA` parameter); fix `nullptr`/`StringRef()` in
`DeviceDependences::add`, `HostDependence`, and unbundling action
`registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef`
contexts; fix `doOnEachDependence` lambda param type; fix
`getArgsForToolChain` calls with empty arch

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…040)

To match with GCC: https://godbolt.org/z/KPKGhhenK

Fixes: #203760

Assisted-by: Claude Sonnet 4.6
This makes it more consistent with the rest of the repository.
Fix missing opcodes in table of flag-setting instructions.
Reverts llvm/llvm-project#173135 and adds two
new IR tests to demonstrate the impact of different atomic orderings on
Dead Store Elimination (DSE).

This reverts commit c8941df.

Co-authored-by: Aiden Grossman <aidengrossman@google.com>
1) Return the evaluated APValue as a const pointer since it
   may not be modified by callers.
2) Only return a non-nullptr from `getEvaluatedValue()` if
   the APValue is not absent.
Local build on Linux platform reports a compiler warning:
llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning:
implicit conversion loses integer precision: 'long' to 'int'
[-Wshorten-64-to-32]
  546 |     int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
      |         ~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

Signed-off-by: jinge90 <ge.jin@intel.com>
…22342)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
…el#22409)

Test fails after 96eb0cb.
Add NativeCPULibclcCall, SYCLGlobalVar, SYCLIntelESimdVectorize,
SYCLUsesAspects to undocumented list; remove ReqdWorkGroupSize and
WorkGroupSizeHint (now documented); update total 84->86.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…205477)

`getAsCXXRecordDecl` will return nullptr for any dependent types.

It's introduced by #192786, see
llvm/llvm-project#192786 (comment)
in original PR.
Use m_c_ICmp so the load can be on either side of the icmp.
Without asserts, we see failures like so:

/repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error:
unused variable 'NextI' [-Werror,-Wunused-variable]
982 | MachineBasicBlock::const_instr_iterator NextI =
std::next(MI.getIterator());
          |                                           ^~~~~
    1 error generated.

Mark NextI `maybe_unused` to address the issue.

Fixes a regression introduced by f8aa5f6.
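
As a generic illustration of the `maybe_unused` fix described above (not the Hexagon code itself): a variable that is read only inside `assert()` becomes unused once `NDEBUG` compiles the assertion away, and `[[maybe_unused]]` silences the resulting `-Wunused-variable` diagnostic. The function name `valueAt` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return V[I]. `Size` is only read by the assertion, which expands to nothing
// when NDEBUG is defined, so it is marked [[maybe_unused]] to keep builds
// without asserts free of unused-variable warnings.
int valueAt(const std::vector<int> &V, std::size_t I) {
  [[maybe_unused]] const std::size_t Size = V.size();
  assert(I < Size && "index out of range");
  return V[I];
}
```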
Implement `memset`, `memcpy`, `memmove` intrinsics and their
corresponding inline versions. Note that the `isvolatile` argument is
ignored and left for future PRs.
This PR fixes two related DWARF constant-handling bugs that were
blocking each other.

First, LLDB's DWARF expression evaluator in
[`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp)
handled `DW_OP_constu` and `DW_OP_consts` without going through
`to_generic`. Under DWARF, these operators push a generic value: an
address-sized integral value with unspecified signedness. That means the
result should be truncated to the target address size (via
`to_generic`).
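
To make the truncation concrete, here is a rough sketch of reducing a 64-bit constant to an address-sized generic value. This is not LLDB's `to_generic` implementation; `toGeneric` and its parameters are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: keep only the low `AddrSizeBytes` bytes of `Value`,
// mimicking truncation of a constant to the target's address size.
uint64_t toGeneric(uint64_t Value, unsigned AddrSizeBytes) {
  if (AddrSizeBytes >= sizeof(uint64_t))
    return Value; // nothing to truncate; also avoids an invalid 64-bit shift
  const uint64_t Mask = (uint64_t{1} << (AddrSizeBytes * 8)) - 1;
  return Value & Mask;
}
```

On a target with 4-byte addresses, a constant such as `0x1122334455667788` would be reduced to `0x55667788`, which is why wide source constants cannot safely be described with `DW_OP_const*` there.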

Second, LLVM already had a producer-side issue tracked as
[#47431](llvm/llvm-project#47431): on 32-bit
targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source
integer constants wider than the target generic type. If LLDB were fixed
alone, those producer-emitted constants would become truncated as DWARF
requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.

## What Changed

On the LLDB consumer side:

- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect
address-sized generic values.

On the LLVM producer side:

- Wide integer debug-location constants that cannot be represented by
the target generic type are emitted as `DW_OP_implicit_value` instead of
`DW_OP_const*`.
- This preserves the source value bytes instead of relying on an
address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where
there are no remaining `DIExpression` operations.

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp, caused by host warning-option output. No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.

```
…hs (#205492)

`performFusion()` and `fuseGuardedLoops()` carried two
character-for-character identical tails: header-PHI migration plus latch
rewiring, and the SCEV-forget / block-merge / latch-merge finalization.
Extract them into `rewireFusedHeaderPHIsAndLatches()` and
`finalizeFusedLoop()` and call both from each path.
It's a lightweight pass. Should always be the last SSA pass since
peephole can end up making some instructions dead.
tcottin and others added 9 commits June 25, 2026 14:21
…on (#202121)

This patch mainly fixes a bug with parsing of unknown doxygen commands
in function parameter documentation.

To extract the parameter documentation from the function documentation,
the whole function documentation is parsed first.
Then the documentation paragraph for the requested parameter is
"converted" to a string and stored as the documentation for the
parameter. The string is converted by visiting and dumping all chunks of
the parsed paragraph.

When unknown doxygen commands are parsed (during the function
documentation parsing step), they are registered in a
`clang::comments::CommandTraits` object.
Visiting the unknown command requires querying the registered commands
through the `clang::comments::CommandTraits` object to get the command
name.

The bug was that the function documentation parsing and the visiting
step used 2 different `clang::comments::CommandTraits` objects. Hence
the visiting step fails (array access out of bounds) when trying to
retrieve the command names for unknown commands.

The patch moves the function documentation parsing step to the
construction of the `SymbolDocCommentVisitor` which is also responsible
for converting the parameter documentation paragraph to a string.
This way the same `clang::comments::CommandTraits` is used and the query
for unknown command names is correct.

Additional fixes:

- correct some whitespace behaviour for doxygen inline commands
- add a new token kind for the clang comment parser to distinguish
unknown "backslash" and "at" commands to correctly show them in the
clangd hover info

Related issue: clangd/clangd#2671
This adds a `noipa` function attribute to LLVM IR. This new attribute
disables any interprocedural analysis that inspects the definition of
the function. Setting this attribute is equivalent to moving the
function definition to a separate, optimizer-opaque, module.

The `noipa` attribute does *not* control inlining or outlining. Add the
`noinline` and `nooutline` attributes as well in cases where inlining
and outlining should additionally be disabled.

Revival of https://reviews.llvm.org/D101011
Discussed in https://discourse.llvm.org/t/noipa-continues/74411

LLVM portion of llvm/llvm-project#40819
This PR fixes warnings about unused variables.
…reeze without SPV_KHR_poison_freeze' (intel#22428)

Original commit
KhronosGroup/SPIRV-LLVM-Translator@a2a2774

This fixes a bug exposed by PyTorch, and I'm trying to add PyTorch
testing to our CI here.

Closes: intel#22308

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
We use 20.1.8 while the clang-format we obtained from building
intel/llvm is 23.0.0. Due to this, clang-format run locally produces
different formatting than the one proposed by the CI workflow.
PR intel#22357 pointed out that sycl-pch-include.cpp was misspelled
and that sycl-pch-use.cpp had redundant comments.
  CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
…up_size

AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size
function attribute whenever reqd_work_group_size metadata is present.
setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL
uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit
the metadata but never set the function attribute, triggering the verifier
error on amdgcn targets.

Fixes CodeGenSYCL/reqd-work-group-size.cpp
CMPLRLLVM-76303

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wenju-he wenju-he requested a review from a team as a code owner June 26, 2026 00:01
@wenju-he wenju-he requested review from a team, Maetveis, bader and cperkinsintel as code owners June 26, 2026 06:34
@wenju-he wenju-he requested review from kweronsx and mmichel11 and removed request for a team June 26, 2026 06:34