[SYCL][AMDGPU] Set amdgpu-flat-work-group-size for SYCL reqd_work_group_size#22447
Open
wenju-he wants to merge 520 commits into
DGNode::UnscheduledPreds was added in a previous patch, so this patch makes use of it in the scheduler. Depending on Dir we can now schedule BottomUp or TopDown.
The default DWARF linker is parallel after #200971. Fix help message which still suggests classic DWARF linker.
HostInfoMacOSX's SharedCacheInfo used the dyld process-snapshot introspection SPIs only when <mach-o/dyld_introspection.h> was present, gating the calling code behind a compile-time macro. To avoid bifurcating the behavior based on the SDK, rather than the presence of the symbols, use dlsym to resolve them at runtime. While here, fold the duplicate dlsym of dyld_image_segment_data_ into the new, once-initialized, shared table. Assisted-by: Claude
…#189188) Add a new MachineFunctionPass (HexagonHVXSaveRemark) that emits optimization analysis remarks when HVX vector registers must be saved and restored around function calls. All HVX registers are caller-saved (Section 5.3 of the Hexagon ABI), so any HVX value live across a call requires a save/restore pair on the stack. Each HVX vector is 64 or 128 bytes, making this overhead expensive. The pass exits when remarks are not requested (-Rpass-analysis=hexagon-hvx-save) or when HVX is not enabled. A byte threshold (default 1024, tunable via -hexagon-hvx-save-threshold) filters out functions with only a small number of saves. The remarks help programmers identify call sites where inlining, hoisting, or sinking could reduce the save/restore cost.
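The cost arithmetic behind the threshold can be sketched as follows. This is only an illustration: the helper names are hypothetical, and both the "greater than or equal" comparison and counting only saved bytes are assumptions the commit message does not confirm.

```cpp
#include <cstddef>

// Bytes of stack traffic needed to save `liveVectors` HVX values that are
// live across a call, given the vector length in bytes (64 or 128).
inline std::size_t hvxSaveBytes(std::size_t liveVectors,
                                std::size_t vectorBytes) {
  return liveVectors * vectorBytes;
}

// Hypothetical filter: report only when the saved bytes reach the
// threshold. The default of 1024 comes from the commit message; the
// comparison direction is an assumption of this sketch.
inline bool shouldEmitRemark(std::size_t saveBytes,
                             std::size_t thresholdBytes = 1024) {
  return saveBytes >= thresholdBytes;
}
```

Under these assumptions, eight live 128-byte vectors reach the default threshold, while four do not.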
To hopefully prevent the last failure mode that led to the job being disabled where the GitHub API failed to return results for >24 hours. Reviewers: cmtice Pull Request: llvm/llvm-project#205438
The GitHub API has recovered and the previous failure mode has been rectified by ensuring that branches are ready for deletion for seven days rather than 24 hours. Reviewers: cmtice Reviewed By: cmtice Pull Request: llvm/llvm-project#205439
A REDUCTION clause naming a user-defined operator (e.g., reduction(.myop.:x)) crashed in lowering: ReductionProcessor assumed the DefinedOperator clause variant always held an intrinsic operator and called std::get<IntrinsicOperator> unconditionally, which aborts for the DefinedOpName alternative. Handle DefinedOpName in the reduction clause processor, adding the clause-side counterpart to the directive handling from #190288. For a locally declared user-defined operator reduction, resolve the operator to its reduction symbol and reference the omp.declare_reduction op materialized for the declare reduction directive. The op name is now module-scoped via AbstractConverter::mangleName, on the directive and clause sides in lockstep, so reductions with the same operator spelling in different modules no longer collide. Cases that are not yet supported (reductions imported by USE association, renamed or merged operators, and declarations with multiple types) now emit a clean "not yet implemented" diagnostic instead of crashing or silently binding the wrong combiner. Support for the USE-associated and cross-module cases is a follow-up that builds on the semantic fix in #200329. Tests cover the issue's integer case and a derived-type case (both lower, with a module-scoped op name), plus the USE-associated and multiple-type cases (clean TODO). Fixes #204299 Assisted-by: Claude Opus 4.8, GPT-5.5.
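The crash pattern described above can be reproduced in miniature with `std::variant`. The types below are hypothetical stand-ins, not the real flang classes; in standard C++ the unchecked `std::get` throws `std::bad_variant_access`, which shows up as an abort in a build without exception handling.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the two alternatives a defined-operator
// clause can hold; these are not the real flang types.
enum class IntrinsicOperator { Add, Multiply };
struct DefinedOpName {
  std::string name;
};
using DefinedOperator = std::variant<IntrinsicOperator, DefinedOpName>;

// Unsafe: assumes the intrinsic alternative is always active. std::get
// throws std::bad_variant_access when the variant holds DefinedOpName.
inline IntrinsicOperator getIntrinsicUnchecked(const DefinedOperator &op) {
  return std::get<IntrinsicOperator>(op);
}

// Safer: inspect the active alternative and handle both cases.
inline std::string describeOperator(const DefinedOperator &op) {
  if (const auto *intrinsic = std::get_if<IntrinsicOperator>(&op))
    return *intrinsic == IntrinsicOperator::Add ? "intrinsic:add"
                                                : "intrinsic:multiply";
  return "user-defined:" + std::get<DefinedOpName>(op).name;
}
```

Calling `describeOperator` with a `DefinedOpName{".myop."}` takes the second branch instead of failing, which mirrors the idea of handling the user-defined alternative explicitly.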
LLVM CMake has the `LLVM_LIBDIR_SUFFIX` option to optionally add a suffix to the install library directory, so setting `-DLLVM_LIBDIR_SUFFIX=64` would result in a `lib64` library directory. We didn't honor the variable correctly in `libdevice`, `xpti`, `xptifw`, the driver, `sycl-jit`, or E2E testing. This option will be set by Linux distros packaging the repo, so we need it to work. After this, further work is required to get Driver `lit` tests to pass with `LLVM_LIBDIR_SUFFIX` set, as the example device libraries we have are always in `lib`. Closes: intel#22355 Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
…iceCompilation.cpp (intel#22344) Signed-off-by: jinge90 <ge.jin@intel.com>
Refactor `DimOp::fold` in both memref and tensor dialects to use the existing `getConstantIndex()` helper instead of manually extracting the index via `IntegerAttr`.
This patch was a part of llvm/llvm-project#201170. I split the `icmp ptr` support from the original PR since I am worried it might not catch up for the LLVM 23 release (#201170 is blocked by #200672 for curating mixed provenance tests). I hope we can pick most of the low-hanging fruit exposed by fuzzers before the release. The released version should be able to run csmith-generated tests without obvious false positives or crashes. BTW, this patch doesn't respect the exact semantics of `icmp ptr` (i.e., truncating the address to the address width. The naming is a bit confusing...). Currently, we don't model external state in non-address bits of a pointer in llubi. So I think it is fine.
…l#22408) Fixes build breakage caused by 448b725 ("clang/Driver: Use struct type for BoundArch instead of StringRef"), which changed virtual signatures in `ToolChain.h` and related APIs.

Update SYCL/CUDA driver code to use the new `BoundArch` struct type instead of `StringRef`/`const char*`:
- SYCL.h/Cuda.h: update override signatures to match base class (`getDeviceLibs`, `getSupportedSanitizers`)
- Driver.cpp: migrate `DeviceTargetInfo::BoundArch` field from `const char*` to `BoundArch`; fix all downstream uses including `appendSYCLDeviceLink`, `addSYCLDeviceLibs`, `CollectForEachInputs`, and `BuildJobsForActionNoCache` (fix stale `BoundArch` type-as-value references to use `BA` parameter); fix `nullptr`/`StringRef()` in `DeviceDependences::add`, `HostDependence`, and unbundling action `registerDependentActionInfo` calls
- Clang.cpp: fix `getOffloadingArch()` → `.ArchName` for `StringRef` contexts; fix `doOnEachDependence` lambda param type; fix `getArgsForToolChain` calls with empty arch

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…040) To match with GCC: https://godbolt.org/z/KPKGhhenK Fixes: #203760 Assisted-by: Claude Sonnet 4.6
This makes it more consistent with the rest of the repository.
Fix missing opcodes in table of flag-setting instructions.
Reverts llvm/llvm-project#173135 and adds two new IR tests to demonstrate the impact of different atomic orderings on Dead Store Elimination (DSE). This reverts commit c8941df. Co-authored-by: Aiden Grossman <aidengrossman@google.com>
1) Return the evaluated APValue as a const pointer since it
may not be modified by callers.
2) Only return a non-nullptr from `getEvaluatedValue()` if
the APValue is not absent.
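The two points above can be illustrated with a small sketch. The class and member names are hypothetical, and `std::optional` stands in for an `APValue` that may be absent; this is not the clang implementation.

```cpp
#include <optional>

// Hypothetical value type; not the real clang APValue.
struct Value {
  int payload = 0;
};

class EvaluatedHolder {
public:
  void set(int payload) { stored = Value{payload}; }

  // Returns a pointer-to-const so callers cannot modify the value, and
  // nullptr when no value is present.
  const Value *getEvaluatedValue() const {
    return stored.has_value() ? &*stored : nullptr;
  }

private:
  std::optional<Value> stored; // Empty models the "absent" state.
};
```

With this shape, callers test the returned pointer once instead of separately asking whether a value exists.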
Local build on Linux platform reports a compiler warning:
llvm-project/libc/utils/MPFRWrapper/MPCommon.cpp:546:15: warning:
implicit conversion loses integer precision: 'long' to 'int'
[-Wshorten-64-to-32]
546 | int mod = mpfr_get_si(value_ret_exact.value, MPFR_RNDN);
| ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
Signed-off-by: jinge90 <ge.jin@intel.com>
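One common way to address a `-Wshorten-64-to-32` warning of this kind is an explicit cast. The sketch below uses a hypothetical stand-in for the `long`-returning call; the commit message does not say which fix the patch actually applied.

```cpp
// Hypothetical stand-in for a library call that returns long, such as the
// mpfr_get_si call shown in the warning.
inline long getSignedValue() { return 3L; }

// An explicit static_cast documents that the narrowing is intended and
// silences -Wshorten-64-to-32. It is only safe when the value is known to
// fit in int, which is an assumption of this sketch.
inline int getModulus() {
  long wide = getSignedValue();
  return static_cast<int>(wide);
}
```

An alternative is to widen the destination variable to `long`, which avoids the narrowing altogether.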
…22342) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This fixes an oversight in #164241.
…205477) `getAsCXXRecordDecl` will return nullptr for any dependent types. It's introduced by #192786, see llvm/llvm-project#192786 (comment) in original PR.
Use m_c_ICmp so the load can be on either side of the icmp.
Without asserts, we see failures like so:
/repo/llvm/llvm/lib/Target/Hexagon/HexagonAsmPrinter.cpp:982:43: error:
unused variable 'NextI' [-Werror,-Wunused-variable]
982 | MachineBasicBlock::const_instr_iterator NextI =
std::next(MI.getIterator());
| ^~~~~
1 error generated.
Mark NextI `maybe_unused` to address the issue.
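The same situation can be shown in a standalone sketch: a local that is only read inside `assert` becomes unused once `NDEBUG` is defined. The function below is hypothetical and unrelated to the Hexagon code.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Returns the first element. The local `next` is only read inside assert,
// so with NDEBUG defined it would trigger -Wunused-variable unless it is
// marked [[maybe_unused]].
inline int firstOfPair(const std::vector<int> &values) {
  auto it = values.begin();
  [[maybe_unused]] auto next = std::next(it);
  assert(next != values.end() && "expected at least two elements");
  return *it;
}
```

The attribute keeps the assertion in debug builds while letting release builds compile cleanly under `-Werror`.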
Fixes a regression introduced by f8aa5f6.
Implement `memset`, `memcpy`, `memmove` intrinsics and their corresponding inline version. Note that the `isvolatile` argument is ignored and left for future PRs.
This PR fixes two related DWARF constant-handling bugs that were blocking each other.

First, LLDB's DWARF expression evaluator in [`DWARFExpression.cpp`](https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp) handled `DW_OP_constu` and `DW_OP_consts` without going through `to_generic`. Under DWARF, these operators push a generic value: an address-sized integral value with unspecified signedness. That means the result should be truncated to the target address size (via `to_generic`).

Second, LLVM already had a producer-side issue tracked as [#47431](llvm/llvm-project#47431): on 32-bit targets, LLVM could emit `DW_OP_consts` / `DW_OP_constu` for source integer constants wider than the target generic type. If LLDB were fixed alone, those producer-emitted constants would become truncated as DWARF requires, exposing incorrect debug info for wide source values.

This patch fixes both sides together.

## What Changed

On the LLDB consumer side:
- `DW_OP_constu` now uses `to_generic`.
- `DW_OP_consts` now uses `to_generic`.
- The corresponding LLDB DWARF expression tests were updated to expect address-sized generic values.

On the LLVM producer side:
- Wide integer debug-location constants that cannot be represented by the target generic type are emitted as `DW_OP_implicit_value` instead of `DW_OP_const*`.
- This preserves the source value bytes instead of relying on an address-sized DWARF generic constant.
- The producer-side change is limited to complete constant values, where there are no remaining `DIExpression` operations.

## Validation

Locally verified with:

```text
build/tools/lldb/unittests/Expression/ExpressionTests --gtest_filter='DWARFExpression.*'
74 tests passed

build/bin/llvm-lit -sv llvm/test/DebugInfo/X86/constant-loclist.ll
1 test passed

ninja -C build check-lldb -j12
No unexpected failures

ninja -C build check-all -j12
Completed with one unrelated local failure in Clang Tools :: clang-doc/DR-141990.cpp, caused by host warning-option output.
No DebugInfo, DWARF, LLDB expression, or AsmPrinter-related failures were observed.
```
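To see why a wide source constant cannot survive as a `DW_OP_const*` value on a 32-bit target, the truncation to an address-sized generic value can be sketched as below. The function name and signature are hypothetical and do not reflect LLDB's actual `to_generic` helper.

```cpp
#include <cstdint>

// Truncates a 64-bit value to its low `addressBytes` bytes, which is how
// this sketch models an address-sized generic value. Hypothetical helper,
// not LLDB's to_generic.
inline std::uint64_t truncateToAddressSize(std::uint64_t value,
                                           unsigned addressBytes) {
  if (addressBytes >= 8)
    return value; // Nothing to drop for 64-bit addresses.
  const std::uint64_t mask = (std::uint64_t{1} << (addressBytes * 8)) - 1;
  return value & mask;
}
```

With a 4-byte address size, `0x1122334455667788` becomes `0x55667788`, so the upper half of a 64-bit source constant is lost. That loss is what emitting `DW_OP_implicit_value` on the producer side avoids.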
…hs (#205492) `performFusion()` and `fuseGuardedLoops()` carried two character-for-character identical tails: header-PHI migration plus latch rewiring, and the SCEV-forget / block-merge / latch-merge finalization. Extract them into `rewireFusedHeaderPHIsAndLatches()` and `finalizeFusedLoop()` and call both from each path.
It's a lightweight pass. Should always be the last SSA pass since peephole can end up making some instructions dead.
…on (#202121) This patch mainly fixes a bug with parsing of unknown doxygen commands in function parameter documentation.

To extract the parameter documentation from the function documentation, the whole function documentation is parsed first. Then the documentation paragraph for the requested parameter is "converted" to a string and stored as the documentation for the parameter. The string is converted by visiting and dumping all chunks of the parsed paragraph.

When unknown doxygen commands are parsed (during the function documentation parsing step), they are registered in a `clang::comments::CommandTraits` object. Visiting an unknown command requires querying the registered commands through the `clang::comments::CommandTraits` object to get the command name. The bug was that the function documentation parsing step and the visiting step used two different `clang::comments::CommandTraits` objects. Hence the visiting step failed (array access out of bounds) when trying to retrieve the command names for unknown commands.

The patch moves the function documentation parsing step to the construction of the `SymbolDocCommentVisitor`, which is also responsible for converting the parameter documentation paragraph to a string. This way the same `clang::comments::CommandTraits` is used and the query for unknown command names is correct.

Additional fixes:
- correct some whitespace behaviour for doxygen inline commands
- add a new token kind for the clang comment parser to distinguish unknown "backslash" and "at" commands to correctly show them in the clangd hover info

Related issue: clangd/clangd#2671
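The mismatch between the two registries can be modelled with a small sketch. `CommandRegistry` is a hypothetical stand-in for `clang::comments::CommandTraits`; unlike the code described above, it uses a bounds-checked lookup so the failure is observable rather than undefined.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of dynamically registered command names.
class CommandRegistry {
public:
  // Registers an unknown command and returns the ID assigned to it.
  std::size_t registerUnknown(std::string name) {
    names.push_back(std::move(name));
    return names.size() - 1;
  }

  // Bounds-checked lookup: returns nullopt for IDs this registry never
  // issued, where an unchecked array access would read out of bounds.
  std::optional<std::string> nameOf(std::size_t id) const {
    if (id >= names.size())
      return std::nullopt;
    return names[id];
  }

private:
  std::vector<std::string> names;
};
```

An ID issued by the registry used for parsing resolves only in that same registry. A second, empty registry used for visiting has no entry for it, which is why sharing one object between both steps resolves the lookup.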
This adds a `noipa` function attribute to LLVM IR. This new attribute disables any interprocedural analysis that inspects the definition of the function. Setting this attribute is equivalent to moving the function definition to a separate, optimizer-opaque, module. The `noipa` attribute does *not* control inlining or outlining. Add the `noinline` and `nooutline` attributes as well in cases where inlining and outlining should additionally be disabled. Revival of https://reviews.llvm.org/D101011 Discussed in https://discourse.llvm.org/t/noipa-continues/74411 LLVM portion of llvm/llvm-project#40819
This PR fixes warnings about unused variables.
…reeze without SPV_KHR_poison_freeze' (intel#22428) Original commit KhronosGroup/SPIRV-LLVM-Translator@a2a2774 This fixes a bug exposed by Pytorch, and I'm trying to add Pytorch testing to our CI here. Closes: intel#22308 Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
We use 20.1.8 while the clang-format obtained from building intel/llvm is 23.0.0. As a result, running clang-format locally produces formatting different from what the CI workflow proposes.
PR intel#22357 pointed out that sycl-pch-include.cpp was misspelled and that sycl-pch-use.cpp had redundant comments.
CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
…up_size AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets. Fixes CodeGenSYCL/reqd-work-group-size.cpp CMPLRLLVM-76303 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AMDGPU verifier (added in 6794e31) requires amdgpu-flat-work-group-size function attribute whenever reqd_work_group_size metadata is present. setFunctionDeclAttributes only handled OpenCL's ReqdWorkGroupSizeAttr; SYCL uses SYCLReqdWorkGroupSizeAttr which went through CodeGenFunction.cpp to emit the metadata but never set the function attribute, triggering the verifier error on amdgcn targets.
Fixes CodeGenSYCL/reqd-work-group-size.cpp
CMPLRLLVM-76303
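For illustration, the value of an `amdgpu-flat-work-group-size` attribute has the form `min,max`. The sketch below derives such a string from a required three-dimensional work-group size. The helper name is hypothetical, and using the product of the dimensions for both bounds is an assumption of this sketch rather than something the description states.

```cpp
#include <cstdint>
#include <string>

// Builds a "min,max" string for a flat work-group size from a required
// 3-D work-group size. Assumption of this sketch: min and max are both
// the product of the three dimensions.
inline std::string flatWorkGroupSizeValue(std::uint32_t x, std::uint32_t y,
                                          std::uint32_t z) {
  const std::uint64_t flat = std::uint64_t{x} * y * z;
  const std::string size = std::to_string(flat);
  return size + "," + size;
}
```

Under that assumption, a kernel with a required size of 8 x 8 x 4 would carry the value `256,256`.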