LLVM and SPIRV-LLVM-Translator pulldown (WW25 2026) #22381
Merged
…nfo (#200896) The architecture can be relevant when determining if an SDK supports a particular triple. Record the full triples in SDKPlatformInfo instead of all of the non-architecture triple components. Assisted-by: Claude Code rdar://172876443
Reviewers: Pull Request: llvm/llvm-project#201192
The NCD pre-pass from #199962 only records extract blocks for external uses that have a real user, so uses with a nullptr user were left unscaled, making the loop in the report look profitable and get wrongly vectorized. For a nullptr user vectorizeTree() places the extract right after the vectorized instruction (entry block) and RAUWs the scalar, so scale those extracts by the entry block frequency, restoring the pre-#199962 behavior for that case. Reviewers: Pull Request: llvm/llvm-project#201193
After #198964 I see a -Wcovered-switch-default warning in sysinfo.cc, but it looks like it's uncovering something a bit worse. Back in #147357 and #149159 it looks like we tried to fix an MSVC warning for an uncovered `CacheUnknown` case in this switch, and removed the initialization of the variable before the switch. The update to Google Benchmark v1.9.5 has a different fix for this - it's handling that specific case guarded by some MSVC version macros, but this depends on the initialization we removed! Add the initialization back and remove the default case, effectively reverting #149159 and #147357.
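The interaction described above can be sketched in isolation. This is only an illustration of the pattern (initialize before the switch, cover every enumerator, no `default`); the enum and function names are hypothetical and are not the Google Benchmark code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the cache-kind enum handled in sysinfo.cc.
enum class CacheKind { Unified, Instruction, Data, Trace, Unknown };

std::string cacheKindName(CacheKind kind) {
  // Initialize before the switch: a case that only breaks (or that is
  // compiled out behind a compiler-version macro) still yields a defined value.
  std::string name = "Unknown";
  switch (kind) {
  case CacheKind::Unified:
    name = "Unified";
    break;
  case CacheKind::Instruction:
    name = "Instruction";
    break;
  case CacheKind::Data:
    name = "Data";
    break;
  case CacheKind::Trace:
    name = "Trace";
    break;
  case CacheKind::Unknown:
    break; // relies on the initialization above
  }
  // No default label: every enumerator is listed, so a covered-switch-default
  // style warning has nothing to flag.
  return name;
}
```

Dropping the initialization in this sketch would leave the `Unknown` path returning an empty string, which mirrors the dependency the commit message points out.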
Renames SphinxQuickstartTemplate.rst to SphinxQuickstartTemplate.md as an isolated commit to preserve blame history. The file content is not yet valid Markdown; the rewrite follows in a stacked PR. Cross-references in MarkdownQuickstartTemplate.md, README.txt, and openmp/docs/README.txt are updated accordingly. See RFC: https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/90840
Converts SphinxQuickstartTemplate.md from reStructuredText syntax to MyST Markdown. Updates the overview to mention that MyST is now preferred for new LLVM documentation, modernizes the authoring guidelines, and replaces all RST markup constructs with their Markdown equivalents. The rename landed as PR #198899. See RFC: https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/90840 An LLM was used to assist with the rewrite. --------- Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
… (#198043) This PR fixes issue llvm/llvm-project#190308
Vector types that aren't widened are split so that a single ATOMIC_STORE is issued for the entire vector at once. This enables SelectionDAG to translate vectors with element type bfloat or half. Store-side counterpart to #165818. Stacked on top of #197619 and below #197861.
…12_psad_bw_512 intrinsic. (#201167) Need to check that 512-bit vectors are enabled before using a 512-bit intrinsic.
…… (#201190)
…ic module
createIntrinsicModuleDefinitions() only emitted definitions for array
named constants belonging to the __fortran_ieee_exceptions intrinsic
module. Array constants declared directly in the iso_fortran_env
intrinsic module -- in practice character_kinds -- were therefore only
lowered as bodyless `fir.global` external declarations at their use site
and never defined anywhere, producing an undefined reference at link
time.
This is usually hidden because scalar iso_fortran_env parameters fold to
immediates and constant-shape array accesses are folded away, so the
dangling external symbol is DCE'd before linking. It surfaces when the
address of the array genuinely escapes to runtime, e.g.:
```
use iso_fortran_env
integer :: i, x(1)
do i = 1, size(character_kinds)
x = findloc(character_kinds, character_kinds(i))
end do
```
which fails with:
undefined reference to `_QMiso_fortran_envECcharacter_kinds'
Fix by also processing the iso_fortran_env scope in
createIntrinsicModuleDefinitions(), so its array constants are emitted
as linkonce_odr definitions with initializers.
Note that integer_kinds/real_kinds/logical_kinds are unaffected: they
are renamed from iso_fortran_env_impl, a non-intrinsic module that is
compiled into the runtime, so their definitions already exist there.
Assisted-by: AI
… (#201206) The second `ParseTypeFromDWARF` for t1 (after `p v2`) only fires when t1's definition lives in a separate CU from its forward declaration: LLDB parses the forward-decl DIE during `p v1` and a distinct definition DIE during `p v2`. dsymutil's parallel linker collapses both into a single DIE in the artificial type unit, so t1 is parsed once during `p v1` and only re-resolved during `p v2`. Drop the second-parse CHECK so the test no longer presumes a per-CU type layout. The remaining `'t1' resolving forward declaration...` CHECK after `p v2` still verifies what the test was designed to catch: t1's complete-type resolution is deferred until v2 is evaluated. If LLDB regressed to eager resolution during `p v1`, that log line would move and the test would fail. Add a `(t1) (x = 0)` CHECK at the end to cover the end-to-end value.
…#201194)
There is a relation between CFI and ThinLTO GUIDs that still needs to be
disentangled first. Note that we leave the `MD_unique_id` in
`FixedMetadataKinds.def` to avoid needing to re-number it later. Plus
the metadata string ("guid") itself is used by ctxprof.
On Windows, the llvm-shlib dylib build uses the llvm-nm host tool to make all symbols visible by default. The LLVM_TOOL_LLVM_DRIVER_BUILD=ON build would fail because $<TARGET_FILE:llvm-nm> was invalid. This change passes the name of the symlink / executable copy as a custom property so things work out and the llvm-nm.exe host tool can be found.
…#201183) Adds a setElideLocations option to BytecodeWriterConfig to elide locations during bytecode serialization. When enabled, all LocationAttrs are mapped to UnknownLoc during numbering and writing to produce location-invariant bytecode (e.g., for stable fingerprinting). Another way to achieve the same thing would be to apply the strip-debuginfo pass, but that requires mutating the module, which in turn requires cloning the module if one still requires the unstripped original. Assisted-by: Antigravity / Gemini
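The idea of eliding locations at write time, rather than stripping them from the module, can be illustrated with a toy serializer. This sketch does not use the MLIR API: `Op`, `serialize`, and the output format are hypothetical, and only the behavior (locations written as unknown, input left untouched) follows the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature "operation": a name plus a source location string.
struct Op {
  std::string name;
  std::string loc;
};

// Serializes the ops into a flat string. When elideLocations is set, every
// location is written as "unknown" instead of its real value; the input
// vector is taken by const reference and is never modified.
std::string serialize(const std::vector<Op> &ops, bool elideLocations) {
  std::string out;
  for (const Op &op : ops) {
    out += op.name;
    out += '@';
    out += elideLocations ? std::string("unknown") : op.loc;
    out += ';';
  }
  return out;
}
```

Two inputs that differ only in their locations serialize to the same bytes when elision is on, which is the property a stable fingerprint needs, and the caller still holds the original with its locations intact.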
…r GPRSize Map GPRPair register classes to the GPRB register bank during GlobalISel instruction selection. This is required because the introduction of HwMode-dependent base pointer register classes (e.g. via PtrRegClassByHwMode) causes TableGen to emit register bank checks for GPRPair variants in RISCVGenGlobalISel.inc. Without this mapping, instruction selection crashes on unsupported classes. To avoid assertion failures when GPRB's maximum size increases to 128-bit on RV64 due to the register pairs, update RISCVRegisterBankInfo::getInstrMapping to query Subtarget.getXLen() for the scalar register width instead of relying on the bank's getMaximumSize(). This matches AArch64's design pattern of mapping register pairs (XSeqPairsClass) to GPR and resolving scalar register sizes dynamically. This was fine previously but was exposed by the HwMode changes in llvm/llvm-project#177073. Pull Request: llvm/llvm-project#200510
Add lldb-mcp to LLVM_DISTRIBUTION_COMPONENTS
…201211) Promote the lambda inside resolveWrapper to a public method on SimpleExecutorDylibManager. This brings SimpleExecutorDylibManager into better alignment with the NativeDylibManager implementation in the new ORC runtime, and is a step towards allowing NativeDylibManager to be used as a drop-in replacement for SimpleExecutorDylibManager.
While debugging flaky behavior with TestRunLocker, I noticed that it is intended to run its test once with a stop at the entry function (followed by a continue) and once where we launch to the main() loop. But we were never exercising the stop-at-entry codepath. This doesn't fix the flaky behavior, although that only happens with the launch-directly-into-main() codepath; I don't get failures when I stop at the entry point and then continue.
…1221) If a VPExpandSCEVRecipe does not have users, there's no benefit to expand it to VPInstructions, which then have to get cleaned up. This also prevents DCE from removing VPInstructions pointed to by TripCount after expansion.
This codepath is only executed as an attempt to clean up during a failed launch, so the reversed arguments were rarely actually used. rdar://175507620
…ng copies (#201177)
NVPTXLowerAggrCopies lowers load/store pairs of large values into a loop
of smaller copies.
However, it was incorrectly assuming that the load/store pairs it found
never alias.
This patch adds an alias check. If the pointers may alias, we emit a
memmove, which handles overlap correctly.
CUDA reproducer:
typedef char vec __attribute__((vector_size(256)));
__global__ void boom(char *p) {
*(vec *)(p + 8) = *(vec *)p;
}
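A host-side sketch of why the overlap matters, assuming a plain forward copy loop as a stand-in for the lowered copy. The function names are hypothetical, and the address-range test below is a runtime substitute for the alias query the pass performs on IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Forward element-by-element copy. Only correct when the destination does
// not overlap the part of the source that has not been read yet.
void copyForward(char *dst, const char *src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Address-range overlap test (runtime stand-in for a may-alias query).
bool rangesOverlap(const char *a, const char *b, std::size_t n) {
  const auto a0 = reinterpret_cast<std::uintptr_t>(a);
  const auto b0 = reinterpret_cast<std::uintptr_t>(b);
  return a0 < b0 + n && b0 < a0 + n;
}

// Use memmove whenever the two ranges may overlap, the loop otherwise.
void copyChecked(char *dst, const char *src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
  if (rangesOverlap(dst, src, n))
    std::memmove(dst, src, n);
  else
    copyForward(dst, src, n);
}
```

With a destination 8 bytes above the source, as in the reproducer, the forward loop re-reads bytes it has already overwritten, while the checked version preserves the original source contents.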
…ns (#198422) Classic CodeGen's `SetFunctionAttributes` calls `setLinkageForGV` to force `ExternalWeakLinkage` on `__attribute__((weak))` and Darwin `weak_import` declarations. CIR had no equivalent: weak function declarations were emitted with `ExternalLinkage` instead of `ExternalWeakLinkage`. This adds `setLinkageForFunction` — the same weak/external-weak logic as `setLinkageForGV` — and calls it from `setFunctionAttributes`. The underlying crash on inline forward declarations (the original motivation) is already fixed by #195257; what remains is this linkage gap. `inline-forward-decl.c` covers `__attribute__((weak))` on an inline forward declaration; `func-linkage-weak-import.c` covers Darwin `weak_import` (→ `extern_weak` in CIR and LLVM).
…1222) Remove the `DryRun` branch in `bundleSYCL` that emitted a stub `OffloadBinary`. SYCL goes through the same empty-buffer path as other offload kinds, so the special case is no longer needed. Update `linker-wrapper-image.c` to expect the resulting `[0 x i8] zeroinitializer` constant and a size of `0` in the register/unregister calls. Assisted by Claude.
…00524) Generalizes the Windows-only Python lookup in PythonPathSetup into a cross-platform abstraction. Adds an abstract ScriptInterpreterRuntimeLoader with a per-language factory. The Python implementation dynamically loads the Python library into the current process. The loader no-ops when Python is already in the process, then walks the LLDB_PYTHON_LIBRARY env override, the build-time Python (LLDB_PYTHON_RUNTIME_LIBRARY_BUILD_PATH) and finally a platform candidate list: - Darwin: DEVELOPER_DIR, the bundled Xcode.app, and Command Line Tools joined against Python3.framework. Then python.org, /opt/homebrew, and /usr/local joined against Python.framework. Then xcrun -f python3 and if that fails, libpython3.dylib as a last resort. - Linux: libpython3.so plus descending stable-ABI SONAMEs. - Windows: the LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME bare name (resolved via the loader's default search list) and the exe-relative LLDB_PYTHON_DLL_RELATIVE_PATH fallback (built off GetModuleFileNameW). Pre-mapping python3xx.dll lets the script interpreter plugin's delay-load thunks resolve against the already-loaded module by base name on first use. This commit only introduces the abstraction. No existing call site is changed, and the script interpreter plugin still hard-links Python; those changes are part of two follow-up PRs.
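The lookup order described above can be sketched as a small, testable function. Everything here is hypothetical (`loadRuntime`, `LoadResult`, the `tryLoad` callback) and is not the ScriptInterpreterRuntimeLoader interface; it also assumes that a failed override simply falls through to the next candidate, which the commit message does not state.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class LoadStatus { AlreadyLoaded, Loaded, NotFound };

struct LoadResult {
  LoadStatus status;
  std::string path; // empty unless status == Loaded
};

// Walks the candidates in priority order: env override, build-time path,
// then the platform list. tryLoad stands in for the real dlopen/LoadLibrary.
LoadResult loadRuntime(bool alreadyInProcess, const std::string &envOverride,
                       const std::string &buildTimePath,
                       const std::vector<std::string> &platformCandidates,
                       const std::function<bool(const std::string &)> &tryLoad) {
  assert(tryLoad && "a load callback is required");
  // No-op when the runtime is already mapped into the process.
  if (alreadyInProcess)
    return {LoadStatus::AlreadyLoaded, ""};

  std::vector<std::string> ordered;
  if (!envOverride.empty())
    ordered.push_back(envOverride);
  if (!buildTimePath.empty())
    ordered.push_back(buildTimePath);
  ordered.insert(ordered.end(), platformCandidates.begin(),
                 platformCandidates.end());

  for (const std::string &candidate : ordered)
    if (tryLoad(candidate))
      return {LoadStatus::Loaded, candidate};
  return {LoadStatus::NotFound, ""};
}
```

Passing the load step as a callback keeps the ordering logic independent of the platform loader, so it can be exercised without a Python installation.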
…00530)
Drops ${Python3_LIBRARIES} from the SHARED build of
lldbPluginScriptInterpreterPython and lets undefined Python symbols
through at link time (`-undefined dynamic_lookup` on Darwin,
`--allow-shlib-undefined` on Linux; Windows keeps its existing
delay-load + import lib).
SystemInitializerFull::Initialize resolves the Python runtime loader
via ScriptInterpreterRuntimeLoader::Get(eScriptLanguagePython) and
calls Load() before initializing any plugin, so libpython is mapped
into the process before either entry point that references it: the
static script interpreter's Initialize() (which invokes Python via
the LLDB_PLUGIN_INITIALIZE loop) and the dynamic plugin's dlopen
(whose undefined references resolve against the in-process
libpython). This covers both LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS
=ON and =OFF, and keeps Windows working in static builds where the
delay-load thunks live in liblldb itself. The loader is
once_flag-cached, and errors propagate out via the existing Expected
return.
`import lldb` from a running Python works because libpython is
already mapped into the process. The Python runtime loader probes
for stable-ABI symbols (Py_IsInitialized, Py_InitializeFromConfig)
via dlsym(RTLD_DEFAULT) / GetProcAddress and, finding them, returns
success without dlopen'ing a second libpython on top.
The dynamic plugins are added as test-suite dependencies so
check-lldb-shell and check-lldb-api wait for them; they can't be
build-order deps of liblldb itself because the plugins link against
liblldb (CMake SHARED cycle).
The unit-test static variant lldbStaticScriptInterpreterPython keeps
hard-linking libpython so link-time symbol checking is preserved.
…#201152) This came up in a test suite as an NYI; it is just emitting a constant-backing literal for an initializer. These are specific to C, as global compound literals have static storage duration in C. This patch, just like classic codegen, creates a '.compoundliteral' object as backing for these variables and lets us create references to them. --------- Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenAction.cpp
This PR adds in-place shrinking for the freelist heap. This allows the heap to reuse the freed tail when a reallocation shrinks a block by more than a minimal block unit. Synthesized random-action tests show that this increases the heap utilization rate from 87% to 97%, which basically aligns with what is expected of dlmalloc. Assisted-by: AI tools, manually checked.
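The shrink decision can be sketched as pure size arithmetic. This is a minimal illustration, not the heap's implementation: `kMinBlock`, `ShrinkResult`, and `shrinkInPlace` are hypothetical, alignment and block headers are ignored, and the threshold is assumed to be "at least one minimal block".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical smallest block the free list can track.
constexpr std::size_t kMinBlock = 16;

struct ShrinkResult {
  std::size_t kept;     // bytes that stay with the allocation
  std::size_t released; // bytes handed back to the free list
};

// Decides whether a shrinking realloc can split off a reusable tail.
ShrinkResult shrinkInPlace(std::size_t oldSize, std::size_t newSize) {
  assert(newSize <= oldSize && "only shrinking is handled here");
  const std::size_t tail = oldSize - newSize;
  // A tail smaller than the minimal block cannot become a free block,
  // so the allocation keeps its full size.
  if (tail < kMinBlock)
    return {oldSize, 0};
  return {newSize, tail};
}
```

For example, shrinking a 64-byte block to 60 bytes keeps the block whole, while shrinking it to 32 bytes releases a 32-byte tail.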
#189222 folds NDD+Load to non-NDD when the NDD memory variant is not preferred. However, this changes DstReg from a regular def to an early-clobber def, which causes "corrupted sub-interval" in reMaterializeFor because OrigLI is not updated at the same time. Fixes: https://godbolt.org/z/7n8ozz1EG Assisted-by: Claude Sonnet 4.6
…age (#200513)
- Rework `--dry-run` in `clang-sycl-linker` so it skips all real output
(writing bitcode, executing tools, etc.).
- The `link:`, `sycl-module-split:`, and a new `sycl-bundle:` summary
line are now gated on `-v` alone.
- Tighten `sycl-bundle:` checks in `basic.ll`, `split-mode.ll`, and
`triple.ll` to pin kind, triple, and arch (instead of just kind),
and add `-NOT: {{.+}}` after fully-covered dry-run check groups.
- Replace the `clang-sycl-linker` + `llvm-objdump --offloading`
round-trip with a single `--dry-run -v` invocation.
- Add a dedicated `non-dry-run` mode test to verify code paths not exposed
in `dry-run`.
Assisted by Claude.
CONFLICT (content): Merge conflict in llvm/lib/CodeGen/TargetPassConfig.cpp
CONFLICT (content): Merge conflict in clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExpr.cpp
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExprAgg.cpp
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenFunction.h
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/SYCL.cpp
CONFLICT (content): Merge conflict in clang/test/SemaSYCL/sycl-cconv.cpp
Travis CI was disabled on the main branch in 06bddc1c8. CI now runs entirely via GitHub Actions. Original commit: KhronosGroup/SPIRV-LLVM-Translator@19833602d4e5ae2
This change adds a new configuration knob that allows the user of the SPIR-V reader to provide custom address space numbering to the translator. The user can provide an array with values for each logical SPIR-V address space (private, local, global, generic, constant, etc.) that will be used by the translator when generating LLVM IR. The behavior of the reader remains unchanged if the user does not explicitly change the setting. This change was motivated by a need to support a compilation target that uses a different convention for address space numbering than SPIR-V. Generally, the GPU targets in LLVM (AMDGPU, NVPTX) seem to use 0 (LLVM's default address space) as a rough equivalent of the generic address space. Without this option an additional address space renumbering pass is needed. While such a pass is possible, it is somewhat fragile (it needs to visit all places in the module where a ptr type can be used, and these change as LLVM changes) and consumes compilation time (this can't be done in a single pass due to potential type/value dependencies). Configuring this in the SPIR-V translator seems to be a simpler and cleaner solution. The change replaces all hard-coded address space values in the SPIR-V reader with a call that resolves the logical SPIR-V address space to the value provided by the user. If the mapping was not configured, the original values are used. Additionally, an analogous change was made to builtin name mangling. Original commit: KhronosGroup/SPIRV-LLVM-Translator@06bd1600a7dc007
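The "resolve through a table, fall back to the original values" behavior can be sketched as follows. The class and enum are hypothetical and do not reflect the translator's actual option names; the default numbering is illustrative, and the custom table in the usage note follows AMDGPU's convention (generic 0, global 1, local 3, constant 4, private 5) purely as an example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical logical address spaces, used as indices into the tables.
enum class LogicalAS { Private, Global, Constant, Local, Generic };
constexpr std::size_t kNumLogicalAS = 5;
using ASTable = std::array<unsigned, kNumLogicalAS>;

// Illustrative default numbering, indexed by LogicalAS.
const ASTable kDefaultTable = {{0, 1, 2, 3, 4}};

class AddressSpaceMap {
public:
  // Installs a user-provided numbering, one entry per logical address space.
  void setCustom(const ASTable &table) {
    custom_ = table;
    hasCustom_ = true;
  }

  // Single resolution point replacing hard-coded address space constants.
  unsigned resolve(LogicalAS as) const {
    const std::size_t idx = static_cast<std::size_t>(as);
    assert(idx < kNumLogicalAS);
    return hasCustom_ ? custom_[idx] : kDefaultTable[idx];
  }

private:
  ASTable custom_{};
  bool hasCustom_ = false;
};
```

A caller that never invokes `setCustom` sees the default numbering, matching the unchanged-by-default behavior described above; installing the table `{5, 1, 4, 3, 0}` makes `resolve(LogicalAS::Generic)` yield 0.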
Fixes: CMPLRLLVM-76051
Fix SemaCUDA/implicit-hd-explicit-inst-organic-caller.cu and SemaCUDA/implicit-hd-overload-ambig-organic-caller.cu, which broke after commit 3a203a5 ([CUDA][HIP] Defer device diagnostics in implicit H+D explicit instantiations, #197214). Upstream #197214 implements the feature on a collect-then-emit deferred diagnostics model: the traversal collects per-function diagnostics, ClassifyImplicitHDExplicitInst() drops the errors of implicit-H+D explicit-instantiation functions that have no organic device caller, and a final DDE.emitCollectedDiags() emits whatever remains exactly once. xmain long ago replaced that model with an inline-emit one (the upstream "Speed up deferred diagnostic emitter" change): DeferredDiagnosticsEmitter::checkFunc emits each function's deferred diagnostics during traversal and there is no emitCollectedDiags. The pulldown conflict resolution for kept ClassifyImplicitHDExplicitInst() but dropped the (nonexistent) emitCollectedDiags call, so the new classification ran too late: an implicit-H+D explicit-instantiation member was emitted inline both as a bare emitted root and via the organic caller path, producing a duplicate error with a truncated call stack. Adapt the feature to xmain's inline model: - Skip the inline emission in checkFunc for implicit-H+D explicit-instantiation functions so their diagnostics are not surfaced during traversal. - In ClassifyImplicitHDExplicitInst, when an organic device caller exists, surface the deferred diagnostics here with the usual call-stack notes; otherwise keep dropping them and emitting a trap body as before. Intel-specific adaptation guarded with INTEL_CUSTOMIZATION. Fixes: CMPLRLLVM-75875 This should be reverted once https://jira.devtools.intel.com/browse/CMPLRLLVM-74216 is done to reland [cc4ff7f](intel-restricted/applications.compilers.llvm-project@cc4ff7f). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The reapply of 9606c9f updated some code to keep the addrspacecast: llvm/llvm-project@fb743f1. Reverted some of our previous test changes accordingly too. We should clean up the UseAllocaASForSrets code in a follow-up JIRA https://jira.devtools.intel.com/browse/CMPLRLLVM-75358 ASAP. Thanks.
…#37621) After 459650d removed the `XFAIL: *` from clang/test/Driver/hipspv-toolchain.hip, the test started failing because the HIP linker-wrapper job emits `--no-lto` twice for the TY_HIP_FATBIN case. Upstream commit a2369b9 ("[Clang] Fix leftover use of old LTO path") moved the non-RDC `--no-lto` emission: it deleted the inline block that followed `--emit-fatbin-only` and added an equivalent block earlier, before `--host-triple`. During pulldown the early block was merged in but the inline block was left behind, so both fired and a stray second `--no-lto` was inserted between `--emit-fatbin-only` and `-o`. Remove the leftover inline block to match the upstream state. The earlier block (guarded by TY_HIP_FATBIN && OFK_HIP) still emits `--no-lto` exactly once, which is what the test now expects. Jira: CMPLRLLVM-73247 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics are enabled in intel/llvm clang/lib/Basic/Targets/NVPTX.h, but not upstream. Enable them to fix the test-e2e/WorkGroupMemory/reduction_lambda.cpp build. This change will be upstreamed.
This is ready to merge, other than cherry-picks.
@intel/llvm-gatekeepers Can you help issue a /merge? Thanks.
/merge
bb-sycl approved these changes on Jun 22, 2026
Mon 22 Jun 2026 03:14:47 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.
Mon 22 Jun 2026 03:26:14 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.
LLVM: llvm/llvm-project@375fa6f
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@bd15d75