
LLVM and SPIRV-LLVM-Translator pulldown (WW25 2026)#22381

Merged
bb-sycl merged 1137 commits into sycl from llvmspirv_pulldown
Jun 22, 2026

@iclsrc

@iclsrc iclsrc commented Jun 21, 2026

Collaborator

ian-twilightcoder and others added 30 commits June 2, 2026 13:20
…nfo (#200896)

The architecture can be relevant when determining if an SDK supports a
particular triple. Record the full triples in SDKPlatformInfo instead of
all of the non-architecture triple components.

Assisted-by: Claude Code

rdar://172876443
The NCD pre-pass from #199962 only records extract blocks for external
uses that have a real user, so uses with a nullptr user were left
unscaled, making the loop in the report look profitable and get wrongly
vectorized. For a nullptr user vectorizeTree() places the extract right
after the vectorized instruction (entry block) and RAUWs the scalar, so
scale those extracts by the entry block frequency, restoring the
pre-#199962 behavior for that case.

Reviewers: 

Pull Request: llvm/llvm-project#201193
After #198964 I see a -Wcovered-switch-default warning in sysinfo.cc,
but it looks like it's uncovering something a bit worse.

Back in #147357 and #149159 it looks like we tried to fix an MSVC
warning for an uncovered `CacheUnknown` case in this switch, and removed
the initialization of the variable before the switch. The update to
Google Benchmark v1.9.5 has a different fix for this - it's handling
that specific case guarded by some MSVC version macros, but this depends
on the initialization we removed!

Add the initialization back and remove the default case, effectively
reverting #149159 and #147357.
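
To illustrate the pattern this commit restores, here is a minimal sketch. The enum and function are hypothetical stand-ins, not the Google Benchmark code. The variable is initialized before the switch, every enumerator has its own case, and there is no `default`, so a covered-switch warning has nothing to flag while unhandled values still produce a defined result.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a cache-type enum; not the real benchmark code.
enum class CacheKind { Unified, Instruction, Data, Trace, Unknown };

// Returns a label for the cache kind. `label` is initialized before the
// switch, so a value without a dedicated assignment (here: Unknown, or an
// out-of-range value) still yields a defined result without a `default:`.
std::string cacheLabel(CacheKind kind) {
  std::string label = "Unknown";
  switch (kind) {
  case CacheKind::Unified:     label = "Unified"; break;
  case CacheKind::Instruction: label = "Instruction"; break;
  case CacheKind::Data:        label = "Data"; break;
  case CacheKind::Trace:       label = "Trace"; break;
  case CacheKind::Unknown:     break; // keeps the initializer
  }
  return label;
}
```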
Renames SphinxQuickstartTemplate.rst to SphinxQuickstartTemplate.md as
an isolated commit to preserve blame history. The file content is not
yet valid Markdown; the rewrite follows in a stacked PR.
Cross-references in MarkdownQuickstartTemplate.md, README.txt, and
openmp/docs/README.txt are updated accordingly.

See RFC:
https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/90840
Converts SphinxQuickstartTemplate.md from reStructuredText syntax to
MyST Markdown. Updates the overview to mention that MyST is now
preferred for new LLVM documentation, modernizes the authoring
guidelines, and replaces all RST markup constructs with their Markdown
equivalents.

The rename landed as PR #198899.

See RFC:
https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/90840

An LLM was used to assist with the rewrite.

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors of type bfloat and half.

Store-side counterpart to #165818. Stacked on top of #197619 and below
#197861.
…12_psad_bw_512 intrinsic. (#201167)

Need to check that 512-bit vectors are enabled before using a 512-bit
intrinsic.
…… (#201190)

…ic module

createIntrinsicModuleDefinitions() only emitted definitions for array
named constants belonging to the __fortran_ieee_exceptions intrinsic
module. Array constants declared directly in the iso_fortran_env
intrinsic module -- in practice character_kinds -- were therefore only
lowered as bodyless `fir.global` external declarations at their use site
and never defined anywhere, producing an undefined reference at link
time.

This is usually hidden because scalar iso_fortran_env parameters fold to
immediates and constant-shape array accesses are folded away, so the
dangling external symbol is DCE'd before linking. It surfaces when the
address of the array genuinely escapes to runtime, e.g.:
```
    use iso_fortran_env
    integer :: i, x(1)
    do i = 1, size(character_kinds)
      x = findloc(character_kinds, character_kinds(i))
    end do
```
which fails with:

    undefined reference to `_QMiso_fortran_envECcharacter_kinds'

Fix by also processing the iso_fortran_env scope in
createIntrinsicModuleDefinitions(), so its array constants are emitted
as linkonce_odr definitions with initializers.

Note that integer_kinds/real_kinds/logical_kinds are unaffected: they
are renamed from iso_fortran_env_impl, a non-intrinsic module that is
compiled into the runtime, so their definitions already exist there.

Assisted-by: AI
… (#201206)

The second `ParseTypeFromDWARF` for t1 (after `p v2`) only fires when
t1's definition lives in a separate CU from its forward declaration:
LLDB parses the forward-decl DIE during `p v1` and a distinct definition
DIE during `p v2`. dsymutil's parallel linker collapses both into a
single DIE in the artificial type unit, so t1 is parsed once during `p
v1` and only re-resolved during `p v2`.

Drop the second-parse CHECK so the test no longer presumes a per-CU type
layout. The remaining `'t1' resolving forward declaration...` CHECK
after `p v2` still verifies what the test was designed to catch: t1's
complete-type resolution is deferred until v2 is evaluated. If LLDB
regressed to eager resolution during `p v1`, that log line would move
and the test would fail. Add a `(t1) (x = 0)` CHECK at the end to cover
the end-to-end value.
…#201194)

There is a relation between CFI and ThinLTO GUIDs that still needs to be
disentangled first. Note that we leave the `MD_unique_id` in
`FixedMetadataKinds.def` to avoid needing to re-number it later. Plus
the metadata string ("guid") itself is used by ctxprof.
On Windows, the llvm-shlib dylib build uses the llvm-nm host tool to
make all symbols visible by default. The LLVM_TOOL_LLVM_DRIVER_BUILD=ON
build would fail because $<TARGET_FILE:llvm-nm> was invalid. This change
passes the name of the symlink / executable copy as a custom property so
things work out and the llvm-nm.exe host tool can be found.
…#201183)

Adds a setElideLocations option to BytecodeWriterConfig to elide
locations during bytecode serialization. When enabled, all LocationAttrs
are mapped to UnknownLoc during numbering and writing to produce
location-invariant bytecode (e.g., for stable fingerprinting).

Another way to achieve the same thing would be to apply the
strip-debuginfo pass,
but that requires mutating the module, which in turn requires cloning
the module if one still requires the unstripped original.
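
The idea of eliding locations at write time, rather than stripping them from the module, can be sketched without MLIR. Everything below is hypothetical (the `Op` struct and `serialize` function are not MLIR APIs). It only shows that the writer substitutes an unknown location while leaving the input untouched, so two inputs that differ only in locations serialize identically.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature "operation": a name plus a source location string.
struct Op {
  std::string name;
  std::string loc;
};

// Serializes ops to text. With `elideLocations` set, every location is
// written as "unknown"; the input ops themselves are never modified.
std::string serialize(const std::vector<Op> &ops, bool elideLocations) {
  std::string out;
  for (const Op &op : ops) {
    out += op.name;
    out += '@';
    out += elideLocations ? std::string("unknown") : op.loc;
    out += '\n';
  }
  return out;
}
```

Because the input is taken by const reference, the caller keeps the original locations and needs no clone.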

Assisted-by: Antigravity / Gemini
…r GPRSize

Map GPRPair register classes to the GPRB register bank during GlobalISel
instruction selection. This is required because the introduction of HwMode-dependent
base pointer register classes (e.g. via PtrRegClassByHwMode) causes TableGen to
emit register bank checks for GPRPair variants in RISCVGenGlobalISel.inc.
Without this mapping, instruction selection crashes on unsupported classes.

To avoid assertion failures when GPRB's maximum size increases to 128-bit on RV64
due to the register pairs, update RISCVRegisterBankInfo::getInstrMapping to query
Subtarget.getXLen() for the scalar register width instead of relying on the bank's
getMaximumSize(). This matches AArch64's design pattern of mapping register pairs
(XSeqPairsClass) to GPR and resolving scalar register sizes dynamically.

This was fine previously but was exposed by the HwMode changes in
llvm/llvm-project#177073.

Pull Request: llvm/llvm-project#200510
Add lldb-mcp to LLVM_DISTRIBUTION_COMPONENTS
…201211)

Promote the lambda inside resolveWrapper to a public method on
SimpleExecutorDylibManager. This brings SimpleExecutorDylibManager into
better alignment with the NativeDylibManager implementation in the new
ORC runtime, and is a step towards allowing NativeDylibManager to be
used as a drop-in replacement for SimpleExecutorDylibManager.
While debugging flakey behavior with TestRunLocker, I noticed that it is
intended to run its test once with a stop at the entry function (and
then Continues) and once where we launch to the main() loop. But we were
never exercising the stop-at-entry codepath.

This doesn't fix the flakey behavior, although that only happens with
the launch-directly-into-main() codepath; I don't get failures when I
stop at the entry point and then continue.
…1221)

If a VPExpandSCEVRecipe does not have users, there's no benefit to
expand it to VPInstructions, which then have to get cleaned up.

This also prevents DCE from removing VPInstructions pointed to by
TripCount after expansion.
This codepath is only executed as an attempt to clean up during a failed
launch, so the reversed arguments were rarely actually used.

rdar://175507620
…ng copies (#201177)

NVPTXLowerAggrCopies lowers load/store pairs of large values into a loop
of smaller copies.

However, it was incorrectly assuming that the load/store pairs it found
never alias.

This patch adds an alias check.  If the pointers may alias, we emit a
memmove, which handles overlap correctly.

CUDA reproducer:

    typedef char vec __attribute__((vector_size(256)));

    __global__ void boom(char *p) {
      *(vec *)(p + 8) = *(vec *)p;
    }
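
The aliasing problem can be reproduced on the host with a small sketch. This is not the NVPTX pass. `forwardLoopCopy` and `shiftBy8` are illustrative names, and the byte-wise loop merely stands in for a copy that has been lowered into smaller pieces. The sketch performs the reproducer's `p + 8 <- p` copy of 256 bytes with both strategies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Naive lowering: copy `n` bytes front to back in a loop. This is only
// correct when the source and destination ranges do not overlap.
void forwardLoopCopy(unsigned char *dst, const unsigned char *src,
                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Fills a 264-byte buffer with 0, 1, 2, ... (mod 256) and then copies
// 256 bytes from offset 0 to offset 8, using either strategy.
std::vector<unsigned char> shiftBy8(bool useMemmove) {
  std::vector<unsigned char> buf(264);
  for (std::size_t i = 0; i < buf.size(); ++i)
    buf[i] = static_cast<unsigned char>(i % 256);
  if (useMemmove)
    std::memmove(buf.data() + 8, buf.data(), 256);
  else
    forwardLoopCopy(buf.data() + 8, buf.data(), 256);
  return buf;
}
```

With the forward loop, the first 8 source bytes repeat through the destination because later reads see bytes that earlier iterations already overwrote. `std::memmove` preserves the original 256 bytes.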
…ns (#198422)

Classic CodeGen's `SetFunctionAttributes` calls `setLinkageForGV` to force `ExternalWeakLinkage` on `__attribute__((weak))` and Darwin `weak_import` declarations.  CIR had no equivalent: weak function declarations were emitted with `ExternalLinkage` instead of `ExternalWeakLinkage`.

This adds `setLinkageForFunction` — the same weak/external-weak logic as `setLinkageForGV` — and calls it from `setFunctionAttributes`.  The underlying crash on inline forward declarations (the original motivation) is already fixed by #195257; what remains is this linkage gap.

`inline-forward-decl.c` covers `__attribute__((weak))` on an inline forward declaration; `func-linkage-weak-import.c` covers Darwin `weak_import` (→ `extern_weak` in CIR and LLVM).
…1222)

Remove the `DryRun` branch in `bundleSYCL` that emitted a stub
`OffloadBinary`. SYCL goes through the same empty-buffer path as other
offload kinds, so the special case is no longer needed.

Update `linker-wrapper-image.c` to expect the resulting `[0 x i8]
zeroinitializer` constant and a size of `0` in the register/unregister
calls.

Assisted by Claude.
…00524)

Generalizes the Windows-only Python lookup in PythonPathSetup into a
cross-platform abstraction. Adds an abstract ScriptInterpreterRuntimeLoader
with a per-language factory. The Python implementation dynamically loads the
Python library into the current process.

The loader no-ops when Python is already in the process, then walks the
LLDB_PYTHON_LIBRARY env override, the build-time Python
(LLDB_PYTHON_RUNTIME_LIBRARY_BUILD_PATH) and finally a platform candidate list:

- Darwin: DEVELOPER_DIR, the bundled Xcode.app, and Command Line Tools joined
  against Python3.framework. Then python.org, /opt/homebrew, and /usr/local
  joined against Python.framework. Then xcrun -f python3 and if that fails,
  libpython3.dylib as a last resort.
- Linux: libpython3.so plus descending stable-ABI SONAMEs.
- Windows: the LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME bare name (resolved via the
  loader's default search list) and the exe-relative
  LLDB_PYTHON_DLL_RELATIVE_PATH fallback (built off GetModuleFileNameW).
  Pre-mapping python3xx.dll lets the script interpreter plugin's delay-load
  thunks resolve against the already-loaded module by base name on first use.

This commit only introduces the abstraction. No existing call site is changed,
and the script interpreter plugin still hard-links Python; changing both is
left to two follow-up PRs.
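
The lookup order described above (env override, then build-time path, then platform candidates) amounts to a "first candidate that loads wins" search. The sketch below is a hypothetical illustration, not the LLDB implementation: `loadFirstAvailable` is an invented name, an empty string means "not set", and `tryLoad` stands in for a platform call such as `dlopen` or `LoadLibraryW`. The "already in the process" short-circuit is omitted.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Tries the env override, then the build-time path, then each platform
// candidate, in that order. Returns the first path for which `tryLoad`
// succeeds, or an empty string if none could be loaded.
std::string
loadFirstAvailable(const std::string &envOverride,
                   const std::string &buildTimePath,
                   const std::vector<std::string> &platformCandidates,
                   const std::function<bool(const std::string &)> &tryLoad) {
  assert(static_cast<bool>(tryLoad)); // a loader callback is required
  std::vector<std::string> ordered;
  if (!envOverride.empty())
    ordered.push_back(envOverride);
  if (!buildTimePath.empty())
    ordered.push_back(buildTimePath);
  ordered.insert(ordered.end(), platformCandidates.begin(),
                 platformCandidates.end());
  for (const std::string &path : ordered)
    if (tryLoad(path))
      return path;
  return std::string();
}
```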
…00530)

Drops ${Python3_LIBRARIES} from the SHARED build of
lldbPluginScriptInterpreterPython and lets undefined Python symbols
through at link time (`-undefined dynamic_lookup` on Darwin,
`--allow-shlib-undefined` on Linux; Windows keeps its existing
delay-load + import lib).

SystemInitializerFull::Initialize resolves the Python runtime loader
via ScriptInterpreterRuntimeLoader::Get(eScriptLanguagePython) and
calls Load() before initializing any plugin, so libpython is mapped
into the process before either entry point that references it: the
static script interpreter's Initialize() (which invokes Python via
the LLDB_PLUGIN_INITIALIZE loop) and the dynamic plugin's dlopen
(whose undefined references resolve against the in-process
libpython). This covers both LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS
=ON and =OFF, and keeps Windows working in static builds where the
delay-load thunks live in liblldb itself. The loader is
once_flag-cached, and errors propagate out via the existing Expected
return.

`import lldb` from a running Python works because libpython is
already mapped into the process. The Python runtime loader probes
for stable-ABI symbols (Py_IsInitialized, Py_InitializeFromConfig)
via dlsym(RTLD_DEFAULT) / GetProcAddress and, finding them, returns
success without dlopen'ing a second libpython on top.

The dynamic plugins are added as test-suite dependencies so
check-lldb-shell and check-lldb-api wait for them; they can't be
build-order deps of liblldb itself because the plugins link against
liblldb (CMake SHARED cycle).

The unit-test static variant lldbStaticScriptInterpreterPython keeps
hard-linking libpython so link-time symbol checking is preserved.
…#201152)

This came up in a test suite as an NYI; it just emits a
constant-backing literal for an initializer. These are specific to C, as
global compound literals have static storage duration in C. This patch,
just like classic codegen, creates a '.compoundliteral' object as
backing for these variables and lets us create references to them.

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenAction.cpp
This PR adds in-place shrinking for the freelist heap. This allows the
heap to reuse the block in place when a reallocation shrinks the size by
more than a minimal block unit.

Synthesized random-action tests show that this increases the heap
utilization rate from 87% to 97%, which roughly matches what is expected
of dlmalloc.
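
The shrink-in-place rule can be sketched as a pure size decision. This is a hypothetical illustration, not the freelist heap code: the struct, the function name, the minimum block size of 16 bytes, and the strict comparison are all assumptions, and alignment and block headers are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Outcome of a shrink request on one block (hypothetical).
struct ShrinkResult {
  std::size_t keptSize;  // bytes that stay with the live allocation
  std::size_t freedSize; // bytes handed back to the free list (0 if none)
};

// Assumed minimum block size for this sketch.
constexpr std::size_t kMinBlockSize = 16;

// Decides how a block of `oldSize` bytes is treated when the caller asks
// to shrink it to `newSize` bytes. The allocation never moves; only the
// tail may be split off as a new free block.
ShrinkResult shrinkInPlace(std::size_t oldSize, std::size_t newSize) {
  assert(newSize <= oldSize);
  std::size_t tail = oldSize - newSize;
  if (tail > kMinBlockSize)
    return {newSize, tail}; // split: the tail becomes a free block
  return {oldSize, 0};      // tail too small to form a block: keep as is
}
```

For example, shrinking a 128-byte block to 64 bytes frees a 64-byte tail, while shrinking it to 120 bytes leaves the block unchanged because an 8-byte tail could not hold a block.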

Assisted-by: AI tools, manually checked.
#189222 folds NDD+Load to non-NDD when the NDD memory variant is not
preferred. However, this changes DstReg from a regular def to an
early-clobber def, which causes a "corrupted sub-interval" error in
reMaterializeFor because OrigLI is not updated at the same time.

Fixes: https://godbolt.org/z/7n8ozz1EG

Assisted-by: Claude Sonnet 4.6
…age (#200513)

- Rework `--dry-run` in `clang-sycl-linker` so it skips all real output
  (writing bitcode, executing tools, etc.).
- The `link:`, `sycl-module-split:`, and a new `sycl-bundle:` summary
  line are now gated on `-v` alone.
- Tighten `sycl-bundle:` checks in `basic.ll`, `split-mode.ll`, and
  `triple.ll` to pin kind, triple, and arch (instead of just kind),
  and add `-NOT: {{.+}}` after fully-covered dry-run check groups.
- Replace the `clang-sycl-linker` + `llvm-objdump --offloading`
  round-trip with a single `--dry-run -v` invocation.
- Add a dedicated `non-dry-run` mode test to verify code paths not exposed
  in `dry-run`.

Assisted by Claude.
iclsrc and others added 13 commits June 4, 2026 05:07
  CONFLICT (content): Merge conflict in llvm/lib/CodeGen/TargetPassConfig.cpp
…o specialization (#22217)

- getTombstoneKey was removed from DenseMapInfo api 836bf56
  CONFLICT (content): Merge conflict in clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
  CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp
  CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
  CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExpr.cpp
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExprAgg.cpp
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenFunction.h
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/SYCL.cpp
CONFLICT (content): Merge conflict in clang/test/SemaSYCL/sycl-cconv.cpp
Travis CI was disabled on the main branch in 06bddc1c8.
CI now runs entirely via GitHub Actions.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@19833602d4e5ae2
This change adds a new configuration knob that allows the user of the
SPIR-V reader to provide custom address space numbering to the translator.
The user can provide an array with values for each logical SPIR-V address
space (private, local, global, generic, constant etc.) that will be used
by the translator when generating LLVM IR. The behavior of the reader
remains unchanged if the user does not explicitly change the setting.

This change was motivated by a need to support a compilation target that
uses a different convention for address space numbering than SPIR-V.
Generally the GPU targets in LLVM (AMDGPU, NVPTX) seem to use 0 (LLVM's
default address space) as a rough equivalent of the generic address space.
Without this option an additional address space renumbering pass is
needed. While such a pass is possible, it is somewhat fragile (it needs to
visit every place in the module where a ptr type can be used, and these
change as LLVM changes) and it consumes compilation time (it can't be done
in a single pass due to potential type/value dependencies). Configuring
this in the SPIR-V translator seems to be a simpler and cleaner solution.

The change replaces all hard-coded address space values in the SPIR-V
reader with a call that resolves the logical SPIR-V address space to the
value provided by the user. If the mapping was not configured, the original
values are used. Additionally, an analogous change was made to builtin name
mangling.
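
The "resolve through a table, fall back to the original values" approach can be sketched as follows. This is a hypothetical illustration, not the translator's API: the class name, the enumerator order, and the default numbers 0 to 4 are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Logical SPIR-V address spaces mentioned in the text. The enumerator
// order is an assumption made for this sketch.
enum class LogicalAS : std::size_t { Private, Global, Constant, Local, Generic };

constexpr std::size_t kNumLogicalAS = 5;
using ASTable = std::array<unsigned, kNumLogicalAS>;

// Hypothetical resolver: every place that previously used a hard-coded
// address space number asks resolve() instead.
class AddressSpaceResolver {
public:
  // Assumed default numbering, kept when the user configures nothing.
  AddressSpaceResolver() : table_(ASTable{{0, 1, 2, 3, 4}}) {}

  // Installs a user-provided numbering, one value per logical space.
  void setTable(const ASTable &custom) { table_ = custom; }

  // Maps a logical address space to the number emitted into LLVM IR.
  unsigned resolve(LogicalAS as) const {
    std::size_t index = static_cast<std::size_t>(as);
    assert(index < kNumLogicalAS);
    return table_[index];
  }

private:
  ASTable table_;
};
```

A target that uses 0 for the generic address space would install a table such as `{5, 1, 4, 3, 0}` once, and no separate renumbering pass would be needed afterwards.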

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@06bd1600a7dc007
@iclsrc iclsrc added the disable-lint Skip linter check step and proceed with build jobs label Jun 21, 2026
jsji and others added 6 commits June 21, 2026 19:53
Fix SemaCUDA/implicit-hd-explicit-inst-organic-caller.cu and
SemaCUDA/implicit-hd-overload-ambig-organic-caller.cu, which broke after
commit 3a203a5 ([CUDA][HIP] Defer device diagnostics in implicit H+D
explicit instantiations, #197214).

Upstream #197214 implements the feature on a collect-then-emit deferred
diagnostics model: the traversal collects per-function diagnostics,
ClassifyImplicitHDExplicitInst() drops the errors of implicit-H+D
explicit-instantiation functions that have no organic device caller, and
a final DDE.emitCollectedDiags() emits whatever remains exactly once.

xmain long ago replaced that model with an inline-emit one (the upstream
"Speed up deferred diagnostic emitter" change):
DeferredDiagnosticsEmitter::checkFunc emits each function's deferred
diagnostics during traversal and there is no emitCollectedDiags. The
pulldown conflict resolution kept ClassifyImplicitHDExplicitInst() but
dropped the (nonexistent) emitCollectedDiags call, so the new
classification ran too late: an implicit-H+D explicit-instantiation member
was emitted inline both as a bare emitted root and via the organic caller
path, producing a duplicate error with a truncated call stack.

Adapt the feature to xmain's inline model:
- Skip the inline emission in checkFunc for implicit-H+D
  explicit-instantiation functions so their diagnostics are not surfaced
  during traversal.
- In ClassifyImplicitHDExplicitInst, when an organic device caller exists,
  surface the deferred diagnostics here with the usual call-stack notes;
  otherwise keep dropping them and emitting a trap body as before.

Intel-specific adaptation guarded with INTEL_CUSTOMIZATION.

Fixes: CMPLRLLVM-75875

This should be reverted once
https://jira.devtools.intel.com/browse/CMPLRLLVM-74216 is done to reland
[cc4ff7f](intel-restricted/applications.compilers.llvm-project@cc4ff7f).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The reapply of 9606c9f updated some code to keep the addrspacecast
(see llvm/llvm-project@fb743f1). Reverted some of our previous test changes
accordingly too. We should clean up the UseAllocaASForSrets code in the
follow-up JIRA https://jira.devtools.intel.com/browse/CMPLRLLVM-75358 ASAP.
Thanks.
…#37621)

After 459650d removed the `XFAIL: *` from
clang/test/Driver/hipspv-toolchain.hip, the test started failing because
the HIP linker-wrapper job emits `--no-lto` twice for the TY_HIP_FATBIN
case.

Upstream commit a2369b9 ("[Clang] Fix leftover use of old LTO path")
moved the non-RDC `--no-lto` emission: it deleted the inline block that
followed `--emit-fatbin-only` and added an equivalent block earlier,
before `--host-triple`. During pulldown the early block was merged in but
the inline block was left behind, so both fired and a stray second
`--no-lto` was inserted between `--emit-fatbin-only` and `-o`.

Remove the leftover inline block to match the upstream state. The earlier
block (guarded by TY_HIP_FATBIN && OFK_HIP) still emits `--no-lto` exactly
once, which is what the test now expects.

Jira: CMPLRLLVM-73247

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics are
enabled in intel/llvm clang/lib/Basic/Targets/NVPTX.h, but not in
upstream.

Enable them to fix the test-e2e/WorkGroupMemory/reduction_lambda.cpp build.

This change will be upstreamed.
@jsji

jsji commented Jun 22, 2026

Contributor

This is ready for merge, other than cherry-picks.

@jsji

jsji commented Jun 22, 2026

Contributor

@intel/llvm-gatekeepers Can you help to issue a /merge? Thanks.

@sarnex

sarnex commented Jun 22, 2026

Contributor

/merge

@bb-sycl

bb-sycl commented Jun 22, 2026

Contributor

Mon 22 Jun 2026 03:14:47 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

@bb-sycl

bb-sycl commented Jun 22, 2026

Contributor

Mon 22 Jun 2026 03:26:14 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

@bb-sycl bb-sycl merged commit 850131b into sycl Jun 22, 2026
37 of 39 checks passed
@bb-sycl bb-sycl deleted the llvmspirv_pulldown branch June 22, 2026 15:26
@bader

bader commented Jun 23, 2026

Contributor

@jsji, @sarnex, @intel/llvm-gatekeepers, this PR adds ccache-sycl directory with a lot of binary files. Is this intentional?

@jsji

jsji commented Jun 23, 2026

Contributor

> @jsji, @sarnex, @intel/llvm-gatekeepers, this PR adds ccache-sycl directory with a lot of binary files. Is this intentional?

No. This was added accidentally and will be removed in #22393. I will fix the pulldown script to avoid similar situations.
