[BugFix][LLVM] Fix insertDeclare API mismatch for ROCm-bundled LLVM 20#19390
[BugFix][LLVM] Fix insertDeclare API mismatch for ROCm-bundled LLVM 20#19390swjng wants to merge 1 commit intoapache:mainfrom
insertDeclare API mismatch for ROCm-bundled LLVM 20#19390Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces several fixes and improvements to the TVM build and compilation pipelines. It adds CMake feature detection for LLVM 20's DIBuilder::insertDeclare to handle differences between upstream and ROCm-bundled versions. In the Relax VM build, it now prefers target-specific pipelines when using the "default" configuration. The compile_relax integration for MetaSchedule was refactored to ensure a correct pass ordering, including DLight fallback scheduling. Additionally, safety checks were added to TIRX passes to handle functions without defined targets. Feedback suggests improving the target-specific pass detection in the MetaSchedule integration to support backends beyond CUDA, avoiding manual duplication of pass lists, and ensuring CMake variable state is preserved during feature detection.
| try: | ||
| from tvm.relax.backend.cuda.pipeline import ( # pylint: disable=import-outside-toplevel | ||
| library_dispatch_passes, | ||
| dataflow_lower_passes, | ||
| finalize_passes, | ||
| ) | ||
| backend_specific = True | ||
| except ImportError: | ||
| backend_specific = False |
There was a problem hiding this comment.
The logic for detecting backend_specific passes is currently tied to the ability to import CUDA-specific pipeline modules. This will fail to detect backend-specific passes for other targets like ROCm, which is the primary focus of this PR. Instead of manual imports and try-except blocks, you should leverage the target-aware helper functions already available in tvm.relax.pipeline (e.g., library_dispatch_passes, dataflow_lower_passes, and finalize_passes). These helpers internally handle the dispatching to CUDA, ROCm, and other backends based on the provided target.
cmake/utils/FindLLVM.cmake
Outdated
| set(CMAKE_REQUIRED_INCLUDES ${LLVM_INCLUDE_DIRS}) | ||
| set(CMAKE_REQUIRED_FLAGS "-std=c++17 -fno-rtti") | ||
| check_cxx_source_compiles(" | ||
| #include <llvm/IR/DIBuilder.h> | ||
| #include <llvm/IR/Instructions.h> | ||
| void f(llvm::DIBuilder* b, llvm::Value* v, llvm::DILocalVariable* var, | ||
| llvm::DIExpression* e, const llvm::DILocation* dl, | ||
| llvm::Instruction* i) { | ||
| b->insertDeclare(v, var, e, dl, llvm::BasicBlock::iterator(i)); | ||
| } | ||
| " TVM_LLVM_INSERTDECLARE_USES_ITERATOR) | ||
| unset(CMAKE_REQUIRED_INCLUDES) | ||
| unset(CMAKE_REQUIRED_FLAGS) |
There was a problem hiding this comment.
Setting and then unsetting CMAKE_REQUIRED_INCLUDES and CMAKE_REQUIRED_FLAGS can lead to side effects if these variables had values before the find_llvm macro was called. It is safer to save the original values and restore them afterwards to avoid affecting other parts of the build system.
set(OLD_CMAKE_REQUIRED_INCLUDES ${CMAKE_REQUIRED_INCLUDES})
set(OLD_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS})
set(CMAKE_REQUIRED_INCLUDES ${LLVM_INCLUDE_DIRS})
set(CMAKE_REQUIRED_FLAGS "-std=c++17 -fno-rtti")
check_cxx_source_compiles("
#include <llvm/IR/DIBuilder.h>
#include <llvm/IR/Instructions.h>
void f(llvm::DIBuilder* b, llvm::Value* v, llvm::DILocalVariable* var,
llvm::DIExpression* e, const llvm::DILocation* dl,
llvm::Instruction* i) {
b->insertDeclare(v, var, e, dl, llvm::BasicBlock::iterator(i));
}
" TVM_LLVM_INSERTDECLARE_USES_ITERATOR)
set(CMAKE_REQUIRED_INCLUDES ${OLD_CMAKE_REQUIRED_INCLUDES})
set(CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQUIRED_FLAGS})
| lower = (dataflow_lower_passes(target) if backend_specific else [ | ||
| relax.transform.RewriteDataflowReshape(), | ||
| relax.transform.ToNonDataflow(), | ||
| relax.transform.RemovePurityChecking(), | ||
| relax.transform.CallTIRRewrite(), | ||
| ]) | ||
| finalize = (finalize_passes(target) if backend_specific else [ | ||
| relax.transform.StaticPlanBlockMemory(), | ||
| relax.transform.LowerAllocTensor(), | ||
| relax.transform.KillAfterLastUse(), | ||
| relax.transform.LowerRuntimeBuiltin(), | ||
| relax.transform.ComputePrimValue(), | ||
| relax.transform.VMShapeLower(), | ||
| relax.transform.AttachGlobalSymbol(), | ||
| ]) |
There was a problem hiding this comment.
ROCm ships its own LLVM 20 build that still uses the pre-LLVM-20 `Instruction*` overload of `DIBuilder::insertDeclare`, while upstream LLVM 20 switched to `BasicBlock::iterator`. The previous version-number guard `#if TVM_LLVM_VERSION >= 200` broke ROCm users on LLVM 20. Replace the version guard with a CMake `check_cxx_source_compiles` feature probe (`TVM_LLVM_INSERTDECLARE_USES_ITERATOR`) that compiles a small test program to detect which overload is actually available. Use `cmake_push_check_state`/`cmake_pop_check_state` to avoid leaking CMake check variables. Three call sites in `codegen_llvm.cc` are updated to use the new macro. Fixes apache#18709
a7ceb05 to
70e8887
Compare
Problem
Closes #18709.
ROCm ships its own LLVM 20 build that still uses the pre-LLVM-20
Instruction*overload ofDIBuilder::insertDeclare, while upstream LLVM 20 switched toBasicBlock::iterator. The previous guard#if TVM_LLVM_VERSION >= 200incodegen_llvm.ccbroke ROCm users because their LLVM reports version 20 but lacks the new overload, causing a compile error like:```
error: no matching function for call to
'llvm::DIBuilder::insertDeclare(..., llvm::BasicBlock::iterator)'
note: candidate: insertDeclare(..., llvm::Instruction*)
```
Fix
Replace the version-number guard with a CMake
check_cxx_source_compilesfeature probe incmake/utils/FindLLVM.cmake. The probe compiles a small test program that callsinsertDeclarewithBasicBlock::iterator. The result is exposed as the preprocessor macroTVM_LLVM_INSERTDECLARE_USES_ITERATOR.cmake_push_check_state/cmake_pop_check_stateis used to avoid leakingCMAKE_REQUIRED_*variables into the rest of the build.Three call sites in
src/target/llvm/codegen_llvm.ccare updated to use#if TVM_LLVM_INSERTDECLARE_USES_ITERATORinstead of#if TVM_LLVM_VERSION >= 200.Behavior by configuration:
BasicBlock::iteratorInstruction*Instruction*Testing
Tested locally with LLVM 20.1.8 (Ubuntu package). The probe correctly detects
Failed(usesInstruction*path),libtvm.sobuilds cleanly, and all 39 LLVM codegen tests pass:```
tests/python/codegen/test_target_codegen_llvm.py 39 passed
```