Add invert_permute operator with SYCL implementation for XPU#71
Add invert_permute operator with SYCL implementation for XPU#71aagalleg wants to merge 4 commits into
Conversation
Remove .gitkeep placeholders
- Add invert_permute kernel to CMake build - Implement invert_permute Python wrapper in ops.py - Register invert_permute operator with schema existence check - Add torch_library.h utility for schema validation
Add SYCL/XPU kernel implementation for invert_permute operation.
Add complete test coverage for invert_permute operator on XPU devices, covering correctness, validation, parity, and performance. Test coverage includes: - Correctness tests for int32/int64 with edge cases (empty, single element, identity, reverse, random permutations) - Input validation tests for invalid dimensions and dtypes - Meta function tests for torch.compile compatibility - PyTorch opcheck validation for operator conventions - Parametric tests with varying sizes (1 to 1M elements) - CPU-XPU parity tests to ensure consistent results - Performance benchmarks measuring execution time and bandwidth
dvrogozh
left a comment
There was a problem hiding this comment.
In addition to inline comments:
- Repository has lint CI - please, address identified issues.
- What we implement and merge must be tested under CI. And IMHO doing that by engineers who implemented the code is the best variant rather than offloading that to dedicated CI engineers. Thus, please, modify
.github/workflows/ci.ymland add a test section for FBGEMM following the pattern outlined for torchcodec (see link below). The test job for FBGEMM will be much simpler. Target onlybmgfor now is fine. Note that fbgemm wheel is already getting build by CI.
torchlib-xpu/.github/workflows/ci.yml
Line 199 in d4230aa
| .has_value(); | ||
| } | ||
|
|
||
| } // namespace fbgemm_xpu::utils::torch No newline at end of file |
There was a problem hiding this comment.
Add empty line in the end of file.
| // SYCL Kernel Functor Implementations | ||
| // ============================================================================ | ||
|
|
||
| void InvertPermuteKernelInt32::operator()(const sycl::nd_item<1>& item) const { |
There was a problem hiding this comment.
This seems templetable kernel over the operand type. Why don't we do that especially considering that the reference CUDA kernel is doing exactly that [1]?
|
|
||
| __all__ = [ | ||
| "dense_embedding_codegen_lookup_function", | ||
| "invert_permute", |
There was a problem hiding this comment.
Here I am confused. Why do we expose these XPU specific APIs from fbgemm xpu? who is supposed to use them? My understanding was that we are just register low level entrypoints for XPU key so that standard fbgemm APIs start to work on XPU device. Shouldn't that be that user calls fbgemm::invert_permute for example?
This PR introduces the
invert_permuteoperator to fbgemm-xpu, enabling inverse permutations on Intel XPU devices.Changes
Core Implementation
invert_permute_kernel.cpp/h): High-performance SYCL implementation that computes inverse permutations on XPU devicesops_registry.cpp): Registers the operator with PyTorch's dispatch system using conditional schema registration to avoid conflictsops.py): Clean Python wrapper function with type hints for easy integrationInfrastructure
CMakeLists.txt): Added SYCL kernel to build configurationtorch_library.h): Helper utilities for schema existence checking and operator registration.gitkeepfiles from populated directoriesTesting
test_invert_permute.py):cc: @manuelhsantana, @flezaalv