Add block_bucketize_sparse_features operator #76
Draft
flezaalv wants to merge 17 commits into
Remove .gitkeep placeholders
- Add invert_permute kernel to CMake build
- Implement invert_permute Python wrapper in ops.py
- Register invert_permute operator with schema existence check
- Add torch_library.h utility for schema validation
Add SYCL/XPU kernel implementation for invert_permute operation.
Add complete test coverage for invert_permute operator on XPU devices, covering correctness, validation, parity, and performance.

Test coverage includes:
- Correctness tests for int32/int64 with edge cases (empty, single element, identity, reverse, random permutations)
- Input validation tests for invalid dimensions and dtypes
- Meta function tests for torch.compile compatibility
- PyTorch opcheck validation for operator conventions
- Parametric tests with varying sizes (1 to 1M elements)
- CPU-XPU parity tests to ensure consistent results
- Performance benchmarks measuring execution time and bandwidth
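For reviewers who want a host-side reference when reading the correctness tests, the following is a minimal pure-Python sketch of what inverting a permutation means: the result `inverse` satisfies `inverse[permute[i]] == i`. The function name `invert_permute_reference` is hypothetical and is not part of this PR; the XPU kernel itself is written in SYCL and is not reproduced here.

```python
def invert_permute_reference(permute):
    """Return `inverse` such that inverse[permute[i]] == i for every i.

    Hypothetical reference helper for illustration only; it assumes
    `permute` is a valid permutation of range(len(permute)).
    """
    inverse = [0] * len(permute)
    for position, target in enumerate(permute):
        # Element `position` points at `target`, so the inverse maps
        # `target` back to `position`.
        inverse[target] = position
    return inverse


if __name__ == "__main__":
    print(invert_permute_reference([2, 0, 1]))  # [1, 2, 0]
```

The edge cases listed in the test coverage (empty, single element, identity, reverse) all fall out of the same loop without special handling.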
- CMakeLists: add permute_1d_sparse_data.cpp to build sources
- ops.py: add Python wrapper with type hints
- ops_registry.cpp: register operator schema in fbgemm namespace
Add the SYCL/XPU kernel implementation of the permute_1D_sparse_data operator, which permutes sparse data stored in jagged (1D) format.
Add SYCL port of FBGEMM's asynchronous_complete_cumsum operator for Intel XPU devices. The operator computes a complete cumulative sum with a leading zero (e.g., [a, b, c] → [0, a, a+b, a+b+c]).
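The "complete" cumulative sum described above can be checked against a small host-side reference. The sketch below uses only the Python standard library; the function name `complete_cumsum_reference` is hypothetical and is not the operator's Python wrapper.

```python
from itertools import accumulate


def complete_cumsum_reference(values):
    """Cumulative sum with a leading zero: [a, b, c] -> [0, a, a+b, a+b+c].

    Hypothetical reference helper for illustration only. The output has
    one more element than the input.
    """
    # `initial=0` (Python 3.8+) prepends the leading zero to the running sum.
    return list(accumulate(values, initial=0))


if __name__ == "__main__":
    print(complete_cumsum_reference([1, 2, 3]))  # [0, 1, 3, 6]
```

In this sketch an empty input yields `[0]`, since the leading zero is always emitted.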
Integrate asynchronous_complete_cumsum operator into fbgemm-xpu:
- Add Python wrapper with complete cumsum documentation
- Register operator schema in torch library
- Include implementation in CMake build
Add comprehensive test suite for asynchronous_complete_cumsum operator covering:
- Basic functionality with int32 and int64 dtypes
- Empty tensor handling
- Random input validation with numpy reference
Add SYCL infrastructure headers from intel/torch-xpu-ops/ to support advanced kernel implementations:
- DeviceProperties.h: Device capability queries and work group sizing
- SYCLContext.h: SYCL context management and namespace aliases
- SYCLHelpers.h: SYCL kernel submission and utility functions
- TensorInfo.h: Tensor metadata and dimension handling structures
- TensorOptions.h: Tensor configuration and options management
- Runtime.h: SYCL runtime utilities
- Macros.h: Common macro definitions
- Scalar.h: Scalar type conversion utilities

These headers provide the foundation for implementing 2D sparse data permutation and other complex SYCL operations on XPU devices.
Add foundational utility headers and implementations to support complex SYCL kernel operations:
- utils.h/cpp: Core constants, type definitions, kernel launch helpers, and device property queries
- dispatch_macros.h: Type dispatch macros for handling multiple data types (int32, int64, float, etc.)
- tensor_utils.h: Tensor manipulation and metadata utilities
- function_types.h: Symbol visibility definitions for shared library exports

These utilities provide essential infrastructure for implementing 2D sparse data permutation and other advanced operators on XPU devices, including work group sizing, kernel launch helpers, and type-safe dispatching mechanisms.
Add SYCL port of FBGEMM's permute_2D_sparse_data operator for Intel XPU devices. This operator permutes 2D sparse data including lengths [T, B], indices, and optional weights according to a permutation vector, commonly used for reordering embedding table features.

Implementation includes:
- SYCL kernels: permute_2D_lengths_kernel and permute_2D_data_kernel
- Host function: permute_2D_sparse_data_xpu
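To make the data layout concrete, here is a pure-Python sketch of the permutation semantics, under the assumption that `indices` (and `weights`, if given) store the segments of feature 0, then feature 1, and so on, with segment sizes given by the rows of `lengths`. The function name `permute_2d_sparse_data_reference` is hypothetical; the actual operator works on tensors and runs as SYCL kernels.

```python
from itertools import accumulate


def permute_2d_sparse_data_reference(permute, lengths, indices, weights=None):
    """Reorder per-feature segments of jagged data according to `permute`.

    Hypothetical reference helper for illustration only.
    `lengths` is a list of T rows, each with B per-sample lengths.
    """
    # offsets[t] is where feature t's segment starts in `indices`.
    row_totals = [sum(row) for row in lengths]
    offsets = list(accumulate(row_totals, initial=0))

    permuted_lengths = [list(lengths[t]) for t in permute]
    permuted_indices = []
    permuted_weights = [] if weights is not None else None
    for t in permute:
        start, end = offsets[t], offsets[t + 1]
        permuted_indices.extend(indices[start:end])
        if weights is not None:
            permuted_weights.extend(weights[start:end])
    return permuted_lengths, permuted_indices, permuted_weights


if __name__ == "__main__":
    lengths = [[1, 2], [0, 3]]           # T = 2 features, B = 2 samples
    indices = [10, 20, 21, 30, 31, 32]   # feature 0 first, then feature 1
    print(permute_2d_sparse_data_reference([1, 0], lengths, indices))
    # ([[0, 3], [1, 2]], [30, 31, 32, 10, 20, 21], None)
```

Because each output segment is copied from its source row independently, a permutation vector with repeated entries simply duplicates the corresponding feature.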
Integrate permute_2D_sparse_data operator into fbgemm-xpu:
- Add Python wrapper with type hints and documentation
- Register operator schema in torch library
- Include implementation files in CMake build (utils.cpp, SYCL kernels, and operator implementation)
Add comprehensive test suite for permute_2D_sparse_data operator covering:
- Basic functionality with int32 and int64 data types
- Sparse data with and without weights
- Permutations with repeated indices
- Exact value validation
- CPU-XPU consistency verification
Fix the CMake configuration and import ordering so that the block_bucketize_sparse_features XPU operator builds and loads properly.
- Configure CMake for XPU-only PyTorch builds
- Import torch before _C extension to load libtorch.so dependencies
- Adjust test imports for consistency

All 18 tests passing.
Summary
Completes the integration of the block_bucketize_sparse_features operator by fixing CMake configuration and resolving dynamic library loading issues.

Depends on: #74

Changes

Module Initialization (src/fbgemm_xpu/__init__.py)
- Import torch before loading _C extension to ensure PyTorch shared libraries are loaded in the process address space

Test Suite (test/test_block_bucketize_sparse_features.py)

Testing

All 18 test cases pass successfully:

    $ pytest packages/fbgemm-xpu/test/test_block_bucketize_sparse_features.py -vv
    ============================== 18 passed in 3.97s ==============================

cc: @manuelhsantana, @aagalleg