Add block_bucketize_sparse_features operator #76

Draft
flezaalv wants to merge 17 commits into intel:main from aagalleg:flezaalv/feat/block_bucketize_sparse_features_operator

flezaalv commented Jun 25, 2026

Contributor

Summary

Completes the integration of the block_bucketize_sparse_features operator by fixing CMake configuration and resolving dynamic library loading issues.

Depends on: #74

Changes

Module Initialization (src/fbgemm_xpu/__init__.py)

  • Import torch before loading _C extension to ensure PyTorch shared libraries are loaded in the process address space
  • Resolves undefined symbol errors when loading the compiled SYCL extension
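The import-order fix above can be sketched as follows. This is a minimal illustration, not the actual contents of `src/fbgemm_xpu/__init__.py`: the helper `load_after` is hypothetical, and the standard-library modules `json` and `json.decoder` stand in for `torch` and the compiled `_C` extension so the snippet runs without PyTorch installed. In the real package the same idea presumably reduces to an `import torch` statement placed before the `_C` import.

```python
import importlib
import sys


def load_after(dependency: str, extension: str):
    """Import `dependency` first, then `extension`; return the extension module.

    Hypothetical helper: importing the dependency first means any shared
    libraries it brings in are already loaded in the process when the
    extension module is loaded and its symbols are resolved.
    """
    importlib.import_module(dependency)
    # The dependency must be registered before the extension is touched.
    assert dependency in sys.modules
    return importlib.import_module(extension)


# Stand-ins for illustration only: "json" plays the role of torch and
# "json.decoder" plays the role of the compiled _C extension.
ext = load_after("json", "json.decoder")
print(ext.__name__)
```

Making the ordering explicit in `__init__.py` means users of the package do not have to remember to import torch themselves before importing the extension.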

Test Suite (test/test_block_bucketize_sparse_features.py)

  • Updated import order for consistency

Testing

All 18 test cases pass successfully:

$ pytest packages/fbgemm-xpu/test/test_block_bucketize_sparse_features.py -vv
============================== 18 passed in 3.97s ==============================

cc: @manuelhsantana, @aagalleg

aagalleg and others added 17 commits June 18, 2026 22:37
- Add invert_permute kernel to CMake build
- Implement invert_permute Python wrapper in ops.py
- Register invert_permute operator with schema existence check
- Add torch_library.h utility for schema validation

Add SYCL/XPU kernel implementation for invert_permute operation.

Add complete test coverage for invert_permute operator on XPU
devices, covering correctness, validation, parity, and performance.

Test coverage includes:
- Correctness tests for int32/int64 with edge cases (empty, single
  element, identity, reverse, random permutations)
- Input validation tests for invalid dimensions and dtypes
- Meta function tests for torch.compile compatibility
- PyTorch opcheck validation for operator conventions
- Parametric tests with varying sizes (1 to 1M elements)
- CPU-XPU parity tests to ensure consistent results
- Performance benchmarks measuring execution time and bandwidth
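For readers unfamiliar with the operator, the correctness cases listed above (empty, single element, identity, reverse) can be checked against a small pure-Python reference. This sketch assumes the usual meaning of inverting a permutation, namely that `inverse[permute[i]] == i`; the function name `invert_permute_reference` is hypothetical and is not part of this PR.

```python
def invert_permute_reference(permute):
    """Return the inverse of a permutation given as a list of indices.

    Assumes `permute` contains each of 0..len(permute)-1 exactly once.
    """
    inverse = [0] * len(permute)
    for position, target in enumerate(permute):
        # Element `position` maps to `target`, so the inverse maps it back.
        inverse[target] = position
    return inverse


example = invert_permute_reference([2, 0, 1])
print(example)  # [1, 2, 0]
```

A reference of this kind is typically what a CPU-XPU parity test compares the device result against.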
- CMakeLists: add permute_1d_sparse_data.cpp to build sources
- ops.py: add Python wrapper with type hints
- ops_registry.cpp: register operator schema in fbgemm namespace

Add SYCL/XPU kernel implementation of the permute_1D_sparse_data
operator, which permutes sparse data stored in jagged/1D format.

Add SYCL port of FBGEMM's asynchronous_complete_cumsum operator for
Intel XPU devices. The operator computes a complete cumulative sum
with a leading zero (e.g., [a, b, c] → [0, a, a+b, a+b+c]).
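The complete cumulative sum described above can be written as a short pure-Python reference. This is only a sketch of the semantics stated in the commit message; the name `complete_cumsum_reference` is hypothetical, and the SYCL kernel itself is not shown here.

```python
from itertools import accumulate


def complete_cumsum_reference(values):
    """Cumulative sum with a leading zero: [a, b, c] -> [0, a, a+b, a+b+c]."""
    # `initial=0` prepends the zero, so the output has len(values) + 1 entries.
    return list(accumulate(values, initial=0))


result = complete_cumsum_reference([1, 2, 3])
print(result)  # [0, 1, 3, 6]
```

Because of the leading zero, the output can be used directly as an offsets array: segment `i` spans `result[i]` to `result[i + 1]`.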
Integrate asynchronous_complete_cumsum operator into fbgemm-xpu:
- Add Python wrapper with complete cumsum documentation
- Register operator schema in torch library
- Include implementation in CMake build

Add comprehensive test suite for asynchronous_complete_cumsum
operator covering:
- Basic functionality with int32 and int64 dtypes
- Empty tensor handling
- Random input validation with numpy reference

Add SYCL infrastructure headers from intel/torch-xpu-ops/
to support advanced kernel implementations:
- DeviceProperties.h: Device capability queries and work group sizing
- SYCLContext.h: SYCL context management and namespace aliases
- SYCLHelpers.h: SYCL kernel submission and utility functions
- TensorInfo.h: Tensor metadata and dimension handling structures
- TensorOptions.h: Tensor configuration and options management
- Runtime.h: SYCL runtime utilities
- Macros.h: Common macro definitions
- Scalar.h: Scalar type conversion utilities

These headers provide the foundation for implementing 2D sparse data
permutation and other complex SYCL operations on XPU devices.

Add foundational utility headers and implementations to support
complex SYCL kernel operations:
- utils.h/cpp: Core constants, type definitions, kernel launch
  helpers, and device property queries
- dispatch_macros.h: Type dispatch macros for handling multiple
  data types (int32, int64, float, etc.)
- tensor_utils.h: Tensor manipulation and metadata utilities
- function_types.h: Symbol visibility definitions for shared
  library exports

These utilities provide essential infrastructure for implementing
2D sparse data permutation and other advanced operators on XPU
devices, including work group sizing, kernel launch helpers, and
type-safe dispatching mechanisms.

Add SYCL port of FBGEMM's permute_2D_sparse_data operator for
Intel XPU devices. This operator permutes 2D sparse data including
lengths [T, B], indices, and optional weights according to a
permutation vector, commonly used for reordering embedding table
features.

Implementation includes:
- SYCL kernels: permute_2D_lengths_kernel and permute_2D_data_kernel
- Host function: permute_2D_sparse_data_xpu
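The data movement described above can be illustrated with a pure-Python reference. This is a sketch under stated assumptions, not the SYCL implementation: it assumes `lengths` is a `T x B` nested list, that `indices` (and `weights`, if given) store the segments of each row back to back in row-major order, and that output row `t` is a copy of input row `permute[t]`. The function name `permute_2d_sparse_data_reference` is hypothetical.

```python
from itertools import accumulate


def permute_2d_sparse_data_reference(permute, lengths, indices, weights=None):
    """Reorder the rows of a [T, B] lengths matrix and their index segments."""
    batch = len(lengths[0]) if lengths else 0
    # Offsets into `indices` for every (row, column) cell, in row-major order.
    flat_lengths = [n for row in lengths for n in row]
    offsets = list(accumulate(flat_lengths, initial=0))

    permuted_lengths = [list(lengths[t]) for t in permute]
    permuted_indices = []
    permuted_weights = [] if weights is not None else None
    for t in permute:
        # All values belonging to row t are contiguous in the flat arrays.
        start, end = offsets[t * batch], offsets[(t + 1) * batch]
        permuted_indices.extend(indices[start:end])
        if weights is not None:
            permuted_weights.extend(weights[start:end])
    return permuted_lengths, permuted_indices, permuted_weights


lengths = [[1, 2], [0, 3]]
indices = [10, 20, 21, 30, 31, 32]
out_lengths, out_indices, out_weights = permute_2d_sparse_data_reference(
    [1, 0], lengths, indices
)
print(out_lengths, out_indices)
```

In this sketch a permutation with repeated entries, such as `[0, 0]`, simply copies the same row twice.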
Integrate permute_2D_sparse_data operator into fbgemm-xpu:
- Add Python wrapper with type hints and documentation
- Register operator schema in torch library
- Include implementation files in CMake build (utils.cpp, SYCL
  kernels, and operator implementation)

Add comprehensive test suite for permute_2D_sparse_data operator
covering:
- Basic functionality with int32 and int64 data types
- Sparse data with and without weights
- Permutations with repeated indices
- Exact value validation
- CPU-XPU consistency verification

Fixes CMake configuration and import ordering to properly build and load
the block_bucketize_sparse_features XPU operator.

- Configure CMake for XPU-only PyTorch builds
- Import torch before _C extension to load libtorch.so dependencies
- Adjust test imports for consistency

All 18 tests passing.

flezaalv changed the title from "Flezaalv/feat/block bucketize sparse features operator" to "Add block_bucketize_sparse_features operator" on Jun 25, 2026

2 participants