Add block_bucketize_sparse_features operator by flezaalv · Pull Request #76 · intel/torchlib-xpu

flezaalv · 2026-06-25T20:33:53Z

Summary

Completes the integration of the block_bucketize_sparse_features operator by fixing CMake configuration and resolving dynamic library loading issues.

Depends on: #74

Changes

Module Initialization (`src/fbgemm_xpu/__init__.py`)

- Import `torch` before loading the `_C` extension so that the PyTorch shared libraries are already loaded in the process address space
- Resolves undefined symbol errors when loading the compiled SYCL extension
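
The import-order fix can be sketched as below. This is a minimal illustration, not the code from the PR: `load_extension` is a hypothetical helper, and the module name `fbgemm_xpu._C` is inferred from the file path above. The point it shows is only that the dependency is imported first, so its shared libraries are present when the extension is loaded.

```python
import importlib
from types import ModuleType


def load_extension(dependency: str, extension: str) -> ModuleType:
    """Import `dependency` first, then `extension` (hypothetical helper).

    Importing the dependency loads its shared libraries into the process,
    so the dynamic linker can resolve the extension's symbols against them.
    """
    importlib.import_module(dependency)  # e.g. "torch"
    return importlib.import_module(extension)  # e.g. "fbgemm_xpu._C"


# Inside the package's __init__.py this would be roughly (assumed layout):
#     import torch  # noqa: F401  (must come before the extension)
#     from . import _C
```

Any two importable modules can stand in for `torch` and `_C` when trying the helper out, for example `load_extension("json", "math")`.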

Test Suite (`test/test_block_bucketize_sparse_features.py`)

- Updated import order for consistency

Testing

All 18 test cases pass successfully:

$ pytest packages/fbgemm-xpu/test/test_block_bucketize_sparse_features.py -vv
============================== 18 passed in 3.97s ==============================

cc: @manuelhsantana, @aagalleg

Remove .gitkeep placeholders

- Add invert_permute kernel to CMake build
- Implement invert_permute Python wrapper in ops.py
- Register invert_permute operator with schema existence check
- Add torch_library.h utility for schema validation

Add SYCL/XPU kernel implementation for invert_permute operation.

Add complete test coverage for invert_permute operator on XPU devices, covering correctness, validation, parity, and performance. Test coverage includes:
- Correctness tests for int32/int64 with edge cases (empty, single element, identity, reverse, random permutations)
- Input validation tests for invalid dimensions and dtypes
- Meta function tests for torch.compile compatibility
- PyTorch opcheck validation for operator conventions
- Parametric tests with varying sizes (1 to 1M elements)
- CPU-XPU parity tests to ensure consistent results
- Performance benchmarks measuring execution time and bandwidth
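
As a point of reference for the correctness cases listed above, here is a pure-Python sketch of what inverting a permutation means. It assumes the operator follows the usual definition `inverse[permute[i]] = i`; the function name and the input check are illustrative additions, not part of the PR.

```python
from typing import List, Sequence


def invert_permute_reference(permute: Sequence[int]) -> List[int]:
    """Return `inverse` such that inverse[permute[i]] == i for every i."""
    n = len(permute)
    # Illustrative check only; the XPU operator's own validation may differ.
    if sorted(permute) != list(range(n)):
        raise ValueError("input is not a permutation of 0..n-1")
    inverse = [0] * n
    for position, target in enumerate(permute):
        inverse[target] = position
    return inverse
```

The identity and reverse permutations are their own inverses, which is why they make convenient edge cases alongside the empty and single-element inputs.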

- CMakeLists: add permute_1d_sparse_data.cpp to build sources
- ops.py: add Python wrapper with type hints
- ops_registry.cpp: register operator schema in fbgemm namespace

Implement SYCL/XPU kernels for the permute_1D_sparse_data operator, which permutes sparse data stored in jagged/1D format.
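
A rough pure-Python reference for this kind of jagged permutation is sketched below. It assumes the data is a flat `values` list split into segments by `lengths`, and that output segment `i` is input segment `permute[i]`; the function name and argument order are illustrative and not taken from the PR.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def permute_1d_sparse_data_reference(
    permute: Sequence[int],
    lengths: Sequence[int],
    values: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Reorder the segments of a jagged 1D array according to `permute`."""
    # offsets[k] is the start of segment k in `values`.
    offsets = list(accumulate(lengths, initial=0))
    out_lengths: List[int] = []
    out_values: List[int] = []
    for src in permute:
        out_lengths.append(lengths[src])
        out_values.extend(values[offsets[src]:offsets[src + 1]])
    return out_lengths, out_values
```

For example, with `lengths = [2, 1, 3]` and `permute = [2, 0, 1]`, the three-element segment moves to the front and the other two follow in their original order.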

Add SYCL port of FBGEMM's asynchronous_complete_cumsum operator for Intel XPU devices. The operator computes a complete cumulative sum with a leading zero (e.g., [a, b, c] → [0, a, a+b, a+b+c]).
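
The leading-zero cumulative sum described above can be written as a short pure-Python reference. The function name is illustrative; the XPU operator itself works on tensors.

```python
from itertools import accumulate
from typing import List, Sequence


def complete_cumsum_reference(values: Sequence[int]) -> List[int]:
    """Cumulative sum with a leading zero: [a, b, c] -> [0, a, a+b, a+b+c]."""
    return list(accumulate(values, initial=0))
```

The output always has one more element than the input, so in this reference an empty input yields `[0]`.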

Integrate asynchronous_complete_cumsum operator into fbgemm-xpu:
- Add Python wrapper with complete cumsum documentation
- Register operator schema in torch library
- Include implementation in CMake build

Add comprehensive test suite for asynchronous_complete_cumsum operator covering:
- Basic functionality with int32 and int64 dtypes
- Empty tensor handling
- Random input validation with numpy reference

Add SYCL infrastructure headers from intel/torch-xpu-ops/ to support advanced kernel implementations:
- DeviceProperties.h: Device capability queries and work group sizing
- SYCLContext.h: SYCL context management and namespace aliases
- SYCLHelpers.h: SYCL kernel submission and utility functions
- TensorInfo.h: Tensor metadata and dimension handling structures
- TensorOptions.h: Tensor configuration and options management
- Runtime.h: SYCL runtime utilities
- Macros.h: Common macro definitions
- Scalar.h: Scalar type conversion utilities

These headers provide the foundation for implementing 2D sparse data permutation and other complex SYCL operations on XPU devices.

Add foundational utility headers and implementations to support complex SYCL kernel operations:
- utils.h/cpp: Core constants, type definitions, kernel launch helpers, and device property queries
- dispatch_macros.h: Type dispatch macros for handling multiple data types (int32, int64, float, etc.)
- tensor_utils.h: Tensor manipulation and metadata utilities
- function_types.h: Symbol visibility definitions for shared library exports

These utilities provide essential infrastructure for implementing 2D sparse data permutation and other advanced operators on XPU devices, including work group sizing, kernel launch helpers, and type-safe dispatching mechanisms.

Add SYCL port of FBGEMM's permute_2D_sparse_data operator for Intel XPU devices. This operator permutes 2D sparse data including lengths [T, B], indices, and optional weights according to a permutation vector, commonly used for reordering embedding table features. Implementation includes:
- SYCL kernels: permute_2D_lengths_kernel and permute_2D_data_kernel
- Host function: permute_2D_sparse_data_xpu
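
A pure-Python sketch of the 2D permutation described above follows. It assumes `lengths` is a T-by-B nested list, that `values` and the optional `weights` are flat and stored row by row (all bags of row 0, then row 1, and so on), and that output row `i` is input row `permute[i]`. The function name and this layout are assumptions for illustration, not details confirmed by the PR.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def permute_2d_sparse_data_reference(
    permute: Sequence[int],
    lengths: Sequence[Sequence[int]],
    values: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[List[int]], List[int], Optional[List[float]]]:
    """Reorder the rows of [T, B] sparse data; rows may repeat in `permute`."""
    # row_offsets[t] is where row t's data starts in the flat arrays.
    row_offsets = list(accumulate((sum(row) for row in lengths), initial=0))
    out_lengths: List[List[int]] = []
    out_values: List[int] = []
    out_weights: Optional[List[float]] = None if weights is None else []
    for src in permute:
        start, end = row_offsets[src], row_offsets[src + 1]
        out_lengths.append(list(lengths[src]))
        out_values.extend(values[start:end])
        if weights is not None and out_weights is not None:
            out_weights.extend(weights[start:end])
    return out_lengths, out_values, out_weights
```

Because each output row is simply a copy of one input row, a permutation vector that names the same row more than once duplicates that row's data in the output.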

Integrate permute_2D_sparse_data operator into fbgemm-xpu:
- Add Python wrapper with type hints and documentation
- Register operator schema in torch library
- Include implementation files in CMake build (utils.cpp, SYCL kernels, and operator implementation)

Add comprehensive test suite for permute_2D_sparse_data operator covering:
- Basic functionality with int32 and int64 data types
- Sparse data with and without weights
- Permutations with repeated indices
- Exact value validation
- CPU-XPU consistency verification

Fixes CMake configuration and import ordering to properly build and load the block_bucketize_sparse_features XPU operator.
- Configure CMake for XPU-only PyTorch builds
- Import torch before _C extension to load libtorch.so dependencies
- Adjust test imports for consistency

All 18 tests passing.

aagalleg and others added 17 commits June 18, 2026 22:37

chore(fbgemm-xpu): remove .gitkeep files from populated directories

081cc88


feat(fbgemm-xpu): register invert_permute operation

07bfb4b


feat(fbgemm-xpu): add SYCL kernel implementation for invert_permute

d6d6bad


feat(fbgemm-xpu): register permute_1D_sparse_data operator

3056382


feat(fbgemm-xpu): add SYCL kernels for permute_1D_sparse_data

b7f3bbf


test(fbgemm-xpu): add comprehensive tests for permute_1D_sparse_data

fd70ab5

feat(fbgemm-xpu): add asynchronous_complete_cumsum XPU operator

50f9982


feat(fbgemm-xpu): register asynchronous_complete_cumsum operator

79272c9


test(fbgemm-xpu): add tests for asynchronous_complete_cumsum

389845d


Initial version of block_bucketize_sparse_features operator refactor

9e251da

flezaalv changed the title from "Flezaalv/feat/block bucketize sparse features operator" to "Add block_bucketize_sparse_features operator" on Jun 25, 2026


Add block_bucketize_sparse_features operator #76
flezaalv wants to merge 17 commits into intel:main from aagalleg:flezaalv/feat/block_bucketize_sparse_features_operator
