
Add permute_2D_sparse_data operator with SYCL implementation for XPU#74

Draft
aagalleg wants to merge 15 commits into intel:main from aagalleg:feat/permute_2d_sparse_data

@aagalleg

Contributor

Depends on: #73

This PR introduces the permute_2D_sparse_data operator to fbgemm-xpu, enabling 2D sparse data permutation on Intel XPU devices.

Changes

Core Implementation

  • SYCL Kernels (permute_2d_sparse_data.cpp/h): SYCL implementation that permutes 2D sparse data (lengths [T, B], indices, and optional weights) according to a permutation vector
  • Host Function (permute_2d_sparse_dataOp.cpp): Top-level operator function with support for weighted and unweighted sparse data
  • Operator Registration (ops_registry.cpp): Registers the operator with PyTorch's dispatch system, defining the schema only if it does not already exist so that duplicate registrations do not conflict
  • Python API (ops.py): Python wrapper function with type hints
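To make the operator's contract concrete, here is a minimal pure-Python reference sketch of what `permute_2D_sparse_data` computes. It assumes the upstream FBGEMM semantics: output table `t` is a copy of input table `permute[t]`, `indices` (and the optional `weights`) are stored table-major so each table occupies one contiguous block, and `permute` may repeat tables. The name `permute_2d_sparse_data_ref` is hypothetical and not part of this PR's API. Plain lists stand in for tensors.

```python
from typing import List, Optional, Tuple


def permute_2d_sparse_data_ref(
    permute: List[int],
    lengths: List[List[int]],  # shape [T, B]
    indices: List[int],
    weights: Optional[List[float]] = None,
) -> Tuple[List[List[int]], List[int], Optional[List[float]]]:
    """Hypothetical reference: output table t copies input table permute[t]."""
    # table_offsets[t]:table_offsets[t + 1] is table t's block in the flat indices.
    table_offsets = [0]
    for row in lengths:
        table_offsets.append(table_offsets[-1] + sum(row))

    permuted_lengths = [list(lengths[p]) for p in permute]
    permuted_indices: List[int] = []
    permuted_weights: Optional[List[float]] = None if weights is None else []
    for p in permute:  # a repeated entry in permute duplicates that table
        start, end = table_offsets[p], table_offsets[p + 1]
        permuted_indices.extend(indices[start:end])
        if permuted_weights is not None:
            permuted_weights.extend(weights[start:end])
    return permuted_lengths, permuted_indices, permuted_weights


# T = 3 tables, B = 2 batch entries; table 2 is selected once, table 0 twice.
lengths = [[1, 2], [0, 1], [2, 0]]
indices = [10, 11, 12, 20, 30, 31]
out_lengths, out_indices, _ = permute_2d_sparse_data_ref([2, 0, 0], lengths, indices)
print(out_lengths)  # [[2, 0], [1, 2], [1, 2]]
print(out_indices)  # [30, 31, 10, 11, 12, 10, 11, 12]
```

A reference of this kind can supply expected values for exact-value and repeated-index tests independently of any CPU kernel.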

Infrastructure

  • Build System (CMakeLists.txt): Added SYCL kernels and utility files to build configuration
  • SYCL Communication Utilities (fbgemm_utils/comm/): Device properties, SYCL context management, and helper functions from intel-sandbox/custom_operator_xpu
  • Core Utilities (fbgemm_utils/): Essential infrastructure including:
    • utils.h/cpp: Kernel launch helpers, device queries, and type definitions
    • dispatch_macros.h: Type dispatch macros for multiple data types
    • tensor_utils.h: Tensor manipulation and metadata utilities
    • function_types.h: Symbol visibility definitions

Testing

  • Comprehensive Test Suite (test_permute_2d_sparse_data.py):
    • Correctness validation for int32/int64 data types
    • Sparse data with and without weights
    • Permutations with repeated indices
    • Exact value validation
    • CPU-XPU consistency verification

cc: @flezaalv, @manuelhsantana

aagalleg added 15 commits June 18, 2026 22:37
- Add invert_permute kernel to CMake build
- Implement invert_permute Python wrapper in ops.py
- Register invert_permute operator with schema existence check
- Add torch_library.h utility for schema validation

Add SYCL/XPU kernel implementation for invert_permute operation.
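The following sketch shows the inversion. It assumes `invert_permute` follows FBGEMM's convention `inverse[permute[i]] = i`, so that `permute[inverse[j]] == j`. Unlike `permute_2D_sparse_data`, this is only well-defined for a true permutation without repeated entries. The name `invert_permute_ref` is hypothetical.

```python
from typing import List


def invert_permute_ref(permute: List[int]) -> List[int]:
    """Hypothetical reference: returns inverse with inverse[permute[i]] == i."""
    inverse = [0] * len(permute)
    for i, p in enumerate(permute):
        inverse[p] = i
    return inverse


perm = [2, 0, 1]
inv = invert_permute_ref(perm)
print(inv)  # [1, 2, 0]
# Composing a permutation with its inverse gives the identity.
assert all(perm[inv[j]] == j for j in range(len(perm)))
```

Each output slot is written exactly once, so a parallel kernel can assign one work-item per input position without synchronization.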
Add complete test coverage for invert_permute operator on XPU
devices, covering correctness, validation, parity, and performance.

Test coverage includes:
- Correctness tests for int32/int64 with edge cases (empty, single
  element, identity, reverse, random permutations)
- Input validation tests for invalid dimensions and dtypes
- Meta function tests for torch.compile compatibility
- PyTorch opcheck validation for operator conventions
- Parametric tests with varying sizes (1 to 1M elements)
- CPU-XPU parity tests to ensure consistent results
- Performance benchmarks measuring execution time and bandwidth

- CMakeLists: add permute_1d_sparse_data.cpp to build sources
- ops.py: add Python wrapper with type hints
- ops_registry.cpp: register operator schema in fbgemm namespace

Add SYCL/XPU kernel implementation of the permute_1D_sparse_data
operator for permuting sparse data in jagged (1D) format.
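Here is a minimal sketch of the 1D case. It assumes the upstream FBGEMM semantics: `lengths` has shape `[T]`, segment `i` of `indices` holds `lengths[i]` entries, and output segment `t` is input segment `permute[t]`. Optional weights are omitted for brevity. The name `permute_1d_sparse_data_ref` is hypothetical.

```python
from typing import List, Tuple


def permute_1d_sparse_data_ref(
    permute: List[int], lengths: List[int], indices: List[int]
) -> Tuple[List[int], List[int]]:
    """Hypothetical reference: output segment t copies input segment permute[t]."""
    # offsets[i]:offsets[i + 1] is the span of segment i in the flat indices.
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)

    permuted_lengths = [lengths[p] for p in permute]
    permuted_indices: List[int] = []
    for p in permute:
        permuted_indices.extend(indices[offsets[p]:offsets[p + 1]])
    return permuted_lengths, permuted_indices


print(permute_1d_sparse_data_ref([2, 1, 0], [2, 0, 3], [1, 2, 3, 4, 5]))
# ([3, 0, 2], [3, 4, 5, 1, 2])
```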
Add SYCL port of FBGEMM's asynchronous_complete_cumsum operator for
Intel XPU devices. The operator computes a complete cumulative sum
with a leading zero (e.g., [a, b, c] → [0, a, a+b, a+b+c]).
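The following is a semantics-only sketch of the same leading-zero cumulative sum, not the SYCL kernel. It uses `itertools.accumulate` with its `initial` argument (Python 3.8+), and `complete_cumsum_ref` is a hypothetical name. Because the output always has one more element than the input, an empty input yields `[0]`.

```python
from itertools import accumulate
from typing import List


def complete_cumsum_ref(values: List[int]) -> List[int]:
    """Cumulative sum with a leading zero: len(output) == len(values) + 1."""
    return list(accumulate(values, initial=0))


print(complete_cumsum_ref([3, 1, 4]))  # [0, 3, 4, 8]
print(complete_cumsum_ref([]))  # [0]
```

In the sparse operators, this output serves as an offsets array: segment `i` spans `[out[i], out[i + 1])`.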
Integrate asynchronous_complete_cumsum operator into fbgemm-xpu:
- Add Python wrapper with complete cumsum documentation
- Register operator schema in torch library
- Include implementation in CMake build

Add comprehensive test suite for asynchronous_complete_cumsum
operator covering:
- Basic functionality with int32 and int64 dtypes
- Empty tensor handling
- Random input validation with numpy reference

Add SYCL infrastructure headers from intel/torch-xpu-ops/
to support advanced kernel implementations:
- DeviceProperties.h: Device capability queries and work group sizing
- SYCLContext.h: SYCL context management and namespace aliases
- SYCLHelpers.h: SYCL kernel submission and utility functions
- TensorInfo.h: Tensor metadata and dimension handling structures
- TensorOptions.h: Tensor configuration and options management
- Runtime.h: SYCL runtime utilities
- Macros.h: Common macro definitions
- Scalar.h: Scalar type conversion utilities

These headers provide the foundation for implementing 2D sparse data
permutation and other complex SYCL operations on XPU devices.

Add foundational utility headers and implementations to support
complex SYCL kernel operations:
- utils.h/cpp: Core constants, type definitions, kernel launch
  helpers, and device property queries
- dispatch_macros.h: Type dispatch macros for handling multiple
  data types (int32, int64, float, etc.)
- tensor_utils.h: Tensor manipulation and metadata utilities
- function_types.h: Symbol visibility definitions for shared
  library exports

These utilities provide essential infrastructure for implementing
2D sparse data permutation and other advanced operators on XPU
devices, including work group sizing, kernel launch helpers, and
type-safe dispatching mechanisms.
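Below is one common work-group sizing pattern. It is offered as an assumption about what such a launch helper might do, and the name `round_up_global_size` is hypothetical, not taken from `utils.h`. SYCL requires an `nd_range`'s global size to be divisible by its work-group size, so the global size is rounded up to the next multiple. Work-items whose index is at or beyond `n` must then return early inside the kernel.

```python
def round_up_global_size(n: int, work_group_size: int) -> int:
    """Hypothetical helper: smallest multiple of work_group_size that is >= n."""
    if work_group_size <= 0:
        raise ValueError("work_group_size must be positive")
    num_groups = (n + work_group_size - 1) // work_group_size  # ceil(n / wg)
    return num_groups * work_group_size


print(round_up_global_size(1000, 256))  # 1024: 24 padding work-items exit early
print(round_up_global_size(512, 256))  # 512: no padding needed
```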
Add SYCL port of FBGEMM's permute_2D_sparse_data operator for
Intel XPU devices. This operator permutes 2D sparse data including
lengths [T, B], indices, and optional weights according to a
permutation vector, commonly used for reordering embedding table
features.

Implementation includes:
- SYCL kernels: permute_2D_lengths_kernel and permute_2D_data_kernel
- Host function: permute_2D_sparse_data_xpu
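The pure-Python sketch below shows how the two kernels could divide the work. It assumes the same structure as upstream FBGEMM's CUDA implementation:

1. The lengths kernel gathers `lengths[permute[t]]` into `permuted_lengths`.
2. Complete cumsums of the flattened input and permuted lengths give per-`(t, b)` segment offsets.
3. The data kernel copies every segment independently, which is what makes it parallelizable.

Weights are omitted, and the name `permute_2d_two_phase_ref` is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def permute_2d_two_phase_ref(
    permute: List[int], lengths: List[List[int]], indices: List[int]
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical sketch of a lengths pass followed by a data pass."""
    batch = len(lengths[0]) if lengths else 0

    # Phase 1 (lengths-kernel analogue): one work-item per output (t, b).
    permuted_lengths = [[lengths[p][b] for b in range(batch)] for p in permute]

    # Complete cumsums over the flattened [T, B] lengths give segment starts.
    in_offsets = list(accumulate((n for row in lengths for n in row), initial=0))
    out_offsets = list(
        accumulate((n for row in permuted_lengths for n in row), initial=0)
    )

    # Phase 2 (data-kernel analogue): every (t, b) segment is copied on its own,
    # so the iterations below are independent and could run in parallel.
    permuted_indices = [0] * out_offsets[-1]
    for t, p in enumerate(permute):
        for b in range(batch):
            src = in_offsets[p * batch + b]
            dst = out_offsets[t * batch + b]
            n = permuted_lengths[t][b]
            permuted_indices[dst:dst + n] = indices[src:src + n]
    return permuted_lengths, permuted_indices


print(permute_2d_two_phase_ref([2, 0, 0], [[1, 2], [0, 1], [2, 0]], [10, 11, 12, 20, 30, 31]))
# ([[2, 0], [1, 2], [1, 2]], [30, 31, 10, 11, 12, 10, 11, 12])
```

Splitting the work this way lets the data pass know each segment's destination before it starts, so no work-item needs to wait on another.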
Integrate permute_2D_sparse_data operator into fbgemm-xpu:
- Add Python wrapper with type hints and documentation
- Register operator schema in torch library
- Include implementation files in CMake build (utils.cpp, SYCL
  kernels, and operator implementation)

Add comprehensive test suite for permute_2D_sparse_data operator
covering:
- Basic functionality with int32 and int64 data types
- Sparse data with and without weights
- Permutations with repeated indices
- Exact value validation
- CPU-XPU consistency verification