Add permute_1D_sparse_data Operation for Intel XPU #72

Draft
aagalleg wants to merge 7 commits into intel:main from aagalleg:feat/permute_1d_sparse_data

aagalleg commented Jun 18, 2026

Depends on #71

Add permute_1D_sparse_data Operation for Intel XPU

This PR adds the permute_1D_sparse_data operator to fbgemm-xpu. It reorders sparse data stored in jagged/1D format (a lengths tensor plus a flat indices tensor, with optional weights) on Intel XPU devices.
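To make the operator's semantics concrete, here is a minimal pure-Python sketch. It assumes the behavior of FBGEMM's existing permute_1D_sparse_data (output segment `i` is input segment `permute[i]`); the function name `permute_1d_reference` is hypothetical, and this is not the SYCL kernel from this PR.

```python
from typing import List, Optional, Tuple


def permute_1d_reference(
    permute: List[int],
    lengths: List[int],
    indices: List[int],
    weights: Optional[List[float]] = None,
) -> Tuple[List[int], List[int], Optional[List[float]]]:
    """Hypothetical reference: reorder the jagged segments of indices (and weights)."""
    # Start offset of each input segment (exclusive prefix sum of lengths).
    starts = [0] * len(lengths)
    for i in range(1, len(lengths)):
        starts[i] = starts[i - 1] + lengths[i - 1]

    # Output segment i takes the length and contents of input segment permute[i].
    permuted_lengths = [lengths[p] for p in permute]
    permuted_indices: List[int] = []
    permuted_weights: Optional[List[float]] = [] if weights is not None else None
    for p in permute:
        seg = slice(starts[p], starts[p] + lengths[p])
        permuted_indices.extend(indices[seg])
        if permuted_weights is not None:
            permuted_weights.extend(weights[seg])
    return permuted_lengths, permuted_indices, permuted_weights


# Three segments: [10, 11], [], [12, 13, 14]; move the last segment to the front.
print(permute_1d_reference([2, 0, 1], [2, 0, 3], [10, 11, 12, 13, 14]))
# → ([3, 2, 0], [12, 13, 14, 10, 11], None)
```

Note that an empty segment (length 0) is permuted like any other; it simply contributes no elements to the output.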

Changes

Core Implementation

  • SYCL Kernels (permute_1d_sparse_data.cpp/h): three specialized functors for permuting lengths, indices, and weighted data, plus optimized cumsum helpers that compute segment offsets
  • Operator Registration (ops_registry.cpp): registers the operator with PyTorch's dispatcher, defining the schema only if it is not already registered, to avoid conflicting definitions
  • Python API (ops.py): a Python wrapper function with type hints that accepts optional weights and an optional lengths sum
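The split into a lengths functor, index/weight functors and cumsum helpers fits a data-parallel layout: once segment offsets are known, every segment copy is independent. The sketch below emulates that layout sequentially in Python, with each loop body standing in for what one work-item might do. It is an assumption about how such kernels are typically structured, not a transcription of permute_1d_sparse_data.cpp, and all names (including `permuted_lengths_sum`) are illustrative.

```python
from itertools import accumulate
from typing import List, Optional


def exclusive_cumsum(xs: List[int]) -> List[int]:
    # [a, b, c] -> [0, a, a + b]: the offset at which each segment starts.
    return [0] + list(accumulate(xs))[:-1] if xs else []


def permute_1d_offsets(
    permute: List[int],
    lengths: List[int],
    indices: List[int],
    weights: Optional[List[float]] = None,
    permuted_lengths_sum: Optional[int] = None,
):
    # Phase 1 ("permute lengths"): one work-item per output segment.
    permuted_lengths = [lengths[p] for p in permute]

    # Phase 2 (cumsum helpers): start offsets of input and output segments.
    input_offsets = exclusive_cumsum(lengths)
    output_offsets = exclusive_cumsum(permuted_lengths)

    # If the caller already knows the output size, skip the reduction.
    if permuted_lengths_sum is not None:
        total = permuted_lengths_sum
    else:
        total = sum(permuted_lengths)
    out_indices = [0] * total
    out_weights = [0.0] * total if weights is not None else None

    # Phase 3 ("permute indices" / "permute weights"): independent segment copies.
    for seg, p in enumerate(permute):
        src, dst = input_offsets[p], output_offsets[seg]
        for j in range(permuted_lengths[seg]):
            out_indices[dst + j] = indices[src + j]
            if out_weights is not None:
                out_weights[dst + j] = weights[src + j]
    return permuted_lengths, out_indices, out_weights
```

For example, `permute_1d_offsets([1, 0], [1, 2], [7, 8, 9], [0.5, 1.5, 2.5])` moves the two-element segment in front of the one-element segment in both indices and weights. Supplying a precomputed output size is useful on a device because allocating the output otherwise requires a reduction over the permuted lengths first.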

Infrastructure

  • Build System (CMakeLists.txt): Added SYCL kernel to build configuration

Testing

  • Test Suite (test_permute_1d_sparse_data.py):
    • Correctness validation with and without weights
    • Various segment configurations (uniform, mixed lengths, empty segments)
    • Edge cases (empty tensors, single segments, identity permutations)
    • Random permutations with validation
    • CPU-XPU consistency verification
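The test categories above can be expressed as invariants that any correct implementation, on CPU or XPU, should satisfy. The sketch below is a hypothetical standard-library harness, not the PR's test_permute_1d_sparse_data.py. It generates uniform, mixed and empty-segment layouts with random permutations and checks them against a small reference. The same checks could be applied to the outputs of the actual CPU and XPU operators.

```python
import random


def permute_reference(permute, lengths, indices):
    # Hypothetical reference: output segment i is input segment permute[i].
    starts = [0]
    for n in lengths[:-1]:
        starts.append(starts[-1] + n)
    out_lengths = [lengths[p] for p in permute]
    out_indices = [x for p in permute for x in indices[starts[p]:starts[p] + lengths[p]]]
    return out_lengths, out_indices


def make_case(layout, num_segments, rng):
    if layout == "uniform":
        lengths = [4] * num_segments
    elif layout == "mixed":
        lengths = [rng.randint(1, 8) for _ in range(num_segments)]
    else:  # "empty": some segments have no elements
        lengths = [rng.choice([0, 0, 3]) for _ in range(num_segments)]
    indices = list(range(sum(lengths)))
    permute = list(range(num_segments))
    rng.shuffle(permute)
    return permute, lengths, indices


def check_case(permute, lengths, indices):
    out_lengths, out_indices = permute_reference(permute, lengths, indices)
    # Total size is preserved, and lengths agree with the indices.
    assert sum(out_lengths) == len(out_indices) == len(indices)
    # A full permutation only reorders elements; it never drops or duplicates them.
    assert sorted(out_indices) == sorted(indices)
    # The identity permutation must leave the data unchanged.
    identity = list(range(len(lengths)))
    assert permute_reference(identity, lengths, indices) == (lengths, indices)


rng = random.Random(0)
for layout in ("uniform", "mixed", "empty"):
    for n in (0, 1, 7):  # empty tensors, single segment, several segments
        check_case(*make_case(layout, n, rng))
print("all cases passed")
```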

cc: @manuelhsantana, @flezaalv

aagalleg added 7 commits June 18, 2026 22:37
- Add invert_permute kernel to CMake build
- Implement invert_permute Python wrapper in ops.py
- Register invert_permute operator with schema existence check
- Add torch_library.h utility for schema validation

Add SYCL/XPU kernel implementation for invert_permute operation.

Add complete test coverage for invert_permute operator on XPU
devices, covering correctness, validation, parity, and performance.

Test coverage includes:
- Correctness tests for int32/int64 with edge cases (empty, single
  element, identity, reverse, random permutations)
- Input validation tests for invalid dimensions and dtypes
- Meta function tests for torch.compile compatibility
- PyTorch opcheck validation for operator conventions
- Parametric tests with varying sizes (1 to 1M elements)
- CPU-XPU parity tests to ensure consistent results
- Performance benchmarks measuring execution time and bandwidth

- CMakeLists: add permute_1d_sparse_data.cpp to build sources
- ops.py: add Python wrapper with type hints
- ops_registry.cpp: register operator schema in fbgemm namespace

Implement the SYCL/XPU kernel for the permute_1D_sparse_data
operator, which permutes sparse data in jagged/1D format.
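Several of the commits above add an invert_permute operator. Assuming it follows FBGEMM's existing semantics, `inverse[permute[i]] = i`, it is a simple scatter that parallelizes trivially with one work-item per element. The minimal Python sketch below uses hypothetical names (`invert_permute_reference`, `gather`). It also shows a round-trip property that a correctness test could check: gathering with a permutation and then with its inverse restores the input.

```python
import random
from typing import List


def invert_permute_reference(permute: List[int]) -> List[int]:
    # Scatter: position permute[i] of the result receives i.
    inverse = [0] * len(permute)
    for i, p in enumerate(permute):
        inverse[p] = i
    return inverse


def gather(permute: List[int], xs: List[int]) -> List[int]:
    # output[i] = xs[permute[i]]
    return [xs[p] for p in permute]


print(invert_permute_reference([2, 0, 1]))  # → [1, 2, 0]

# Round trip over empty, single-element and larger random permutations.
rng = random.Random(42)
for n in (0, 1, 1000):
    p = list(range(n))
    rng.shuffle(p)
    xs = [rng.randint(-100, 100) for _ in range(n)]
    assert gather(invert_permute_reference(p), gather(p, xs)) == xs
```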