Merged
Commits
58 commits
4790b09
Add deform conv 2d cpu execution provider support
ShirasawaSama Feb 17, 2026
abfec39
Add more tests
ShirasawaSama Feb 17, 2026
a0c5060
Add cuda support for deformconv2d
ShirasawaSama Feb 17, 2026
dd8e7f1
Improve deformconv cuda pref
ShirasawaSama Feb 19, 2026
c5bd48a
Add more test cases
ShirasawaSama Feb 19, 2026
952b3a1
Fix copilot suggestions
ShirasawaSama Feb 20, 2026
e5c043c
Fix default attrs value of DeformConv
ShirasawaSama Feb 20, 2026
eee517d
Fix schema definition for DeformConv op
ShirasawaSama Feb 23, 2026
12b19c8
Refactor DeformConv test cases
ShirasawaSama Feb 23, 2026
d6c19be
Fix OrtMemTypeCPUInput issue and add cuda error check
ShirasawaSama Mar 1, 2026
12fd042
Remove GemmEx double specialization
ShirasawaSama Mar 1, 2026
9b069e3
Fix potential integer overflow in CUDA DeformableIm2ColKernel
ShirasawaSama Mar 1, 2026
cbadf13
Optimize CPU DeformableIm2Col loop order for better cache locality
ShirasawaSama Mar 1, 2026
a951568
Parallelize CPU DeformConv Im2Col and bias addition
ShirasawaSama Mar 1, 2026
f1a9832
Use GPU free memory in DeformConv temp memory heuristic
ShirasawaSama Mar 1, 2026
d99994f
Extract DeformConvAttributes to shared header
ShirasawaSama Mar 1, 2026
7d7f66e
DeformConv op shared attributes and validation
ShirasawaSama Mar 1, 2026
8b5a13f
Refactor attributes/validation and optimize CUDA DeformConvIm2Col kernel
ShirasawaSama Mar 1, 2026
e5ec6de
Add DeformConv OnnxModelTest with reference ONNX model
ShirasawaSama Mar 2, 2026
14cf455
Optimize GetGreatestDivisorBelowBound in CUDA DeformConv
ShirasawaSama Mar 2, 2026
4121178
Document symmetric-padding-only limitation in deform_conv_test_gen
ShirasawaSama Mar 2, 2026
df9d0b1
Skip cuda DeformConv op copy kernel when cur_parallel==1
ShirasawaSama Mar 2, 2026
03cc5e5
Reformat code
ShirasawaSama Mar 2, 2026
d9f65fb
Fix cuda fp16 test cases
ShirasawaSama Mar 2, 2026
15fe856
Fix int64_t to ptrdiff_t conversion in deform_conv
ShirasawaSama Mar 5, 2026
931c386
Resolve pipeline failures caused by unit tests
ShirasawaSama Mar 6, 2026
fedd389
Add comments and handle unused variables
ShirasawaSama Mar 6, 2026
7ebc498
Address review feedback and align with Conv behavior
ShirasawaSama Mar 6, 2026
0479ade
Optimize DeformConv cpu bias add with Eigen SIMD
ShirasawaSama Mar 6, 2026
f7819f1
Document GEMM layout trick in DeformConv cuBLAS path
ShirasawaSama Mar 6, 2026
34fae7d
Use int64_t for bilinear interpolation indices
ShirasawaSama Mar 6, 2026
173fd6b
refactor(DeformConv CPU): template UseMask and improve im2col perform…
ShirasawaSama Mar 6, 2026
33e4866
perf(DeformConv CPU): optimize im2col and BilinearInterpolate
ShirasawaSama Mar 6, 2026
a482eb5
Early OOB check for BilinearInterpolate
ShirasawaSama Mar 6, 2026
b46f922
Shrink DeformConv CUDA mutex to UpdateState only
ShirasawaSama Mar 6, 2026
82d1228
Use cublasGemmStridedBatched for gemm_writes_directly path in DeformC…
ShirasawaSama Mar 6, 2026
da18ee3
Fix var name
ShirasawaSama Mar 6, 2026
dcd00c3
Drop mask==0 branch in im2col to match CPU behavior
ShirasawaSama Mar 6, 2026
6e727c2
Add 1x1 im2col kernel specialization dispatch
ShirasawaSama Mar 6, 2026
0166fa1
Reformat codes
ShirasawaSama Mar 6, 2026
f2d8f5d
Fix C4244 in deform_conv_op_test by casting rtol/atol to float
ShirasawaSama Mar 7, 2026
6ead850
Add standard MIT license header
ShirasawaSama Mar 12, 2026
7b11bad
Register DeformConv BFloat16 only for opset 22
ShirasawaSama Mar 12, 2026
4a72276
Validate DeformConv input ranks and output size before indexing shapes
ShirasawaSama Mar 12, 2026
7ba7d1b
Refactor DeformConv MinimalBilinear tests with shared data/template a…
ShirasawaSama Mar 12, 2026
432c1c6
Validate optional bias B shape (1D [M]) in DeformConv shared helper f…
ShirasawaSama Mar 12, 2026
a25c1f4
Use cached totalGlobalMem for temp budget, remove cudaMemGetInfo and …
ShirasawaSama Mar 12, 2026
d98f339
Document int indices in CUDA BilinearInterpolate
ShirasawaSama Mar 12, 2026
74a7760
Document why BFloat16 is not delegated in DeformConv CUDA impl
ShirasawaSama Mar 12, 2026
c2bf6f4
Document prime-batch fallback to single-image chunks in DeformConv Ge…
ShirasawaSama Mar 12, 2026
a8920b4
Refine deform conv test generator imports and ONNX model save usage
ShirasawaSama Mar 13, 2026
b7b4681
Optimize DeformConv CPU bilinear interpolation
ShirasawaSama Mar 13, 2026
6aeef46
Optimize DeformConv BilinearInterpolation for performance on CUDA
ShirasawaSama Mar 13, 2026
46f176c
Enforce 2D attribute lengths and validate kernel_shape/pads/overflow-…
ShirasawaSama Mar 16, 2026
7cd167b
Clarify DeformConv OnnxModelTest comment as ORT-reference smoke test
ShirasawaSama Mar 16, 2026
ada5ca3
DeformConv EmptyBatch test expects failure when batch size N is zero
ShirasawaSama Mar 16, 2026
17b155a
Allow DeformConv empty batch
ShirasawaSama Mar 16, 2026
288e4c0
Update docs
ShirasawaSama Mar 18, 2026
4 changes: 4 additions & 0 deletions docs/OperatorKernels.md
@@ -103,6 +103,8 @@ Do not modify directly.*
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T2** = tensor(int32), tensor(int64)|
|DFT|*in* input:**T1**<br> *in* dft_length:**T2**<br> *in* axis:**tensor(int64)**<br> *out* output:**T1**<br><br>or<br><br>*in* input:**T1**<br> *in* dft_length:**T2**<br> *out* output:**T1**|20+|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int32), tensor(int64)|
|||[17, 19]|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int32), tensor(int64)|
|DeformConv|*in* X:**T**<br> *in* W:**T**<br> *in* offset:**T**<br> *in* B:**T**<br> *in* mask:**T**<br> *out* Y:**T**|22+|**T** = tensor(double), tensor(float)|
|||[19, 21]|**T** = tensor(double), tensor(float)|
|DepthToSpace|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(uint8)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(uint8)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
@@ -697,6 +699,8 @@ Do not modify directly.*
|Crop|*in* input:**T**<br> *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|CumSum|*in* x:**T**<br> *in* axis:**T2**<br> *out* y:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)<br/> **T2** = tensor(int32), tensor(int64)|
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)<br/> **T2** = tensor(int32), tensor(int64)|
|DeformConv|*in* X:**T**<br> *in* W:**T**<br> *in* offset:**T**<br> *in* B:**T**<br> *in* mask:**T**<br> *out* Y:**T**|22+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[19, 21]|**T** = tensor(double), tensor(float), tensor(float16)|
|DepthToSpace|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
8 changes: 8 additions & 0 deletions onnxruntime/core/providers/cpu/cpu_execution_provider.cc
@@ -1220,6 +1220,8 @@ class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain,
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, uint8_t, Resize);
class ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 20, Scan);
class ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 20, Shape);
class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 21, float, DeformConv);
class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 21, double, DeformConv);

// Opset 20
class ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 20, 20, ConstantOfShape);
@@ -1316,6 +1318,8 @@ class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Ac
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Atanh);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Conv);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, ConvTranspose);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, float, DeformConv);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, double, DeformConv);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Det);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, float_float, Dropout);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, float_double, Dropout);
@@ -3277,6 +3281,8 @@ Status RegisterOnnxOperatorKernels(KernelRegistry& kernel_registry) {
Resize)>,
BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 20, Scan)>,
BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 20, Shape)>,
BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 21, float, DeformConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 19, 21, double, DeformConv)>,

// Opset 20
BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 20, 20,
@@ -3407,6 +3413,8 @@ Status RegisterOnnxOperatorKernels(KernelRegistry& kernel_registry) {
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Atanh)>,
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Conv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, ConvTranspose)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, float, DeformConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, double, DeformConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Det)>,
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, EyeLike)>,
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, GlobalAveragePool)>,
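
Note on the registrations above: the CPU provider gains four DeformConv entries, a versioned pair covering opsets 19 through 21 and an open-ended pair starting at opset 22, each for `float` and `double`. The sketch below is a deliberately simplified model of that version-range selection, not ONNX Runtime's actual `KernelRegistry`. `KernelEntry`, `MakeCpuDeformConvEntries`, `FindKernel`, and `kOpenEnded` are hypothetical names introduced only for illustration, and the lookup assumes a plain inclusive range check on the requested version.

```cpp
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one kernel registration.
struct KernelEntry {
  std::string op;
  std::string type;
  int start_version;  // first opset version covered (inclusive)
  int end_version;    // last opset version covered (inclusive)
};

// Assumption: an open-ended registration ("22+") is modelled as "up to INT_MAX".
constexpr int kOpenEnded = std::numeric_limits<int>::max();

// Mirrors the four CPU DeformConv registrations added in this diff.
std::vector<KernelEntry> MakeCpuDeformConvEntries() {
  return {
      {"DeformConv", "float", 19, 21},
      {"DeformConv", "double", 19, 21},
      {"DeformConv", "float", 22, kOpenEnded},
      {"DeformConv", "double", 22, kOpenEnded},
  };
}

// Returns the first entry whose op, type, and version range match, if any.
std::optional<KernelEntry> FindKernel(const std::vector<KernelEntry>& entries,
                                      const std::string& op,
                                      const std::string& type,
                                      int version) {
  for (const auto& e : entries) {
    if (e.op == op && e.type == type &&
        version >= e.start_version && version <= e.end_version) {
      return e;
    }
  }
  return std::nullopt;
}
```

Under this model, a `float` DeformConv at version 20 resolves to the `[19, 21]` entry and at version 22 or later to the open-ended one, while `float16` finds no CPU entry, which is consistent with the CPU rows in docs/OperatorKernels.md listing only `tensor(double)` and `tensor(float)`.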