Open
Labels
- NumPy 2.x Compliance: Aligns behavior with NumPy 2.x (NEPs, breaking changes)
- core: Internal engine: Shape, Storage, TensorEngine, iterators
- enhancement: New feature or request
- performance: Performance improvements or optimizations
Description
Summary
NumSharp contains approximately 2,700 NPTypeCode switch/case occurrences across 66 files, resulting in roughly 5,700 lines of repetitive type-dispatched code. This issue tracks the migration of these patterns to IL-generated kernels, with the goals of reducing code size, improving maintainability, and enabling SIMD optimization.
Problem Statement
The current codebase uses extensive switch (typecode) { case NPTypeCode.X: ... } patterns to handle NumSharp's 12 supported types:
Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal
This results in:
- Code bloat: 12 nearly-identical branches per operation
- Maintenance burden: Changes must be replicated across all type branches
- Regen dependency: Many files use `#if _REGEN` template generation
- Missed SIMD opportunities: Scalar loops where vectorization is possible
High-Impact Files
| File | NPTypeCode Cases | Category |
|---|---|---|
| `Utilities/Converts.cs` | 516 | Type Conversion |
| `UnmanagedMemoryBlock.Casting.cs` | 342 | Type Casting |
| `Utilities/ArrayConvert.cs` | 221 | Array Conversion |
| `Backends/NPTypeCode.cs` | 161 | Extension Methods |
| `Unmanaged/ArraySlice.cs` | 130 | Slice Operations |
| `DefaultEngine.ReductionOp.cs` | 69 | Reductions |
| `Default.ClipNDArray.cs` | 66 | Clip with NDArray |
| `UnmanagedStorage.cs` | 52 | Storage Operations |
Migration Priority
P0: Type Casting (Est. 4000 LOC reduction)
- `UnmanagedMemoryBlock.Casting.cs` - 12×12 nested switch, 291 for-loops
- `ArrayConvert.cs` - 12×12 nested switch, 172 for-loops
- Target: Single IL kernel per type-pair, SIMD widening/narrowing
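To make the "single IL kernel per type-pair" target concrete, here is a minimal sketch of how such a kernel could be emitted and cached with `System.Reflection.Emit.DynamicMethod`. It is not NumSharp's actual `ILKernelGenerator`: the class name `CastKernelSketch`, the delegate shape, the use of `System.Type` as the key instead of `NPTypeCode`, and the three-type opcode table are all assumptions made for illustration. It emits a plain scalar loop only; SIMD widening/narrowing is not shown.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical sketch, not NumSharp's ILKernelGenerator.
public static class CastKernelSketch
{
    // Assumed kernel shape, modeled on kernel(srcPtr, dstPtr, count) from the issue.
    public delegate void CastKernel(IntPtr src, IntPtr dst, int count);

    // Per-type element size plus load / convert / store opcodes.
    // Only three types are listed here; a full table would cover all supported types.
    private static readonly Dictionary<Type, (int, OpCode, OpCode, OpCode)> Ops =
        new Dictionary<Type, (int, OpCode, OpCode, OpCode)>
        {
            [typeof(int)]    = (4, OpCodes.Ldind_I4, OpCodes.Conv_I4, OpCodes.Stind_I4),
            [typeof(float)]  = (4, OpCodes.Ldind_R4, OpCodes.Conv_R4, OpCodes.Stind_R4),
            [typeof(double)] = (8, OpCodes.Ldind_R8, OpCodes.Conv_R8, OpCodes.Stind_R8),
        };

    private static readonly ConcurrentDictionary<(Type, Type), CastKernel> Cache =
        new ConcurrentDictionary<(Type, Type), CastKernel>();

    // One kernel per (source, destination) pair, built on first use and then reused.
    public static CastKernel GetCastKernel(Type srcType, Type dstType)
        => Cache.GetOrAdd((srcType, dstType), key => Build(key.Item1, key.Item2));

    private static CastKernel Build(Type srcType, Type dstType)
    {
        if (!Ops.ContainsKey(srcType) || !Ops.ContainsKey(dstType))
            throw new NotSupportedException($"{srcType.Name} -> {dstType.Name}");
        var (srcSize, load, _, _) = Ops[srcType];
        var (dstSize, _, conv, store) = Ops[dstType];

        var dm = new DynamicMethod($"Cast_{srcType.Name}_{dstType.Name}", typeof(void),
            new[] { typeof(IntPtr), typeof(IntPtr), typeof(int) },
            typeof(CastKernelSketch).Module, true);
        var il = dm.GetILGenerator();
        var i = il.DeclareLocal(typeof(int));
        var body = il.DefineLabel();
        var check = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4_0);                 // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, check);

        il.MarkLabel(body);
        EmitElementAddress(il, OpCodes.Ldarg_1, i, dstSize); // &dst[i]
        EmitElementAddress(il, OpCodes.Ldarg_0, i, srcSize); // &src[i]
        il.Emit(load);                             // load src[i]
        il.Emit(conv);                             // convert to destination type
        il.Emit(store);                            // dst[i] = converted value
        il.Emit(OpCodes.Ldloc, i);                 // i++
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(check);
        il.Emit(OpCodes.Ldloc, i);                 // loop while i < count
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Blt, body);
        il.Emit(OpCodes.Ret);

        return (CastKernel)dm.CreateDelegate(typeof(CastKernel));
    }

    // Pushes (base + i * elemSize) as a native int address.
    private static void EmitElementAddress(ILGenerator il, OpCode baseArg, LocalBuilder i, int elemSize)
    {
        il.Emit(baseArg);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Ldc_I4, elemSize);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Add);
    }
}
```

In this sketch the opcode table grows linearly with the number of types, while the 12×12 combinations come from pairing a source row with a destination row at emit time. Narrowing conversions (for example `double` to `int`) use the unchecked `conv` opcodes here, so overflow behavior would need to be checked against the existing casting code before any real migration.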
P1: Indexing Operations (Est. 600 LOC reduction)
- `NDArray.Indexing.Selection.Getter.cs` - 12-type dispatch
- `NDArray.Indexing.Selection.Setter.cs` - 12-type dispatch
- Target: IL gather/scatter kernels
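For reference, the semantics a gather/scatter kernel has to reproduce can be written once in generic C#. This is only a behavioral sketch under assumptions: it uses generics rather than emitted IL, flat `int` indices rather than NumSharp's index arrays, and the hypothetical names `GatherScatterSketch`, `Gather`, and `Scatter`.

```csharp
using System;

// Hypothetical reference semantics; not the NumSharp getter/setter code.
public static class GatherScatterSketch
{
    // destination[i] = source[indices[i]]
    public static void Gather<T>(ReadOnlySpan<T> source, ReadOnlySpan<int> indices, Span<T> destination)
        where T : unmanaged
    {
        if (destination.Length < indices.Length)
            throw new ArgumentException("destination is shorter than indices");
        for (int i = 0; i < indices.Length; i++)
            destination[i] = source[indices[i]];
    }

    // destination[indices[i]] = values[i]
    public static void Scatter<T>(ReadOnlySpan<T> values, ReadOnlySpan<int> indices, Span<T> destination)
        where T : unmanaged
    {
        if (values.Length < indices.Length)
            throw new ArgumentException("values is shorter than indices");
        for (int i = 0; i < indices.Length; i++)
            destination[indices[i]] = values[i];
    }
}
```

Called as `GatherScatterSketch.Gather<double>(src, idx, dst)`, one method body covers every element type, which is the property the IL kernels are meant to provide without a per-type switch.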
P2: Math Operations (Est. 400 LOC reduction)
- `np.linspace.cs` - 12 per-type loops → IL sequence generation with SIMD
- `np.repeat.cs` - 12 per-type loops → IL fill kernel with SIMD
- `np.all.cs` / `np.any.cs` axis path → IL axis reduction with early-exit
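The early-exit behavior named for the `np.all` axis path can be illustrated with a small scalar sketch. It assumes a contiguous, row-major 2-D `double` buffer and a reduction over the last axis; the name `AllAlongLastAxis` is hypothetical, and an IL kernel would emit an equivalent loop per element type rather than use this method.

```csharp
using System;

// Hypothetical sketch of an all() reduction over the last axis with early exit.
public static class AxisAllSketch
{
    public static bool[] AllAlongLastAxis(ReadOnlySpan<double> data, int rows, int cols)
    {
        if (data.Length != rows * cols)
            throw new ArgumentException("data length does not match rows * cols");

        var result = new bool[rows];
        for (int r = 0; r < rows; r++)
        {
            ReadOnlySpan<double> row = data.Slice(r * cols, cols);
            bool all = true;
            for (int c = 0; c < cols; c++)
            {
                if (row[c] == 0.0)
                {
                    all = false; // one zero settles this row
                    break;       // early exit: skip the rest of the row
                }
            }
            result[r] = all;
        }
        return result;
    }
}
```

For `np.any` the same structure applies with the condition inverted: the loop stops at the first nonzero element.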
P3: Reduction Fallbacks (Est. 200 LOC reduction)
- `Default.Reduction.CumAdd.cs` - 10-type fallback switch
- `Default.Reduction.CumMul.cs` - 10-type fallback switch
P4: Dispatch Cleanup (Est. 500 LOC reduction)
Files that already have IL kernels but retain verbose type dispatch:
- `Default.Clip.cs` - 3 × 11-type switches
- `Default.ClipNDArray.cs` - 6 × 11-type switches
- `DefaultEngine.BinaryOp.cs` / `UnaryOp.cs` / `CompareOp.cs` - Scalar dispatch chains
Success Metrics
| Metric | Before | Target |
|---|---|---|
| NPTypeCode switch cases | ~2,700 | <500 |
| Lines of type-dispatch code | ~5,700 | ~1,000 |
| Regen template files | ~20 | ~5 |
| SIMD coverage for casting | 0% | 80%+ |
Implementation Approach
```csharp
// Before: 144 separate loop implementations
case NPTypeCode.Int32:
    var src = (int*)source.Address;
    switch (outType) {
        case NPTypeCode.Double:
            for (int i = 0; i < len; i++) dst[i] = (double)src[i];
            break;
        // ... 11 more
    }
    break;
// ... 11 more input types

// After: Single IL-generated kernel
var kernel = ILKernelGenerator.GetCastKernel(srcType, dstType);
kernel(srcPtr, dstPtr, count);
```
Files to Skip
| File | Reason |
|---|---|
| `np.random.shuffle.cs` | Random access patterns defeat SIMD |
| `np.random.randint.cs` | RNG is bottleneck, not type dispatch |
| `MultiIterator.cs` | Iterator infrastructure, type dispatch acceptable |
| `NPTypeCode.cs` | Extension methods, not compute loops |
| `Converts.cs` | Low-level converters called from IL |
Related
- Generic Math Migration (`docs/GENERIC_MATH_DESIGN.md`)
- Full analysis: `docs/ISSUE_IL_MIGRATION.md`