[Refactor] IL Kernel/Generation Migration: Eliminate NPTypeCode Switch/Case Patterns #587

@Nucs

Summary

NumSharp contains approximately 2,700 NPTypeCode switch/case occurrences across 66 files, resulting in roughly 5,700 lines of repetitive type-dispatched code. This issue tracks migrating these patterns to IL-generated kernels in order to reduce code size, improve maintainability, and enable SIMD optimization.

Problem Statement

The current codebase uses extensive switch (typecode) { case NPTypeCode.X: ... } patterns to handle NumSharp's 12 supported types:

Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal

This results in:

  • Code bloat: 12 nearly-identical branches per operation
  • Maintenance burden: Changes must be replicated across all type branches
  • Regen dependency: Many files use #if _REGEN template generation
  • Missed SIMD opportunities: Scalar loops where vectorization is possible

High-Impact Files

| File | NPTypeCode Cases | Category |
|------|------------------|----------|
| Utilities/Converts.cs | 516 | Type Conversion |
| UnmanagedMemoryBlock.Casting.cs | 342 | Type Casting |
| Utilities/ArrayConvert.cs | 221 | Array Conversion |
| Backends/NPTypeCode.cs | 161 | Extension Methods |
| Unmanaged/ArraySlice.cs | 130 | Slice Operations |
| DefaultEngine.ReductionOp.cs | 69 | Reductions |
| Default.ClipNDArray.cs | 66 | Clip with NDArray |
| UnmanagedStorage.cs | 52 | Storage Operations |

Migration Priority

P0: Type Casting (Est. 4,000 LOC reduction)

  • UnmanagedMemoryBlock.Casting.cs - 12×12 nested switch, 291 for-loops
  • ArrayConvert.cs - 12×12 nested switch, 172 for-loops
  • Target: Single IL kernel per type-pair, SIMD widening/narrowing

P1: Indexing Operations (Est. 600 LOC reduction)

  • NDArray.Indexing.Selection.Getter.cs - 12-type dispatch
  • NDArray.Indexing.Selection.Setter.cs - 12-type dispatch
  • Target: IL gather/scatter kernels
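For readers unfamiliar with the terms, the semantics that a gather/scatter kernel would implement can be sketched with plain generics. This is not emitted IL and not NumSharp code: `GatherScatterSketch`, `Gather`, and `Scatter` are hypothetical names, and flat `int` indices are an assumption made to keep the example short.

```csharp
using System;

// Hypothetical sketch: shows what gather and scatter compute, written as
// ordinary generic C# rather than as an IL-generated kernel.
public static class GatherScatterSketch
{
    // Gather: destination[i] = source[indices[i]]
    public static void Gather<T>(ReadOnlySpan<T> source, ReadOnlySpan<int> indices, Span<T> destination)
    {
        if (destination.Length < indices.Length)
            throw new ArgumentException("destination is too small", nameof(destination));

        for (int i = 0; i < indices.Length; i++)
            destination[i] = source[indices[i]];
    }

    // Scatter: destination[indices[i]] = values[i].
    // When an index repeats, the later write wins.
    public static void Scatter<T>(ReadOnlySpan<T> values, ReadOnlySpan<int> indices, Span<T> destination)
    {
        if (values.Length < indices.Length)
            throw new ArgumentException("values is too small", nameof(values));

        for (int i = 0; i < indices.Length; i++)
            destination[indices[i]] = values[i];
    }
}
```

One generic body covers every element type here, which is the property the per-type getter and setter switches lack; whether emitted IL or generics is the better fit for the real indexing paths is left open.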

P2: Math Operations (Est. 400 LOC reduction)

  • np.linspace.cs - 12 per-type loops → IL sequence generation with SIMD
  • np.repeat.cs - 12 per-type loops → IL fill kernel with SIMD
  • np.all.cs / np.any.cs axis path → IL axis reduction with early-exit
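As a rough illustration of the "fill kernel with SIMD" idea, the sketch below uses `System.Numerics.Vector<T>` instead of emitted IL. `FillSketch` and `Fill` are hypothetical names, not NumSharp APIs. Note the assumption on element types: `Vector<T>` only supports the primitive numeric types, so Boolean, Char, and Decimal would still need a scalar path.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of a vectorized fill. Only valid for element types that
// Vector<T> supports (byte, short, int, long, float, double and their
// unsigned/signed counterparts); other types throw NotSupportedException.
public static class FillSketch
{
    public static void Fill<T>(Span<T> destination, T value) where T : struct
    {
        int i = 0;
        int width = Vector<T>.Count; // elements per hardware vector

        if (Vector.IsHardwareAccelerated && destination.Length >= width)
        {
            var broadcast = new Vector<T>(value); // value repeated in every lane
            int lastFullBlock = destination.Length - width;
            for (; i <= lastFullBlock; i += width)
                broadcast.CopyTo(destination.Slice(i));
        }

        // Scalar tail (or the whole span when SIMD is unavailable).
        for (; i < destination.Length; i++)
            destination[i] = value;
    }
}
```

The vector loop handles full blocks and the scalar loop handles the remainder, so the result is the same whether or not hardware acceleration is available.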

P3: Reduction Fallbacks (Est. 200 LOC reduction)

  • Default.Reduction.CumAdd.cs - 10-type fallback switch
  • Default.Reduction.CumMul.cs - 10-type fallback switch

P4: Dispatch Cleanup (Est. 500 LOC reduction)

Files that already have IL kernels but retain verbose type dispatch:

  • Default.Clip.cs - 3 × 11-type switches
  • Default.ClipNDArray.cs - 6 × 11-type switches
  • DefaultEngine.BinaryOp.cs / UnaryOp.cs / CompareOp.cs - Scalar dispatch chains
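One way such dispatch chains are commonly collapsed is a cache keyed by the type pair, so a kernel is built once and later calls are a dictionary lookup instead of a switch. The sketch below assumes this approach; `KernelCacheSketch` and `GetOrBuild` are hypothetical names, and keying on `System.Type` rather than NPTypeCode is an assumption made so the example is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: a delegate cache keyed by (source, destination) type.
public static class KernelCacheSketch
{
    private static readonly ConcurrentDictionary<(Type Source, Type Destination), Delegate> Cache =
        new ConcurrentDictionary<(Type Source, Type Destination), Delegate>();

    private static int _buildCount;

    // Number of times a kernel factory actually ran (for illustration only).
    public static int BuildCount => Volatile.Read(ref _buildCount);

    // Returns the cached kernel for <TIn, TOut>, invoking 'build' only on a miss.
    // Under contention GetOrAdd may run the factory more than once, but only
    // one result is ever stored and returned.
    public static Func<TIn, TOut> GetOrBuild<TIn, TOut>(Func<Func<TIn, TOut>> build)
    {
        Func<(Type Source, Type Destination), Delegate> factory = key =>
        {
            Interlocked.Increment(ref _buildCount);
            return build();
        };

        return (Func<TIn, TOut>)Cache.GetOrAdd((typeof(TIn), typeof(TOut)), factory);
    }
}
```

A second request for the same type pair returns the delegate built by the first, which is the behavior a replacement for the 11-type switches would need.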

Success Metrics

| Metric | Before | Target |
|--------|--------|--------|
| NPTypeCode switch cases | ~2,700 | <500 |
| Lines of type-dispatch code | ~5,700 | ~1,000 |
| Regen template files | ~20 | ~5 |
| SIMD coverage for casting | 0% | 80%+ |

Implementation Approach

// Before: 144 separate loop implementations (12 source types × 12 destination types)
case NPTypeCode.Int32:
    var src = (int*)source.Address;
    switch (outType) {
        case NPTypeCode.Double:
            // dst: destination buffer, already typed as double* in this branch
            for (int i = 0; i < len; i++) dst[i] = (double)src[i];
            break;
        // ... 11 more output types
    }
    break;
// ... 11 more input types

// After: Single IL-generated kernel
var kernel = ILKernelGenerator.GetCastKernel(srcType, dstType);
kernel(srcPtr, dstPtr, count);
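The issue does not show how `ILKernelGenerator.GetCastKernel` is implemented, so the following is only a minimal sketch of the underlying technique: emitting a cast loop for one type pair (Int32 to Double) with `System.Reflection.Emit.DynamicMethod`. `CastKernelSketch`, the `CastKernel` delegate shape, and `Convert` are hypothetical names. It needs unsafe code enabled (`AllowUnsafeBlocks`) and a runtime that supports dynamic code generation.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical sketch, not NumSharp's generator. Emits the equivalent of:
//   for (int i = 0; i < count; i++) ((double*)dst)[i] = (double)((int*)src)[i];
public static unsafe class CastKernelSketch
{
    // Assumed delegate shape, modeled on kernel(srcPtr, dstPtr, count).
    public delegate void CastKernel(void* src, void* dst, int count);

    public static CastKernel BuildInt32ToDouble()
    {
        var method = new DynamicMethod("Cast_Int32_Double", typeof(void),
            new[] { typeof(void*), typeof(void*), typeof(int) },
            typeof(CastKernelSketch).Module, skipVisibility: true);

        ILGenerator il = method.GetILGenerator();
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label check = il.DefineLabel();
        Label body = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4_0);          // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, check);

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldarg_1);           // push &dst[i] = dst + i * sizeof(double)
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Ldc_I4_8);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Add);

        il.Emit(OpCodes.Ldarg_0);           // push src[i], read from src + i * sizeof(int)
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Ldc_I4_4);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ldind_I4);

        il.Emit(OpCodes.Conv_R8);           // (double)src[i]
        il.Emit(OpCodes.Stind_R8);          // *(&dst[i]) = value

        il.Emit(OpCodes.Ldloc, i);          // i++
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(check);                // loop while i < count
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Blt, body);
        il.Emit(OpCodes.Ret);

        return (CastKernel)method.CreateDelegate(typeof(CastKernel));
    }

    // Safe convenience wrapper so callers do not need unsafe code.
    public static double[] Convert(int[] source)
    {
        var result = new double[source.Length];
        CastKernel kernel = BuildInt32ToDouble();
        fixed (int* src = source)
        fixed (double* dst = result)
            kernel(src, dst, source.Length);
        return result;
    }
}
```

A generator covering all type pairs would presumably pick the load, convert, and store opcodes and the element sizes from the source and destination types instead of hard-coding them, and would cache the resulting delegate so each pair is emitted once.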

Files to Skip

| File | Reason |
|------|--------|
| np.random.shuffle.cs | Random access patterns defeat SIMD |
| np.random.randint.cs | RNG is bottleneck, not type dispatch |
| MultiIterator.cs | Iterator infrastructure, type dispatch acceptable |
| NPTypeCode.cs | Extension methods, not compute loops |
| Converts.cs | Low-level converters called from IL |

Related

  • Generic Math Migration (docs/GENERIC_MATH_DESIGN.md)
  • Full analysis: docs/ISSUE_IL_MIGRATION.md

Labels: NumPy 2.x Compliance, core, enhancement, performance