[Refactor] IL Kernel/Generation Migration: Eliminate NPTypeCode Switch/Case Patterns #587

## Summary

NumSharp contains approximately **2,700 NPTypeCode switch/case occurrences across 66 files**, amounting to roughly 5,700 lines of repetitive type-dispatched code. This issue tracks migrating these patterns to IL-generated kernels in order to reduce code size, improve maintainability, and enable SIMD optimization.

## Problem Statement

The current codebase uses extensive `switch (typecode) { case NPTypeCode.X: ... }` patterns to handle NumSharp's 12 supported types:

```
Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal
```

This results in:
- **Code bloat**: 12 nearly-identical branches per operation
- **Maintenance burden**: Changes must be replicated across all type branches
- **Regen dependency**: Many files use `#if _REGEN` template generation
- **Missed SIMD opportunities**: Scalar loops where vectorization is possible

## High-Impact Files

| File | NPTypeCode Cases | Category |
|------|------------------|----------|
| `Utilities/Converts.cs` | 516 | Type Conversion |
| `UnmanagedMemoryBlock.Casting.cs` | 342 | Type Casting |
| `Utilities/ArrayConvert.cs` | 221 | Array Conversion |
| `Backends/NPTypeCode.cs` | 161 | Extension Methods |
| `Unmanaged/ArraySlice.cs` | 130 | Slice Operations |
| `DefaultEngine.ReductionOp.cs` | 69 | Reductions |
| `Default.ClipNDArray.cs` | 66 | Clip with NDArray |
| `UnmanagedStorage.cs` | 52 | Storage Operations |

## Migration Priority

### P0: Type Casting (Est. 4,000 LOC reduction)
- `UnmanagedMemoryBlock.Casting.cs` - 12×12 nested switch, 291 for-loops
- `ArrayConvert.cs` - 12×12 nested switch, 172 for-loops
- **Target**: Single IL kernel per type-pair, SIMD widening/narrowing
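To make the "SIMD widening" target concrete, here is a minimal sketch of a Single → Double conversion loop written in plain C# with `System.Numerics` rather than emitted IL. The class and method names (`WidenSketch`, `SingleToDouble`) are hypothetical and not part of NumSharp; a generated kernel would presumably emit an equivalent vector loop followed by a scalar tail.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration, not NumSharp code.
public static class WidenSketch
{
    public static void SingleToDouble(ReadOnlySpan<float> src, Span<double> dst)
    {
        if (dst.Length < src.Length)
            throw new ArgumentException("Destination is shorter than source.");

        int i = 0;
        int width = Vector<float>.Count; // floats per vector on this machine

        if (Vector.IsHardwareAccelerated)
        {
            // Each Vector<float> widens into two Vector<double> halves.
            for (; i <= src.Length - width; i += width)
            {
                var v = new Vector<float>(src.Slice(i, width));
                Vector.Widen(v, out Vector<double> lo, out Vector<double> hi);
                lo.CopyTo(dst.Slice(i));
                hi.CopyTo(dst.Slice(i + width / 2));
            }
        }

        // Scalar tail for the remaining elements (or the whole array without SIMD).
        for (; i < src.Length; i++)
            dst[i] = src[i];
    }
}
```

Narrowing conversions would follow the same shape with `Vector.Narrow`, which packs two wider vectors into one narrower vector.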

### P1: Indexing Operations (Est. 600 LOC reduction)
- `NDArray.Indexing.Selection.Getter.cs` - 12-type dispatch
- `NDArray.Indexing.Selection.Setter.cs` - 12-type dispatch
- **Target**: IL gather/scatter kernels
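As a rough picture of what a gather/scatter kernel computes, the sketch below uses generic C# over spans instead of emitted IL. The names (`GatherScatterSketch`, `Gather`, `Scatter`) and the use of flat `int` indices are assumptions for illustration only; the actual indexing code also has to deal with shapes and strides.

```csharp
using System;

// Hypothetical illustration, not NumSharp code.
public static class GatherScatterSketch
{
    // dst[i] = src[indices[i]]
    public static void Gather<T>(ReadOnlySpan<T> src, ReadOnlySpan<int> indices, Span<T> dst)
    {
        if (dst.Length < indices.Length)
            throw new ArgumentException("Destination is shorter than the index list.");
        for (int i = 0; i < indices.Length; i++)
            dst[i] = src[indices[i]]; // span indexer bounds-checks each access
    }

    // dst[indices[i]] = values[i]
    public static void Scatter<T>(ReadOnlySpan<T> values, ReadOnlySpan<int> indices, Span<T> dst)
    {
        if (values.Length < indices.Length)
            throw new ArgumentException("Fewer values than indices.");
        for (int i = 0; i < indices.Length; i++)
            dst[indices[i]] = values[i];
    }
}
```

One loop body serves every element type here, which is the property the 12-type dispatch in the getter and setter currently lacks. When calling with arrays, the type argument needs to be given explicitly, for example `GatherScatterSketch.Gather<double>(src, idx, dst)`.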

### P2: Math Operations (Est. 400 LOC reduction)
- `np.linspace.cs` - 12 per-type loops → IL sequence generation with SIMD
- `np.repeat.cs` - 12 per-type loops → IL fill kernel with SIMD
- `np.all.cs` / `np.any.cs` axis path → IL axis reduction with early-exit
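The early-exit idea for the `any` axis path can be sketched as follows, assuming a contiguous row-major 2-D buffer of `double` reduced along its last axis. The name `AnyAlongLastAxis` and the fixed element type are assumptions made for brevity; an IL kernel would be generated per element type and would handle arbitrary axes.

```csharp
using System;

// Hypothetical illustration, not NumSharp code.
public static class AxisAnySketch
{
    public static bool[] AnyAlongLastAxis(ReadOnlySpan<double> data, int rows, int cols)
    {
        if (rows < 0 || cols < 0 || data.Length < (long)rows * cols)
            throw new ArgumentException("Buffer is smaller than rows * cols.");

        var result = new bool[rows];
        for (int r = 0; r < rows; r++)
        {
            var row = data.Slice(r * cols, cols);
            for (int c = 0; c < cols; c++)
            {
                if (row[c] != 0.0)
                {
                    result[r] = true;
                    break; // early exit: the rest of the row cannot change the answer
                }
            }
        }
        return result;
    }
}
```

The `all` variant is symmetric: start each result at `true` and break on the first zero element.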

### P3: Reduction Fallbacks (Est. 200 LOC reduction)
- `Default.Reduction.CumAdd.cs` - 10-type fallback switch
- `Default.Reduction.CumMul.cs` - 10-type fallback switch

### P4: Dispatch Cleanup (Est. 500 LOC reduction)
Files that already have IL kernels but retain verbose type dispatch:
- `Default.Clip.cs` - 3 × 11-type switches
- `Default.ClipNDArray.cs` - 6 × 11-type switches
- `DefaultEngine.BinaryOp.cs` / `UnaryOp.cs` / `CompareOp.cs` - Scalar dispatch chains

## Success Metrics

| Metric | Before | Target |
|--------|--------|--------|
| NPTypeCode switch cases | ~2,700 | <500 |
| Lines of type-dispatch code | ~5,700 | ~1,000 |
| Regen template files | ~20 | ~5 |
| SIMD coverage for casting | 0% | 80%+ |

## Implementation Approach

```csharp
// Before: 144 separate loop implementations
case NPTypeCode.Int32:
    var src = (int*)source.Address;
    switch (outType) {
        case NPTypeCode.Double:
            for (int i = 0; i < len; i++) dst[i] = (double)src[i];
            break;
        // ... 11 more
    }
    break;
// ... 11 more input types

// After: Single IL-generated kernel
var kernel = ILKernelGenerator.GetCastKernel(srcType, dstType);
kernel(srcPtr, dstPtr, count);
```
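For readers unfamiliar with runtime IL generation, the following is a minimal sketch of how a cast kernel for one type pair could be emitted with `System.Reflection.Emit.DynamicMethod` and cached. It is not the actual `ILKernelGenerator`: the names `CastKernelSketch` and `CastKernel`, the `IntPtr`-based signature, and the restriction to four element types are all assumptions. Unsigned sources, `Boolean`, `Char`, and `Decimal` would need extra handling (for example `conv.r.un` for unsigned-to-float, and `Decimal` has no IL conversion opcode at all).

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection.Emit;

// Hypothetical delegate shape; pointers are passed as IntPtr to avoid unsafe code.
public delegate void CastKernel(IntPtr src, IntPtr dst, long count);

public static class CastKernelSketch
{
    private static readonly ConcurrentDictionary<(Type, Type), CastKernel> Cache =
        new ConcurrentDictionary<(Type, Type), CastKernel>();

    // One kernel per (source, destination) pair, built on first use.
    public static CastKernel Get(Type srcType, Type dstType) =>
        Cache.GetOrAdd((srcType, dstType), key => Build(key.Item1, key.Item2));

    private static (OpCode load, OpCode conv, OpCode store, int size) Ops(Type t)
    {
        if (t == typeof(int))    return (OpCodes.Ldind_I4, OpCodes.Conv_I4, OpCodes.Stind_I4, 4);
        if (t == typeof(long))   return (OpCodes.Ldind_I8, OpCodes.Conv_I8, OpCodes.Stind_I8, 8);
        if (t == typeof(float))  return (OpCodes.Ldind_R4, OpCodes.Conv_R4, OpCodes.Stind_R4, 4);
        if (t == typeof(double)) return (OpCodes.Ldind_R8, OpCodes.Conv_R8, OpCodes.Stind_R8, 8);
        throw new NotSupportedException("Sketch covers Int32, Int64, Single, Double only.");
    }

    // Pushes (basePointer + i * size) onto the evaluation stack.
    private static void EmitElementAddress(ILGenerator il, OpCode ldarg, LocalBuilder i, int size)
    {
        il.Emit(ldarg);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldc_I8, (long)size);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Add);
    }

    private static CastKernel Build(Type srcType, Type dstType)
    {
        var src = Ops(srcType);
        var dst = Ops(dstType);
        var dm = new DynamicMethod("Cast_" + srcType.Name + "_" + dstType.Name,
            typeof(void), new[] { typeof(IntPtr), typeof(IntPtr), typeof(long) });
        var il = dm.GetILGenerator();
        var i = il.DeclareLocal(typeof(long));
        var check = il.DefineLabel();
        var body = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I8, 0L);      // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, check);

        il.MarkLabel(body);
        EmitElementAddress(il, OpCodes.Ldarg_1, i, dst.size); // &dst[i]
        EmitElementAddress(il, OpCodes.Ldarg_0, i, src.size); // &src[i]
        il.Emit(src.load);                // load src[i]
        il.Emit(dst.conv);                // convert to destination type
        il.Emit(dst.store);               // dst[i] = converted value

        il.Emit(OpCodes.Ldloc, i);        // i++
        il.Emit(OpCodes.Ldc_I8, 1L);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(check);
        il.Emit(OpCodes.Ldloc, i);        // while (i < count)
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Blt, body);
        il.Emit(OpCodes.Ret);

        return (CastKernel)dm.CreateDelegate(typeof(CastKernel));
    }
}
```

The point of the design is that the per-pair differences collapse into three opcodes and an element size, so the 12×12 nested switch becomes a table lookup plus one shared emitter. The emitted loop here is scalar; a SIMD path would be layered on top for pairs that support it.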

## Files to Skip

| File | Reason |
|------|--------|
| `np.random.shuffle.cs` | Random access patterns defeat SIMD |
| `np.random.randint.cs` | RNG is bottleneck, not type dispatch |
| `MultiIterator.cs` | Iterator infrastructure, type dispatch acceptable |
| `NPTypeCode.cs` | Extension methods, not compute loops |
| `Converts.cs` | Low-level converters called from IL |

## Related

- Generic Math Migration (`docs/GENERIC_MATH_DESIGN.md`)
- Full analysis: `docs/ISSUE_IL_MIGRATION.md`
