Open
Labels
bug (Something isn't working), untriaged (New issue has not been triaged)
Description
SdcaLogisticRegression test fails on macOS ARM64 Release — LogLoss exceeds 0.5 threshold
System Information:
- OS: macOS 15 (Sequoia) ARM64
- Configuration: Release
- Helix Queue: osx.15.arm64.open
- .NET: net8.0
Describe the bug
The SdcaLogisticRegression test fails on macOS ARM64 Release builds with:
Assert.InRange() Failure: Value not in range
Range: (0 - 0.5)
Actual: 0.50113968629658268
The test at test/Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs:86 asserts metrics.LogLoss is in range (0, 0.5). On macOS ARM64 Release, the LogLoss is 0.5011 — exceeding the upper bound.
Key observations:
- Passes on Windows x64 (Debug and Release), Linux x64, and macOS ARM64 Debug
- Fails only on macOS ARM64 Release
- The difference (0.0011 above the threshold) is a deviation of about 0.22%. It is small, but it is enough to fail a test that treats 0.5 as its quality bar. Note that 0.5 is the test's chosen bound rather than a naive-baseline value: a predictor that always outputs a probability of 0.5 scores ln 2 ≈ 0.693 with the natural log, or 1.0 with log base 2.
- This was previously hidden because the macOS Helix queues (OSX.13.Arm64.Open) were decommissioned, so macOS CI wasn't running at all. Updating to osx.15.arm64.open (PR "Update macOS Helix queues from decommissioned OSX.13 to osx.15", #7599) exposed this.
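The size of the miss can be checked with a few lines of arithmetic. The sketch below uses only the two numbers from the failure message. It also prints, for comparison, the log-loss of a predictor that always outputs 0.5. The report does not state which log base the evaluator uses, so both the natural-log and base-2 values are shown as an assumption-free reference point.

```python
import math

BOUND = 0.5                      # upper bound asserted by the test
ACTUAL = 0.50113968629658268     # value reported in the failure message

excess = ACTUAL - BOUND          # how far the metric overshoots the bound
relative = excess / BOUND        # overshoot as a fraction of the bound

# Log-loss of a predictor that always outputs p = 0.5, for comparison.
# Every sample contributes -log(0.5), whatever its label is.
constant_nats = -math.log(0.5)
constant_bits = -math.log2(0.5)

print(f"excess over bound: {excess:.4f}")
print(f"relative deviation: {relative:.2%}")
print(f"constant-0.5 predictor: {constant_nats:.3f} nats, {constant_bits:.1f} bits")
```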
Root cause analysis:
The SDCA solver is iterative and uses floating-point arithmetic that is sensitive to:
- ARM64 Release JIT optimizations (FMA instruction fusion, instruction reordering)
- Low l2Regularization: 0.001f making the optimizer more sensitive to numerical drift
- Small dataset (100 samples) amplifying per-iteration rounding differences
The test uses MLContext(seed: 1) for determinism, but JIT optimization differences between Debug/Release on ARM64 cause the optimizer to converge to a slightly different (worse) solution.
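To make the FMA hypothesis concrete, here is a minimal sketch of how a fused multiply-add can produce a different result from a separate multiply and add on identical inputs. It is not the SDCA code and does not show that FMA is the actual cause here. The fused path is emulated with exact rational arithmetic followed by a single rounding, and the input values are chosen purely for illustration.

```python
from fractions import Fraction

def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Two roundings: a*b is rounded to a double, then the sum is rounded."""
    return a * b + c

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: a*b+c is computed exactly, then rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Illustrative inputs: the exact product is 1 - 2**-60, which is not
# representable as a double and rounds to 1.0 in the unfused path.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print("unfused:", unfused)
print("fused:  ", fused)
```

A single such difference is tiny, but an iterative solver feeds each result into the next update, so the two paths can drift apart over many iterations.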
Possible fixes:
- Investigate ARM64 numerical stability in the SDCA implementation — determine if FMA or other optimizations cause meaningful quality degradation
- Relax the test bound from 0.5 to e.g. 0.55, acknowledging cross-platform variance for this tiny dataset while still validating the model trains to a reasonable state
- Increase training data: 100 samples is very small and amplifies numerical sensitivity
- Add [Trait] to skip on ARM64 Release if this is considered acceptable platform variance
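As a quick check of the relaxed-bound option, the sketch below applies both candidate bounds to the reported value. The `in_range` helper is hypothetical and assumes the assertion treats its bounds as inclusive. It stands in for the test's range check and is not the test code itself.

```python
def in_range(value: float, low: float, high: float) -> bool:
    """Inclusive range check (assumed to mirror the test's range assertion)."""
    return low <= value <= high

REPORTED_LOG_LOSS = 0.50113968629658268  # from the failure message

current_bound_passes = in_range(REPORTED_LOG_LOSS, 0.0, 0.5)
relaxed_bound_passes = in_range(REPORTED_LOG_LOSS, 0.0, 0.55)

print("bound 0.5: ", current_bound_passes)
print("bound 0.55:", relaxed_bound_passes)
```

The relaxed bound leaves roughly 0.049 of headroom above the reported value. Whether that is enough for other platforms would need to be confirmed on CI.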
To Reproduce:
Run on macOS ARM64 in Release configuration:
dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj --filter "FullyQualifiedName~SdcaLogisticRegression" -c Release