Commit 0d80d09
fix(training): harden model against NaN/Inf instability in fp16
- ssm.py: Clamp `delta` projection (max=5.0) to prevent SSM state explosion.
- trainer.py: Enhance gradient clipping to detect and skip `Inf` gradients (overflow), not just `NaN`.
- model.py: Add runtime fail-safe to patch layer outputs using `nan_to_num` if corruption is detected during forward pass.
This addresses the loss divergence observed at step 11k.
1 parent f01c9a2 commit 0d80d09
3 files changed
Lines changed: 8 additions & 7 deletions
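The diff bodies are not reproduced here, so the following is only a minimal sketch of the three guards named in the commit message, assuming a PyTorch codebase. The function names (`clamp_delta`, `clip_or_skip`, `sanitize_output`), the `max_norm` default, and the `posinf`/`neginf` replacement values are hypothetical and are not taken from the commit; only `max=5.0` and the use of `nan_to_num` come from the message.

```python
import torch


def clamp_delta(delta: torch.Tensor, max_value: float = 5.0) -> torch.Tensor:
    # Upper-bound the SSM step size so the recurrent state cannot blow up.
    # max=5.0 is the value quoted in the commit message.
    return torch.clamp(delta, max=max_value)


def clip_or_skip(parameters, max_norm: float = 1.0) -> bool:
    """Clip gradients; return False when the optimizer step should be skipped.

    torch.isfinite is False for both NaN and +/-Inf, so an fp16 overflow
    (Inf) is caught as well as NaN. max_norm=1.0 is an assumed default.
    """
    params = [p for p in parameters if p.grad is not None]
    total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm)
    if not torch.isfinite(total_norm):
        # Drop the corrupted gradients instead of applying them.
        for p in params:
            p.grad = None
        return False
    return True


def sanitize_output(x: torch.Tensor, limit: float = 1e4) -> torch.Tensor:
    # Fast path: leave healthy activations untouched.
    if torch.isfinite(x).all():
        return x
    # Fail-safe: replace NaN with 0 and +/-Inf with an assumed finite bound.
    return torch.nan_to_num(x, nan=0.0, posinf=limit, neginf=-limit)
```

In a training loop under these assumptions, `optimizer.step()` would run only when `clip_or_skip(model.parameters())` returns `True`. Note that `sanitize_output` masks corruption rather than fixing its cause, so logging each time it triggers is probably worthwhile.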