Status: Running investigations to solve ablation study mysteries
Date: December 16, 2024
- Individual features help, but combining them hurts!
- Vegas Spread alone: +1.2% (68.0% accuracy)
- Momentum alone: +0.8% (67.6% accuracy, 92.3% HC!)
- Temporal alone: +0.4% (67.2% accuracy)
- Expected if combined: ~+2.4%
- Actual combined (Full Week 10+): -1.6% (65.2% accuracy)
- Lost to interaction: -4.0 percentage points
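The interaction loss is plain arithmetic: sum the single-feature deltas, then compare that sum with the observed combined delta. A minimal sketch using the figures reported above; the variable names are illustrative and not taken from the ablation code.

```python
# Single-feature accuracy deltas vs. the 66.8% baseline, in percentage points
# (values copied from the ablation results above).
single_feature_deltas = {
    "vegas_spread": 1.2,
    "momentum": 0.8,
    "temporal": 0.4,
}
actual_combined_delta = -1.6  # Full Week 10+ model vs. baseline

# If the features were purely additive, the combined delta would be the sum.
expected_combined_delta = sum(single_feature_deltas.values())

# A negative value means combining the features cost accuracy.
interaction_effect = actual_combined_delta - expected_combined_delta

print(f"Expected if additive: {expected_combined_delta:+.1f} pts")
print(f"Actual combined:      {actual_combined_delta:+.1f} pts")
print(f"Lost to interaction:  {interaction_effect:+.1f} pts")
```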
1. Defensive Stats: WORST Individual Feature (-3.9%)
- Baseline: 66.8%
- With Defensive Stats: 62.9%
- Suspected Bug: `defensive_ppg = epa.sum() * 6 / weeks` (Why × 6?)
- Hypothesis: Should be `defensive_epa_pg = epa.sum() / weeks`
- Expected if fixed: -3.9% → +1 to +2%
2. Momentum: BEST HC Accuracy (92.3%)
- Baseline HC: 73.3%
- Momentum HC: 92.3% (AMAZING!)
- Overall: +0.8% improvement
- BUT: Week 10-14 production showed only 55.0% HC accuracy
- Mystery: Why the 37.3 point discrepancy?
3. The 12.1 Point Gap
- Ablation study (2024 holdout): 65.2%
- Week 10-14 production (2025): 53.1%
- Gap: 12.1 percentage points
- Possible causes:
- Different feature implementations
- Different data (2024 vs 2025)
- Different RFE feature selection
- Implementation bugs in production
Status: Notebook created
Purpose: Compare Week10/Model.ipynb to ablation study code
Hypotheses to test:
- Different momentum formula (wins vs fantasy points)
- Defensive stats bug in production too
- 2024 vs 2025 data differences
- RFE selected different features
- Temporal weighting not applied correctly
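One quick way to check the "RFE selected different features" hypothesis is to diff the two selected feature lists. The sketch below assumes each run can export its selected feature names as a list; the function name and the example feature names are hypothetical, not read from either notebook.

```python
def diff_feature_sets(ablation_features, production_features):
    """Return features unique to each run and those shared by both."""
    ablation = set(ablation_features)
    production = set(production_features)
    return {
        "only_in_ablation": sorted(ablation - production),
        "only_in_production": sorted(production - ablation),
        "shared": sorted(ablation & production),
    }


# Hypothetical feature lists, for illustration only.
ablation_selected = ["vegas_spread", "momentum_wins", "temporal_weight"]
production_selected = ["vegas_spread", "momentum_fantasy_pts", "defensive_ppg"]

report = diff_feature_sets(ablation_selected, production_selected)
for label, names in report.items():
    print(f"{label}: {names}")
```

A non-empty `only_in_ablation` or `only_in_production` list would point toward differing feature selection or differing implementations rather than a pure data shift.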
Status: Running initial test
Purpose: Validate if × 6 bug is root cause
Test:
- Compare buggy: `epa × 6 / weeks`
- Against fixed: `epa / weeks`
- Measure delta on 2024 holdout
Expected: If bug confirmed, -3.9% should flip to +1-2%
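The two formulas can be compared side by side before running the full holdout test. This is a sketch only: the function names and the EPA values are made up for illustration, and `epa_values` stands in for whatever per-week series the real code sums.

```python
def defensive_ppg_buggy(epa_values, weeks):
    # Formula as suspected in the current code: the "* 6" factor is unexplained.
    return sum(epa_values) * 6 / weeks


def defensive_epa_pg_fixed(epa_values, weeks):
    # Hypothesized fix: plain per-game average of EPA.
    return sum(epa_values) / weeks


# Made-up weekly EPA totals for one team (illustration only).
epa_by_week = [-2.0, 1.5, -0.5, 3.0]
weeks = len(epa_by_week)

buggy = defensive_ppg_buggy(epa_by_week, weeks)
fixed = defensive_epa_pg_fixed(epa_by_week, weeks)
print(f"buggy={buggy:.2f} fixed={fixed:.2f} ratio={buggy / fixed:.1f}")
```

The buggy value is always exactly six times the fixed one, so the change alters the feature's scale but not how teams rank against each other. Whether a pure rescaling can account for the 3.9-point drop is what the 2024 holdout comparison has to establish.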
Status: Notebook created
Purpose: Test if momentum generalizes beyond 2024
Test:
- Run momentum test on 2023 data
- Run on 2024 for comparison
- Check if +0.8% and 92.3% HC persist
Decision criteria:
- If helps on both years → Deploy ✅
- If hurts on 2023 → Don't deploy ❌
- If mixed → Test on 2022 too
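The decision criteria can be written down as a small function so the outcome is mechanical once both years are scored. This is one reading of the rules, with "mixed" taken to mean any case that is neither "helps on both" nor "hurts on 2023"; the function name and return labels are hypothetical.

```python
def momentum_decision(delta_2023, delta_2024):
    """Map accuracy deltas (percentage points vs. baseline) to an action."""
    if delta_2023 > 0 and delta_2024 > 0:
        return "deploy"          # helps on both years
    if delta_2023 < 0:
        return "do_not_deploy"   # hurts on 2023
    return "test_on_2022"        # mixed result, needs a third year


# Example with the reported 2024 delta (+0.8) and hypothetical 2023 deltas.
for delta_2023 in (0.5, -0.3, 0.0):
    print(delta_2023, "->", momentum_decision(delta_2023, 0.8))
```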
Status: Notebook created (basic structure)
Purpose: Identify synergistic vs conflicting pairs
Pairs to test:
- Momentum + Temporal (both weight recent data - conflict?)
- Momentum + Defensive (fixed)
- Momentum + Vegas
- Temporal + Defensive (fixed)
- Vegas + Defensive (fixed)
Interaction types:
- Synergistic: A+B > delta_A + delta_B ✅
- Additive: A+B ≈ delta_A + delta_B ➡️
- Conflicting: A+B < delta_A + delta_B ❌
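The three interaction types can be sketched as a classifier over accuracy deltas. The `tolerance` band that decides what counts as "≈" is an assumption (0.2 percentage points here) and should be set from the run-to-run noise of the ablation; the function name is hypothetical.

```python
def classify_interaction(delta_a, delta_b, delta_combined, tolerance=0.2):
    """Classify a feature pair from accuracy deltas in percentage points.

    tolerance is an assumed noise band, not a value from the study.
    """
    expected = delta_a + delta_b
    difference = delta_combined - expected
    if difference > tolerance:
        return "synergistic"
    if difference < -tolerance:
        return "conflicting"
    return "additive"


# Momentum (+0.8) and Temporal (+0.4) with hypothetical combined results.
for combined in (2.0, 1.2, -0.5):
    print(combined, "->", classify_interaction(0.8, 0.4, combined))
```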
1. Defensive Bug Fixed (+4.9 to +5.9 point swing)
- Current: -3.9% harmful
- Fixed: +1 to +2% helpful
- Total swing: +4.9 to +5.9 points
- Deploy: Yes, with fixed formula
2. Momentum Validated (+0.8 points)
- If it generalizes to both 2023 and 2024
- 92.3% HC accuracy maintained
- Deploy: Yes, add to Week 16
3. Safe Feature Pairs Identified
- Avoid conflicting combinations
- Use only synergistic/additive pairs
- Deploy: Optimal non-conflicting set
Combined Expected Improvement:
- Baseline: 66.8%
- Defensive (fixed): +1.5% → 68.3%
- Momentum: +0.8% → 69.1%
- Target: 68-70% accuracy for Week 16
1. Defensive Bug NOT Root Cause
- Other implementation issue
- Action: Skip defensive stats entirely
2. Momentum Doesn't Generalize
- Year-specific overfitting to 2024
- Action: Don't deploy, stick with baseline
3. All Pairs Conflict
- Features inherently incompatible
- Action: Deploy best single feature only
| Scenario | Defensive Fixed | Momentum Valid | Action |
|---|---|---|---|
| Best case | ✅ Yes | ✅ Yes | Deploy both (+2.3%) |
| Defensive only | ✅ Yes | ❌ No | Deploy defensive only (+1.5%) |
| Momentum only | ❌ No | ✅ Yes | Deploy momentum only (+0.8%) |
| Worst case | ❌ No | ❌ No | Stick with baseline (66.8%) |
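The decision matrix maps directly onto a small function. The sketch below assumes the two deltas simply add, which is the same assumption behind the "+2.3%" in the table and may not hold given the interaction losses seen earlier; the function and constant names are hypothetical.

```python
BASELINE_ACCURACY = 66.8      # percent, from the ablation study
DEFENSIVE_FIXED_DELTA = 1.5   # assumed midpoint of the +1 to +2 estimate
MOMENTUM_DELTA = 0.8          # observed on the 2024 holdout


def week16_plan(defensive_fixed, momentum_valid):
    """Return (features to deploy, projected accuracy assuming additivity)."""
    features = []
    delta = 0.0
    if defensive_fixed:
        features.append("defensive_fixed")
        delta += DEFENSIVE_FIXED_DELTA
    if momentum_valid:
        features.append("momentum")
        delta += MOMENTUM_DELTA
    return features, round(BASELINE_ACCURACY + delta, 1)


for defensive_ok in (True, False):
    for momentum_ok in (True, False):
        print(defensive_ok, momentum_ok, week16_plan(defensive_ok, momentum_ok))
```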
Running Now: Loading NFL data (2015-2024) and creating investigation cache
ETA: 2-3 minutes
Next: Run defensive stats bug test
Then: Validate momentum on 2023
Finally: Test feature pair interactions
Total Investigation Time: 4-6 hours for all tests
- ✅ `investigation_1_gap_analysis.ipynb` - Find 12.1 point gap
- ✅ `investigation_2_fix_defensive_bug.ipynb` - Test defensive fix
- ✅ `investigation_3_validate_momentum_2023.ipynb` - Cross-year validation
- ✅ `investigation_4_feature_pairs.ipynb` - Interaction testing
- ⏳ `ablation_cache/` - NFL data cache (creating now)
Next Update: After NFL data loaded and defensive bug test complete