Investigation Plan Summary

Status: Running investigations to solve ablation study mysteries Date: December 16, 2024

What We Know from Ablation Study

The Paradox

Individual features help, but combining them hurts!
Vegas Spread alone: +1.2% (68.0% accuracy)
Momentum alone: +0.8% (67.6% accuracy, 92.3% HC!)
Temporal alone: +0.4% (67.2% accuracy)
Expected if combined: ~+2.4%
Actual combined (Full Week 10+): -1.6% (65.2% accuracy)
Lost to interaction: -4.0 percentage points

Critical Findings

1. Defensive Stats: WORST Individual Feature (-3.9%)

Baseline: 66.8%
With Defensive Stats: 62.9%
Suspected Bug: defensive_ppg = epa.sum() * 6 / weeks (Why × 6?)
Hypothesis: Should be defensive_epa_pg = epa.sum() / weeks
Expected if fixed: -3.9% → +1 to +2%

2. Momentum: BEST HC Accuracy (92.3%)

Baseline HC: 73.3%
Momentum HC: 92.3% (AMAZING!)
Overall: +0.8% improvement
BUT: Week 10-14 production showed only 55.0% HC accuracy
Mystery: Why the 37.3 point discrepancy?

3. The 12.1 Point Gap

Ablation study (2024 holdout): 65.2%
Week 10-14 production (2025): 53.1%
Gap: 12.1 percentage points
Possible causes:
- Different feature implementations
- Different data (2024 vs 2025)
- Different RFE feature selection
- Implementation bugs in production

Investigations Running

Investigation 1: Find the 12.1 Point Gap

Status: Notebook created Purpose: Compare Week10/Model.ipynb to ablation study code Hypotheses to test:

Different momentum formula (wins vs fantasy points)
Defensive stats bug in production too
2024 vs 2025 data differences
RFE selected different features
Temporal weighting not applied correctly

Investigation 2: Fix Defensive Stats Bug

Status: Running initial test Purpose: Validate if × 6 bug is root cause Test:

Compare buggy: epa × 6 / weeks
Against fixed: epa / weeks
Measure delta on 2024 holdout

Expected: If bug confirmed, -3.9% should flip to +1-2%

Investigation 3: Validate Momentum on 2023

Status: Notebook created Purpose: Test if momentum generalizes beyond 2024 Test:

Run momentum test on 2023 data
Run on 2024 for comparison
Check if +0.8% and 92.3% HC persist

Decision criteria:

If helps on both years → Deploy ✅
If hurts on 2023 → Don't deploy ❌
If mixed → Test on 2022 too ⚠️

Investigation 4: Feature Pair Interactions

Status: Notebook created (basic structure) Purpose: Identify synergistic vs conflicting pairs Pairs to test:

Momentum + Temporal (both weight recent data - conflict?)
Momentum + Defensive (fixed)
Momentum + Vegas
Temporal + Defensive (fixed)
Vegas + Defensive (fixed)

Interaction types:

Synergistic: A+B > delta_A + delta_B ✅
Additive: A+B ≈ delta_A + delta_B ➡️
Conflicting: A+B < delta_A + delta_B ❌

Expected Outcomes

If Investigations Confirm:

1. Defensive Bug Fixed (+3-5 points)

Current: -3.9% harmful
Fixed: +1 to +2% helpful
Total swing: +4.9 to +5.9 points
Deploy: Yes, with fixed formula

2. Momentum Validated (+0.8 points)

If generalizes to 2023 and 2024
92.3% HC accuracy maintained
Deploy: Yes, add to Week 16

3. Safe Feature Pairs Identified

Avoid conflicting combinations
Use only synergistic/additive pairs
Deploy: Optimal non-conflicting set

Combined Expected Improvement:

Baseline: 66.8%
- Defensive (fixed): +1.5% → 68.3%
- Momentum: +0.8% → 69.1%
Target: 68-70% accuracy for Week 16

If Investigations Reveal:

1. Defensive Bug NOT Root Cause

Other implementation issue
Action: Skip defensive stats entirely

2. Momentum Doesn't Generalize

Year-specific overfitting to 2024
Action: Don't deploy, stick with baseline

3. All Pairs Conflict

Features inherently incompatible
Action: Deploy best single feature only

Quick Decision Matrix for Week 16

Scenario	Defensive Fixed	Momentum Valid	Action
Best case	✅ Yes	✅ Yes	Deploy both (+2.3%)
Defensive only	✅ Yes	❌ No	Deploy defensive only (+1.5%)
Momentum only	❌ No	✅ Yes	Deploy momentum only (+0.8%)
Worst case	❌ No	❌ No	Stick with baseline (66.8%)

Current Status

Running Now: Loading NFL data (2015-2024) and creating investigation cache ETA: 2-3 minutes Next: Run defensive stats bug test Then: Validate momentum on 2023 Finally: Test feature pair interactions

Total Investigation Time: 4-6 hours for all tests

Files Created

✅ investigation_1_gap_analysis.ipynb - Find 12.1 point gap
✅ investigation_2_fix_defensive_bug.ipynb - Test defensive fix
✅ investigation_3_validate_momentum_2023.ipynb - Cross-year validation
✅ investigation_4_feature_pairs.ipynb - Interaction testing
⏳ ablation_cache/ - NFL data cache (creating now)

Next Update: After NFL data loaded and defensive bug test complete

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigation Plan Summary

What We Know from Ablation Study

The Paradox

Critical Findings

Investigations Running

Investigation 1: Find the 12.1 Point Gap

Investigation 2: Fix Defensive Stats Bug

Investigation 3: Validate Momentum on 2023

Investigation 4: Feature Pair Interactions

Expected Outcomes

If Investigations Confirm:

If Investigations Reveal:

Quick Decision Matrix for Week 16

Current Status

Files Created

FilesExpand file tree

INVESTIGATION_PLAN_SUMMARY.md

Latest commit

History

INVESTIGATION_PLAN_SUMMARY.md

File metadata and controls

Investigation Plan Summary

What We Know from Ablation Study

The Paradox

Critical Findings

Investigations Running

Investigation 1: Find the 12.1 Point Gap

Investigation 2: Fix Defensive Stats Bug

Investigation 3: Validate Momentum on 2023

Investigation 4: Feature Pair Interactions

Expected Outcomes

If Investigations Confirm:

If Investigations Reveal:

Quick Decision Matrix for Week 16

Current Status

Files Created