This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is an NFL game prediction system that uses machine learning to predict game winners and point spreads for the 2025 NFL season. The project is organized by week, with each week containing its own Model.ipynb notebook for generating predictions and tracking results.
NFL Performance Prediction/
├── Week1/ through Week14/ # Weekly prediction notebooks
│ ├── Model.ipynb # Prediction model for that week
│ ├── week*_predictions.csv # Saved predictions (if generated)
│ ├── final_model.joblib # Trained model checkpoint
│ └── nfl_data/ # Downloaded NFL data cache
├── Plot.ipynb # Performance analysis and visualization across all weeks
├── .claude/agents/ # Custom Claude Code agents for model analysis and optimization
└── Week14/ # Latest week with additional documentation
├── Project_Summary_Report.md # Academic summary report
└── README_PDF_Conversion.md # Instructions for PDF conversion
- Navigate to the desired week's directory (e.g.,
Week6/) - Open
Model.ipynbin Jupyter - Execute cells in order:
- Cell 1: Install dependencies (
xgboost,nfl_data_py,pillow) - Cell 2: Initialize NFLGamePredictor class (includes data collection and model training)
- Cell 3: Train the model (this downloads NFL data from 2015-2025 and may take several minutes)
- Cell 4-6: Run predictions for the specific week's games
- Cell 7: Fetch actual results and calculate accuracy (after games have been played)
- Cell 8: Save predictions to CSV
- Cell 1: Install dependencies (
Use Plot.ipynb in the root directory to:
- Aggregate predictions from all weeks
- Calculate cumulative accuracy statistics
- Generate performance visualizations
- Track high-confidence pick success rates
Important: Plot.ipynb automatically loads predictions from:
- Global variables if Model.ipynb was run in the same kernel (e.g.,
week1_results,week6_spread_results) - CSV files saved in each Week directory (e.g.,
Week1/week1_predictions.csv)
Week 1-5: Basic winner prediction
- Uses
NFLGamePredictorclass - Predictions stored in
week*_resultsvariables - Outputs: predicted winner and confidence percentage
Week 6+: Enhanced spread prediction
- Uses both
NFLGamePredictorandNFLSpreadPredictorclasses - Predictions stored in
week*_spread_resultsvariables - Outputs: predicted winner, confidence, and point spread
Core Features (selected via RFE):
- Offensive:
home_passing_ypg,home_rushing_ypg,home_total_ypg,home_points_pg,home_passing_tds_pg - Defensive:
home_defensive_ypg,home_defensive_ppg(yards/points allowed) - Turnovers:
home_turnovers_pg,away_turnovers_pg,turnover_advantage - Contextual:
scoring_advantage,is_playoff,season
Enhanced Features (Week 10+):
- Injury Metrics:
home_injury_pct,away_injury_pct,injury_advantage(estimated from performance variance) - Momentum:
momentum_last3(win % over last 3 games),momentum_advantage - Rest Days: Days since last game (captures fatigue, especially for Thursday games)
- Division Rivalry: Boolean flag for divisional matchups
- Vegas Spread: Market consensus (when available)
Week 1-9 (Legacy): 3-model ensemble
- Random Forest (n_estimators=200, max_depth=5)
- Logistic Regression (C=1)
- XGBoost (max_depth=5, learning_rate=0.1)
Week 10+ (Enhanced): 4-model ensemble with temporal weighting
- Random Forest (n_estimators=200, max_depth=15, with overfitting protection)
- Logistic Regression (C=1, max_iter=1000)
- Gradient Boosting Classifier (n_estimators=200, max_depth=8, learning_rate=0.1)
- XGBoost (max_depth=8, L1/L2 regularization, subsampling=0.8)
Temporal Weighting: Recent seasons weighted via exponential decay: weight = exp(-0.15 × years_ago). 2024 data weighted ~3× more than 2015 data.
All models wrapped in CalibratedClassifierCV (isotonic regression, cv=3) for better probability estimates.
All data is automatically fetched from nfl_data_py library:
- Play-by-play data: Game-level statistics (2015-2025)
- Weekly data: Player and team performance metrics
- Schedule data: Game schedules, scores, spreads
Data is cached locally in nfl_data/ folders within each week directory to speed up subsequent runs.
collect_data(start_year, end_year): Download NFL data and cache as CSVcreate_team_features(weekly_data, season, week): Calculate season-to-date team statistics (includes momentum calculation for Week 10+)create_game_features(home_team, away_team, ...): Generate matchup features (includes rest_days, division_game, vegas_spread for Week 10+)build_dataset(pbp_data, weekly_data, schedule_data): Process raw data into training datasetselect_features(df, n_features): Use RFE to identify optimal feature setcreate_ensemble_model(df): Train final voting classifier with temporal weighting (Week 10+)evaluate_model_with_calibration(df): Time-series cross-validation with Brier score and log loss (Week 10+)predict_games(games_df): Generate predictions for new games_calculate_injury_percentage(): Estimate injury impact from performance variance (Week 6+)_calculate_defensive_stats(): Calculate defensive metrics (Week 10+)
train_spread_model(df): Train regression model for point spreads (tests 4 models: Random Forest, Linear Regression, Gradient Boosting with quantile loss, XGBoost with regularization)predict_spreads(games_df): Predict point differential for games with confidence calculation
predict_multiple_games(predictor, games_text, season, week): Batch predict from formatted textpredict_multiple_games_with_spreads(predictor, spread_predictor, ...): Enhanced version with spreadsfetch_actual_results(predictions_df, season, week): Auto-fetch game results from nfl_data_pyanalyze_week(week_num, season_year, predictions_df, actuals): Compare predictions vs actual results
- Weeks 1-5:
week{N}_results(e.g.,week1_results,week2_results) - Weeks 6+:
week{N}_spread_results(e.g.,week6_spread_results,week9_spread_results) - Actual results:
week{N}_actual_results,week{N}_final_results
Standard NFL abbreviations are used throughout:
PHI(Eagles),KC(Chiefs),LAC(Chargers),SF(49ers),LA(Rams)- Note: Rams use
LAnotLAR; use theteam_mappingdict in prediction functions for consistency
- Overall accuracy: 60.7% across 135 games
- Total correct: 82 games, Total incorrect: 53 games
- High-confidence picks (>65%): Accuracy varies significantly by week (33%-100%)
- Week-to-week variance: ±23 percentage points
- Best weeks: Week 3 (87.5%), Week 8 (84.6%)
- Worst week: Week 6 (40.0%)
Week-by-Week Performance:
- Week 1: 62.5% (10/16), High confidence: 80.0%
- Week 2: 56.2% (9/16), High confidence: 75.0%
- Week 3: 87.5% (14/16), High confidence: 75.0%
- Week 4: 56.2% (9/16), High confidence: 33.3%
- Week 5: 42.9% (6/14), High confidence: 66.7%
- Week 6: 40.0% (6/15), High confidence: 50.0%
- Week 7: 60.0% (9/15), High confidence: 100.0%
- Week 8: 84.6% (11/13), High confidence: 83.3%
- Week 9: 57.1% (8/14), High confidence: 50.0%
Legacy Spread Model Performance (Weeks 6-9):
- MAE: ~10.76 points
- RMSE: ~14.21 points
- Overall accuracy target: 66-68% (+5-7 points improvement)
- High-confidence picks target: 72-75% (+9-12 points improvement)
- Week-to-week variance target: ±12 points (50% reduction)
- Spread MAE target: 7.5-8.5 points (20-30% improvement)
- Spread RMSE target: <10.5 points (26% improvement)
Note: Week 10+ uses enhanced 4-model ensemble with temporal weighting, momentum features, defensive metrics, and improved spread model.
Performance tracking: For comprehensive performance analysis across all weeks, see Week14/Project_Summary_Report.md which includes detailed metrics, feature importance analysis, and insights from the 2024 season.
- Create
Week{N}/directory - Copy
Model.ipynbfrom Week 10-14 to get enhanced model with all improvements- Week 10-14 use the enhanced 4-model ensemble
- For legacy model reference (Weeks 1-9): See Week 6-9 for basic spread predictions
- Latest tested model: Week 14
- Update the week number in:
- Game schedule text (update team matchups)
WEEK_NUMBERandSEASONconfiguration variables- Variable names (e.g.,
week15_games,week15_spread_results)
- Update game schedule dictionary with correct dates/times
- Run all cells to generate predictions:
- Cell 0: Install dependencies (run once)
- Cells 1-4: Train models (5-15 min first run, 3-5 min cached)
- Cell 6: Generate predictions
- Cell 7: Fetch results (after games complete)
- Cell 8: Export CSV (saves to
week{N}_predictions.csv)
- After games complete, run result analysis cell
- Update
MAX_WEEKin Plot.ipynb and re-run to include new week
Note: Week 14 includes academic documentation and may not have all standard outputs if it was used for report generation rather than live predictions.
If a week's Model.ipynb has been executed but the CSV wasn't saved:
- Read the notebook file to find the prediction outputs
- Look for
week{N}_spread_resultsorweek{N}_resultsin cell outputs - Extract the prediction data (matchup, predicted_winner, confidence, spreads)
- Create CSV with required columns:
game_num,away_team,home_team,matchup,predicted_winner,confidence,home_win_prob,away_win_prob - For spread predictions (Week 6+), also include:
predicted_spread,spread_display,favored_team,spread_magnitude
For future improvements:
- Adjust feature selection in
select_features()(changen_featuresparameter) - Tune hyperparameters in
create_ensemble_model()(e.g., adjust max_depth, learning_rate) - Add new features in
create_game_features()(e.g., weather conditions, quarterback ratings) - Modify temporal weighting decay rate (currently
-0.15, higher = more recent bias) - For spread model: adjust regression models in
train_spread_model()
Testing model changes:
- Use
evaluate_model_with_calibration(df)for time-series cross-validation (Week 10+) - Monitor Brier score (calibration quality) and log loss (probability accuracy)
- Compare predictions on holdout weeks before deploying
- Document expected vs actual performance improvements
If data download fails:
- Check internet connection
nfl_data_pymay not have current week's data yet- Model falls back to previous season data (e.g., uses 2024 Week 19 for 2025 predictions)
- Verify team abbreviations match NFL standard codes
Plot.ipynb automatically creates:
- Weekly accuracy line chart with 50% baseline and fill areas
- Cumulative accuracy progression tracking
- Correct vs incorrect predictions bar chart (grouped by week)
- High-confidence pick performance (>65% confidence threshold)
- Detailed game-by-game results tables with confidence scores
Important: Update MAX_WEEK variable in Plot.ipynb to match the latest completed week with predictions CSV files.
Running Plot.ipynb:
- Ensure all
week{N}_predictions.csvfiles exist in their respective Week directories - Open Plot.ipynb in Jupyter
- Verify
MAX_WEEKmatches your latest week with saved predictions (e.g.,MAX_WEEK = 13) - Execute all cells to generate visualizations
- The notebook will:
- Auto-detect prediction files from Week1 through Week{MAX_WEEK}
- Fetch actual results via
nfl_data_py - Calculate statistics and generate 4 comprehensive plots
- Display detailed breakdown tables
Required packages:
pandas>=1.5.3
numpy>=1.26.0
matplotlib
seaborn
scikit-learn
xgboost>=3.0.2
nfl_data_py>=0.3.3
joblib
pillow # For image handling in notebooks
Install with: pip install xgboost nfl_data_py pillow
- Training: Model trains on historical games (2015-2024) where outcomes are known
- Feature Generation: For Week N predictions, uses team stats through Week N-1
- Prediction: Generates winner/spread predictions for Week N games
- Validation: After games complete, fetches actual results and calculates accuracy
- Model retrains from scratch each week using historical data (2015-2024) plus current season
- Temporal weighting (Week 10+): Recent seasons weighted exponentially higher via
exp(-0.15 × years_ago) - Predictions use team statistics up to (but not including) the target week
- Momentum features (Week 10+): Includes last 3 games win percentage
- Home field advantage is fixed at 2.5 points in feature engineering
- Injury metrics are estimated from performance variance (not actual injury reports)
- Spread predictions:
- Week 6-9: Random Forest regression
- Week 10+: Tests 4 models (Random Forest, Linear, Gradient Boosting with quantile loss, XGBoost), selects best by MAE
- Confidence scores for spread model:
0.50 + min(0.45, abs(spread) × 0.025), capped at 95%
Each week{N}_predictions.csv must contain these columns for Plot.ipynb to work:
matchup(format: "AWAY @ HOME", e.g., "KC @ LAC")away_team(3-letter abbreviation)home_team(3-letter abbreviation)predicted_winner(3-letter abbreviation)confidence(float, 0.0 to 1.0)
Optional columns for enhanced analysis:
predicted_spread,spread_display,favored_team,spread_magnitude
fetch_actual_results()function automatically queriesnfl_data_pyfor completed games- Returns None if games haven't been played yet (futures games have no scores)
- Matches predictions to actual results by
away_teamandhome_teamfields - Compares predicted winner vs actual winner to calculate accuracy
Major model enhancements implemented based on comprehensive analysis. All improvements are documented in MODEL_IMPROVEMENTS_SUMMARY.md and QUICK_START_IMPROVED_MODEL.md.
Key Changes:
- Enhanced Ensemble: 3→4 models, increased tree depth (5→15 for RF, 5→8 for XGB/GB)
- Temporal Weighting: Exponential decay favoring recent seasons (2024 weighted 3× vs 2015)
- New Features: Momentum (last 3 games), defensive stats, rest days, division rivalry, vegas spread
- Better Cross-Validation: TimeSeriesSplit with Brier score and log loss metrics
- Improved Spread Model: Tests 4 regression models, quantile loss for Gradient Boosting
Expected Impact:
- Accuracy: 60.7% → 66-68%
- Variance reduction: 50% (±23 → ±12 points week-to-week)
- Spread MAE: 10.76 → 7.5-8.5 points
Using Enhanced Model:
- Copy
Week10/Model.ipynbfor all future weeks (Week 11+) - All improvements are backward compatible
- Enhanced features auto-populate when available in data
- Original model preserved in
Week10/Model_backup_20251106.ipynb
Validation:
- 18/18 automated tests passed (100%)
- Production-ready, no syntax errors or breaking changes
- Maintains full compatibility with Plot.ipynb
Week14/Project_Summary_Report.md contains a comprehensive 4-page academic summary (2,800 words) including:
- Executive Summary: Overview of the ML prediction system achieving 63% accuracy on 2024 season
- Problem Statement & Motivation: Sports prediction challenges and ML relevance
- Detailed Methodology: Data collection (nfl_data_py), feature engineering (32 features), model architecture (4-model ensemble)
- Results & Performance: Comprehensive metrics including accuracy by confidence level, feature importance rankings, model comparisons
- Insights & Analysis: What drives NFL success (offense vs defense, momentum effects, home field advantage)
- Practical Applications: Guidelines for sports betting, fantasy football, and team analytics
- Limitations & Future Work: Data quality issues, model constraints, improvement roadmap
Week14/README_PDF_Conversion.md provides multiple methods for converting the summary report to PDF:
- Pandoc (recommended): Command-line conversion with LaTeX styling
- VS Code/Cursor: Markdown preview → print to PDF
- Typora/MacDown: Desktop markdown editors with export
- Online converters: Browser-based options
- Python automation: markdown-pdf package
This project includes specialized agents in .claude/agents/ for model analysis and optimization:
Purpose: Analyzes ML model performance, reviews architecture, and identifies optimization opportunities.
When to use:
- After model training completes
- When accuracy drops below expected thresholds
- When investigating performance inconsistencies
- When user requests model improvements
Capabilities:
- Comprehensive model architecture analysis (ensemble composition, hyperparameters, calibration)
- Performance metrics deep dive (accuracy trends, spread MAE/RMSE, confidence calibration)
- Feature engineering assessment (RFE selection, missing features, importance scores)
- Root cause identification for performance issues
- Prioritized recommendations with expected impact estimates
Output: Detailed analysis report with executive summary, performance diagnosis, and prioritized recommendations for model-optimizer agent.
Purpose: Implements model improvements and optimizations systematically.
When to use:
- After receiving recommendations from model-analyzer
- When implementing specific model enhancements (hyperparameter changes, new features)
- When applying architectural changes to existing models
- When optimizing model training configurations
Capabilities:
- Incremental implementation of model improvements
- Hyperparameter tuning and validation
- Feature engineering additions with testing
- Model architecture modifications
- Performance comparison and documentation
Workflow: Typically invoked after model-analyzer identifies improvement opportunities. Ensures proper implementation, testing, and backward compatibility.
These agents are automatically available when using Claude Code in this repository and should be invoked proactively when working with model performance issues.