CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is an NFL game prediction system that uses machine learning to predict game winners and point spreads for the 2025 NFL season. The project is organized by week, with each week containing its own Model.ipynb notebook for generating predictions and tracking results.

Project Structure

NFL Performance Prediction/
├── Week1/ through Week14/    # Weekly prediction notebooks
│   ├── Model.ipynb           # Prediction model for that week
│   ├── week*_predictions.csv # Saved predictions (if generated)
│   ├── final_model.joblib    # Trained model checkpoint
│   └── nfl_data/            # Downloaded NFL data cache
├── Plot.ipynb                # Performance analysis and visualization across all weeks
├── .claude/agents/           # Custom Claude Code agents for model analysis and optimization
└── Week14/                   # Latest week with additional documentation
    ├── Project_Summary_Report.md      # Academic summary report
    └── README_PDF_Conversion.md       # Instructions for PDF conversion

Development Workflow

Running Predictions for a Specific Week

Navigate to the desired week's directory (e.g., Week6/)
Open Model.ipynb in Jupyter
Execute cells in order:
- Cell 1: Install dependencies (xgboost, nfl_data_py, pillow)
- Cell 2: Initialize NFLGamePredictor class (includes data collection and model training)
- Cell 3: Train the model (this downloads NFL data from 2015-2025 and may take several minutes)
- Cell 4-6: Run predictions for the specific week's games
- Cell 7: Fetch actual results and calculate accuracy (after games have been played)
- Cell 8: Save predictions to CSV

Analyzing Overall Performance

Use Plot.ipynb in the root directory to:

Aggregate predictions from all weeks
Calculate cumulative accuracy statistics
Generate performance visualizations
Track high-confidence pick success rates

Important: Plot.ipynb automatically loads predictions from:

Global variables if Model.ipynb was run in the same kernel (e.g., week1_results, week6_spread_results)
CSV files saved in each Week directory (e.g., Week1/week1_predictions.csv)

Model Architecture

Two-Stage Prediction System

Week 1-5: Basic winner prediction

Uses NFLGamePredictor class
Predictions stored in week*_results variables
Outputs: predicted winner and confidence percentage

Week 6+: Enhanced spread prediction

Uses both NFLGamePredictor and NFLSpreadPredictor classes
Predictions stored in week*_spread_results variables
Outputs: predicted winner, confidence, and point spread

Feature Engineering

Core Features (selected via RFE):

Offensive: home_passing_ypg, home_rushing_ypg, home_total_ypg, home_points_pg, home_passing_tds_pg
Defensive: home_defensive_ypg, home_defensive_ppg (yards/points allowed)
Turnovers: home_turnovers_pg, away_turnovers_pg, turnover_advantage
Contextual: scoring_advantage, is_playoff, season

Enhanced Features (Week 10+):

Injury Metrics: home_injury_pct, away_injury_pct, injury_advantage (estimated from performance variance)
Momentum: momentum_last3 (win % over last 3 games), momentum_advantage
Rest Days: Days since last game (captures fatigue, especially for Thursday games)
Division Rivalry: Boolean flag for divisional matchups
Vegas Spread: Market consensus (when available)

Ensemble Model

Week 1-9 (Legacy): 3-model ensemble

Random Forest (n_estimators=200, max_depth=5)
Logistic Regression (C=1)
XGBoost (max_depth=5, learning_rate=0.1)

Week 10+ (Enhanced): 4-model ensemble with temporal weighting

Random Forest (n_estimators=200, max_depth=15, with overfitting protection)
Logistic Regression (C=1, max_iter=1000)
Gradient Boosting Classifier (n_estimators=200, max_depth=8, learning_rate=0.1)
XGBoost (max_depth=8, L1/L2 regularization, subsampling=0.8)

Temporal Weighting: Recent seasons weighted via exponential decay: weight = exp(-0.15 × years_ago). 2024 data weighted ~3× more than 2015 data.

All models wrapped in CalibratedClassifierCV (isotonic regression, cv=3) for better probability estimates.

Data Sources

All data is automatically fetched from nfl_data_py library:

Play-by-play data: Game-level statistics (2015-2025)
Weekly data: Player and team performance metrics
Schedule data: Game schedules, scores, spreads

Data is cached locally in nfl_data/ folders within each week directory to speed up subsequent runs.

Key Functions

NFLGamePredictor Class

collect_data(start_year, end_year): Download NFL data and cache as CSV
create_team_features(weekly_data, season, week): Calculate season-to-date team statistics (includes momentum calculation for Week 10+)
create_game_features(home_team, away_team, ...): Generate matchup features (includes rest_days, division_game, vegas_spread for Week 10+)
build_dataset(pbp_data, weekly_data, schedule_data): Process raw data into training dataset
select_features(df, n_features): Use RFE to identify optimal feature set
create_ensemble_model(df): Train final voting classifier with temporal weighting (Week 10+)
evaluate_model_with_calibration(df): Time-series cross-validation with Brier score and log loss (Week 10+)
predict_games(games_df): Generate predictions for new games
_calculate_injury_percentage(): Estimate injury impact from performance variance (Week 6+)
_calculate_defensive_stats(): Calculate defensive metrics (Week 10+)

NFLSpreadPredictor Class (Week 6+)

train_spread_model(df): Train regression model for point spreads (tests 4 models: Random Forest, Linear Regression, Gradient Boosting with quantile loss, XGBoost with regularization)
predict_spreads(games_df): Predict point differential for games with confidence calculation

Helper Functions

predict_multiple_games(predictor, games_text, season, week): Batch predict from formatted text
predict_multiple_games_with_spreads(predictor, spread_predictor, ...): Enhanced version with spreads
fetch_actual_results(predictions_df, season, week): Auto-fetch game results from nfl_data_py
analyze_week(week_num, season_year, predictions_df, actuals): Compare predictions vs actual results

Naming Conventions

Variables

Weeks 1-5: week{N}_results (e.g., week1_results, week2_results)
Weeks 6+: week{N}_spread_results (e.g., week6_spread_results, week9_spread_results)
Actual results: week{N}_actual_results, week{N}_final_results

Team Abbreviations

Standard NFL abbreviations are used throughout:

PHI (Eagles), KC (Chiefs), LAC (Chargers), SF (49ers), LA (Rams)
Note: Rams use LA not LAR; use the team_mapping dict in prediction functions for consistency

Performance Metrics

Legacy Model Performance (Weeks 1-9, 2025 season)

Overall accuracy: 60.7% across 135 games
Total correct: 82 games, Total incorrect: 53 games
High-confidence picks (>65%): Accuracy varies significantly by week (33%-100%)
Week-to-week variance: ±23 percentage points
Best weeks: Week 3 (87.5%), Week 8 (84.6%)
Worst week: Week 6 (40.0%)

Week-by-Week Performance:

Week 1: 62.5% (10/16), High confidence: 80.0%
Week 2: 56.2% (9/16), High confidence: 75.0%
Week 3: 87.5% (14/16), High confidence: 75.0%
Week 4: 56.2% (9/16), High confidence: 33.3%
Week 5: 42.9% (6/14), High confidence: 66.7%
Week 6: 40.0% (6/15), High confidence: 50.0%
Week 7: 60.0% (9/15), High confidence: 100.0%
Week 8: 84.6% (11/13), High confidence: 83.3%
Week 9: 57.1% (8/14), High confidence: 50.0%

Legacy Spread Model Performance (Weeks 6-9):

MAE: ~10.76 points
RMSE: ~14.21 points

Enhanced Model Performance (Week 10+)

Overall accuracy target: 66-68% (+5-7 points improvement)
High-confidence picks target: 72-75% (+9-12 points improvement)
Week-to-week variance target: ±12 points (50% reduction)
Spread MAE target: 7.5-8.5 points (20-30% improvement)
Spread RMSE target: <10.5 points (26% improvement)

Note: Week 10+ uses enhanced 4-model ensemble with temporal weighting, momentum features, defensive metrics, and improved spread model.

Performance tracking: For comprehensive performance analysis across all weeks, see Week14/Project_Summary_Report.md which includes detailed metrics, feature importance analysis, and insights from the 2024 season.

Common Tasks

Adding a New Week

Create Week{N}/ directory
Copy Model.ipynb from Week 10-14 to get enhanced model with all improvements
- Week 10-14 use the enhanced 4-model ensemble
- For legacy model reference (Weeks 1-9): See Week 6-9 for basic spread predictions
- Latest tested model: Week 14
Update the week number in:
- Game schedule text (update team matchups)
- WEEK_NUMBER and SEASON configuration variables
- Variable names (e.g., week15_games, week15_spread_results)
Update game schedule dictionary with correct dates/times
Run all cells to generate predictions:
- Cell 0: Install dependencies (run once)
- Cells 1-4: Train models (5-15 min first run, 3-5 min cached)
- Cell 6: Generate predictions
- Cell 7: Fetch results (after games complete)
- Cell 8: Export CSV (saves to week{N}_predictions.csv)
After games complete, run result analysis cell
Update MAX_WEEK in Plot.ipynb and re-run to include new week

Note: Week 14 includes academic documentation and may not have all standard outputs if it was used for report generation rather than live predictions.

Extracting Predictions from Existing Notebooks

If a week's Model.ipynb has been executed but the CSV wasn't saved:

Read the notebook file to find the prediction outputs
Look for week{N}_spread_results or week{N}_results in cell outputs
Extract the prediction data (matchup, predicted_winner, confidence, spreads)
Create CSV with required columns: game_num, away_team, home_team, matchup, predicted_winner, confidence, home_win_prob, away_win_prob
For spread predictions (Week 6+), also include: predicted_spread, spread_display, favored_team, spread_magnitude

Modifying the Model

For future improvements:

Adjust feature selection in select_features() (change n_features parameter)
Tune hyperparameters in create_ensemble_model() (e.g., adjust max_depth, learning_rate)
Add new features in create_game_features() (e.g., weather conditions, quarterback ratings)
Modify temporal weighting decay rate (currently -0.15, higher = more recent bias)
For spread model: adjust regression models in train_spread_model()

Testing model changes:

Use evaluate_model_with_calibration(df) for time-series cross-validation (Week 10+)
Monitor Brier score (calibration quality) and log loss (probability accuracy)
Compare predictions on holdout weeks before deploying
Document expected vs actual performance improvements

Troubleshooting Data Issues

If data download fails:

Check internet connection
nfl_data_py may not have current week's data yet
Model falls back to previous season data (e.g., uses 2024 Week 19 for 2025 predictions)
Verify team abbreviations match NFL standard codes

Generating Visualizations

Plot.ipynb automatically creates:

Weekly accuracy line chart with 50% baseline and fill areas
Cumulative accuracy progression tracking
Correct vs incorrect predictions bar chart (grouped by week)
High-confidence pick performance (>65% confidence threshold)
Detailed game-by-game results tables with confidence scores

Important: Update MAX_WEEK variable in Plot.ipynb to match the latest completed week with predictions CSV files.

Running Plot.ipynb:

Ensure all week{N}_predictions.csv files exist in their respective Week directories
Open Plot.ipynb in Jupyter
Verify MAX_WEEK matches your latest week with saved predictions (e.g., MAX_WEEK = 13)
Execute all cells to generate visualizations
The notebook will:
- Auto-detect prediction files from Week1 through Week{MAX_WEEK}
- Fetch actual results via nfl_data_py
- Calculate statistics and generate 4 comprehensive plots
- Display detailed breakdown tables

Dependencies

Required packages:

pandas>=1.5.3
numpy>=1.26.0
matplotlib
seaborn
scikit-learn
xgboost>=3.0.2
nfl_data_py>=0.3.3
joblib
pillow  # For image handling in notebooks

Install with: pip install xgboost nfl_data_py pillow

Critical Implementation Details

Data Flow

Training: Model trains on historical games (2015-2024) where outcomes are known
Feature Generation: For Week N predictions, uses team stats through Week N-1
Prediction: Generates winner/spread predictions for Week N games
Validation: After games complete, fetches actual results and calculates accuracy

Model Behavior

Model retrains from scratch each week using historical data (2015-2024) plus current season
Temporal weighting (Week 10+): Recent seasons weighted exponentially higher via exp(-0.15 × years_ago)
Predictions use team statistics up to (but not including) the target week
Momentum features (Week 10+): Includes last 3 games win percentage
Home field advantage is fixed at 2.5 points in feature engineering
Injury metrics are estimated from performance variance (not actual injury reports)
Spread predictions:
- Week 6-9: Random Forest regression
- Week 10+: Tests 4 models (Random Forest, Linear, Gradient Boosting with quantile loss, XGBoost), selects best by MAE
Confidence scores for spread model: 0.50 + min(0.45, abs(spread) × 0.025), capped at 95%

CSV File Requirements

Each week{N}_predictions.csv must contain these columns for Plot.ipynb to work:

matchup (format: "AWAY @ HOME", e.g., "KC @ LAC")
away_team (3-letter abbreviation)
home_team (3-letter abbreviation)
predicted_winner (3-letter abbreviation)
confidence (float, 0.0 to 1.0)

Optional columns for enhanced analysis:

predicted_spread, spread_display, favored_team, spread_magnitude

Automated Result Fetching

fetch_actual_results() function automatically queries nfl_data_py for completed games
Returns None if games haven't been played yet (futures games have no scores)
Matches predictions to actual results by away_team and home_team fields
Compares predicted winner vs actual winner to calculate accuracy

Model Enhancement History

Week 10 Improvements (November 2025)

Major model enhancements implemented based on comprehensive analysis. All improvements are documented in MODEL_IMPROVEMENTS_SUMMARY.md and QUICK_START_IMPROVED_MODEL.md.

Key Changes:

Enhanced Ensemble: 3→4 models, increased tree depth (5→15 for RF, 5→8 for XGB/GB)
Temporal Weighting: Exponential decay favoring recent seasons (2024 weighted 3× vs 2015)
New Features: Momentum (last 3 games), defensive stats, rest days, division rivalry, vegas spread
Better Cross-Validation: TimeSeriesSplit with Brier score and log loss metrics
Improved Spread Model: Tests 4 regression models, quantile loss for Gradient Boosting

Expected Impact:

Accuracy: 60.7% → 66-68%
Variance reduction: 50% (±23 → ±12 points week-to-week)
Spread MAE: 10.76 → 7.5-8.5 points

Using Enhanced Model:

Copy Week10/Model.ipynb for all future weeks (Week 11+)
All improvements are backward compatible
Enhanced features auto-populate when available in data
Original model preserved in Week10/Model_backup_20251106.ipynb

Validation:

18/18 automated tests passed (100%)
Production-ready, no syntax errors or breaking changes
Maintains full compatibility with Plot.ipynb

Project Documentation

Academic Report

Week14/Project_Summary_Report.md contains a comprehensive 4-page academic summary (2,800 words) including:

Executive Summary: Overview of the ML prediction system achieving 63% accuracy on 2024 season
Problem Statement & Motivation: Sports prediction challenges and ML relevance
Detailed Methodology: Data collection (nfl_data_py), feature engineering (32 features), model architecture (4-model ensemble)
Results & Performance: Comprehensive metrics including accuracy by confidence level, feature importance rankings, model comparisons
Insights & Analysis: What drives NFL success (offense vs defense, momentum effects, home field advantage)
Practical Applications: Guidelines for sports betting, fantasy football, and team analytics
Limitations & Future Work: Data quality issues, model constraints, improvement roadmap

PDF Conversion

Week14/README_PDF_Conversion.md provides multiple methods for converting the summary report to PDF:

Pandoc (recommended): Command-line conversion with LaTeX styling
VS Code/Cursor: Markdown preview → print to PDF
Typora/MacDown: Desktop markdown editors with export
Online converters: Browser-based options
Python automation: markdown-pdf package

Custom Claude Code Agents

This project includes specialized agents in .claude/agents/ for model analysis and optimization:

model-analyzer

Purpose: Analyzes ML model performance, reviews architecture, and identifies optimization opportunities.

When to use:

After model training completes
When accuracy drops below expected thresholds
When investigating performance inconsistencies
When user requests model improvements

Capabilities:

Comprehensive model architecture analysis (ensemble composition, hyperparameters, calibration)
Performance metrics deep dive (accuracy trends, spread MAE/RMSE, confidence calibration)
Feature engineering assessment (RFE selection, missing features, importance scores)
Root cause identification for performance issues
Prioritized recommendations with expected impact estimates

Output: Detailed analysis report with executive summary, performance diagnosis, and prioritized recommendations for model-optimizer agent.

model-optimizer

Purpose: Implements model improvements and optimizations systematically.

When to use:

After receiving recommendations from model-analyzer
When implementing specific model enhancements (hyperparameter changes, new features)
When applying architectural changes to existing models
When optimizing model training configurations

Capabilities:

Incremental implementation of model improvements
Hyperparameter tuning and validation
Feature engineering additions with testing
Model architecture modifications
Performance comparison and documentation

Workflow: Typically invoked after model-analyzer identifies improvement opportunities. Ensures proper implementation, testing, and backward compatibility.

These agents are automatically available when using Claude Code in this repository and should be invoked proactively when working with model performance issues.

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Project Structure

Development Workflow

Running Predictions for a Specific Week

Analyzing Overall Performance

Model Architecture

Two-Stage Prediction System

Feature Engineering

Ensemble Model

Data Sources

Key Functions

NFLGamePredictor Class

NFLSpreadPredictor Class (Week 6+)

Helper Functions

Naming Conventions

Variables

Team Abbreviations

Performance Metrics

Legacy Model Performance (Weeks 1-9, 2025 season)

Enhanced Model Performance (Week 10+)

Common Tasks

Adding a New Week

Extracting Predictions from Existing Notebooks

Modifying the Model

Troubleshooting Data Issues

Generating Visualizations

Dependencies

Critical Implementation Details

Data Flow

Model Behavior

CSV File Requirements

Automated Result Fetching

Model Enhancement History

Week 10 Improvements (November 2025)

Project Documentation

Academic Report

PDF Conversion

Custom Claude Code Agents

model-analyzer

model-optimizer