🎾 ATP Tennis Match Predictor

Machine learning system for predicting ATP tennis match outcomes using XGBoost, ELO ratings, and advanced feature engineering. Surface-specific models achieve 80.71-84.28% accuracy with 0.918-0.937 AUC-ROC on temporal holdout data (2000-2024).

📋 Table of Contents

🎯 Overview
📊 Performance Metrics
🚀 Quick Start
🗂️ Project Structure
⚙️ CLI Commands
🔬 Feature Engineering
📈 ELO Rating System
📁 Dataset Information
🚀 Advanced Features
💻 GPU Acceleration
🧪 Testing
⚙️ Configuration
🐛 Troubleshooting
📚 References
🙏 Acknowledgments
🤝 Contributing
📄 License & Citation
👤 Author

🎯 Overview

This project implements a tennis match prediction system with the following capabilities:

Key Features

Machine Learning: XGBoost classifier with Bayesian hyperparameter optimization
ELO Rating System: Dynamic player ratings (global and surface-specific)
Feature Engineering: 40+ features including win rates, form, rankings, H2H records, and serve statistics
Temporal Validation: Time-series cross-validation to prevent data leakage
Surface-Specific Models: Separate models for Hard, Clay, and Grass courts
Model Interpretability: SHAP analysis and feature importance visualization
Symmetric Predictions: Bidirectional predictions for improved robustness
GPU Acceleration: CUDA support for faster training
Command-Line Interface: 10 commands for training, prediction, evaluation, and analysis
Comprehensive Testing: Full test suite with pytest (>80% coverage)

Use Cases

Tennis match outcome prediction with probability scores
Player performance analysis and statistics
Head-to-head matchup analysis
Feature importance investigation for tennis analytics
Academic research in sports prediction and machine learning
Tournament forecasting with batch predictions

📊 Performance Metrics

Note: The system uses three surface-specific models (Hard, Clay, Grass) trained independently for optimal performance on each surface type.

Holdout Evaluation (Hard Surface Model)

Metric	Value	Interpretation
Accuracy	84.08%	Excellent for sports prediction
AUC-ROC	0.937	Excellent discrimination capability
Log Loss	0.497	Strong probabilistic calibration
Brier Score	0.156	High probability reliability
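For readers less familiar with these metrics, the following is a minimal pure-Python sketch of how accuracy, log loss, and the Brier score are defined for binary predictions. The function names and the toy data are illustrative assumptions and are not taken from atp_core.

```python
import math

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of matches where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

# Toy data (made up): 1 means player 1 won.
y_true = [1, 0, 1, 1]
p_pred = [0.8, 0.3, 0.6, 0.4]
print(accuracy(y_true, p_pred), log_loss(y_true, p_pred), brier_score(y_true, p_pred))
```

Lower is better for both log loss and the Brier score; a model that always predicts 50% scores about 0.693 and 0.25 respectively.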

Cross-Validation Results (10-Fold Temporal - Hard Surface)

Average AUC: 0.942 ± 0.008 (range: 0.931-0.956)
Average Accuracy: 85.06% ± 1.15%
Best Fold: 86.91% accuracy (Fold 6)
Log Loss: 0.491 ± 0.010
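The exact splitting scheme behind the 10-fold temporal cross-validation is not spelled out here. As a rough illustration, one common approach is an expanding window, similar in spirit to scikit-learn's TimeSeriesSplit. The helper name expanding_window_splits is hypothetical, and the sketch assumes rows are already sorted oldest first.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs where training data always precedes test data.

    Assumes rows are already sorted chronologically (oldest first).
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        # The last fold absorbs any remainder rows.
        test_end = n_samples if k == n_splits else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(expanding_window_splits(n_samples=110, n_splits=10))
for train_idx, test_idx in splits:
    # Every training row is older than every test row, so no future data leaks in.
    assert max(train_idx) < min(test_idx)
print(len(splits), len(splits[0][0]), len(splits[-1][1]))
```

Because each fold trains only on matches played before its test window, the fold scores approximate how the model would have performed if deployed at that point in time.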

Surface-Specific Performance

Surface	AUC-ROC	Accuracy	Log Loss	Holdout Matches
Hard	0.937	84.08%	0.497	8,174
Clay	0.918	80.71%	0.431	4,848
Grass	0.936	84.28%	0.331	1,539

🚀 Quick Start

Prerequisites

Python: 3.8 or higher
Operating System: Linux, macOS, Windows (with WSL)
Hardware: CPU (minimum), NVIDIA GPU with CUDA (optional, for acceleration)

Installation

# 1. Clone the repository
git clone https://github.com/ulpati/atp-tennis-predictor.git
cd atp-tennis-predictor

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Verify installation
python3 atp.py --help

Download ATP Data

# Download match data for specific years (example: 2020-2024)
mkdir -p data
for year in {2020..2024}; do
  wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_${year}.csv
done

Data Source: Jeff Sackmann's ATP Tennis Repository (CC BY-NC-SA 4.0)

Basic Usage

# Train a model (standard)
python3 atp.py train --data-dir data

# Train with optimization and surface-specific ELO
python3 atp.py train --data-dir data --optimize --elo-per-surface

# Predict a match
python3 atp.py predict "Novak Djokovic" "Carlos Alcaraz" Hard

# Evaluate model performance
python3 atp.py evaluate

# List top players
python3 atp.py list --top 20

# View player statistics
python3 atp.py player "Rafael Nadal"

# Head-to-head analysis
python3 atp.py h2h "Roger Federer" "Rafael Nadal"

🗂️ Project Structure

atp-tennis-predictor/
├── atp.py                      # Main CLI application
├── atp_core/                   # Core library modules
│   ├── __init__.py             # Package initialization
│   ├── config.py               # Configuration constants
│   ├── elo.py                  # ELO rating engine
│   ├── features.py             # Feature extraction logic
│   ├── model.py                # Training and prediction
│   ├── shap_analysis.py        # SHAP interpretability
│   ├── utils.py                # Utility functions
│   └── exceptions.py           # Custom exceptions
├── data/                       # ATP match CSVs (gitignored)
│   └── atp_matches_YYYY.csv    # Annual match data
├── models/                     # Trained models (gitignored)
│   ├── artifact_xgb_*.joblib   # XGBoost models
│   ├── players_stats_*.joblib  # Player statistics
│   └── elo_dict_*.json         # ELO ratings
├── tests/                      # Test suite
│   ├── test_elo_engine.py      # ELO system tests
│   ├── test_prepare_feature.py # Feature extraction tests
│   ├── test_model.py           # Model training tests
│   └── test_utils.py           # Utility function tests
├── requirements.txt            # Python dependencies
├── LICENSE                     # CC BY-NC-SA 4.0 license
├── CITATION.md                 # Citation guidelines
└── README.md                   # This file

⚙️ CLI Commands

The system provides 10 commands accessible via python3 atp.py <command>:

1. `train` - Train Prediction Model

Train an XGBoost model with optional hyperparameter optimization.

python3 atp.py train --data-dir data [OPTIONS]

Options:

Option	Description	Default
`--data-dir PATH`	Directory containing ATP CSV files	`data/`
`--optimize`	Enable Bayesian hyperparameter optimization	`False`
`--use-gpu`	Use NVIDIA GPU for acceleration	`False`
`--elo-per-surface`	Train separate ELO ratings per surface	`False`
`--elo-k FLOAT`	ELO K-factor (rating volatility)	`20.0`
`--recent-n INT`	Number of recent matches for form calculation	`30`
`--surface-specific`	Train separate models per surface (Hard/Clay/Grass)	`False`
`--time-decay FLOAT`	Exponential time decay factor (0=disabled, 0.95=recommended)	`0.0`
`--stacking`	Train stacking ensemble (XGBoost + LogisticRegression)	`False`

Examples:

# Basic training
python3 atp.py train --data-dir data

# Recommended: surface-specific ELO
python3 atp.py train --data-dir data --elo-per-surface

# Advanced: optimization + surface-specific + time decay
python3 atp.py train --data-dir data --optimize --elo-per-surface --time-decay 0.95

# Full training with GPU
python3 atp.py train --data-dir data --optimize --surface-specific --use-gpu --time-decay 0.95

2. `predict` - Single Match Prediction

Predict the outcome of a match between two players on a specific surface.

python3 atp.py predict "Player A" "Player B" Surface [OPTIONS]

Arguments:

Player A: First player name (fuzzy matching supported)
Player B: Second player name (fuzzy matching supported)
Surface: Court surface (Hard, Clay, or Grass)

Options:

--artifact PATH: Path to trained model (auto-detects latest if not specified)
--players PATH: Path to player stats file (auto-detects if not specified)
--json: Output results in JSON format

Example:

python3 atp.py predict "Novak Djokovic" "Carlos Alcaraz" Hard

Output:

======================================================================
  MATCH PREDICTION: Novak Djokovic vs Carlos Alcaraz
  Surface: Hard | Confidence: ❓ LOW
======================================================================

  ✓ Predicted Winner: Carlos Alcaraz
  • Novak Djokovic win probability:  47.80%
  • Carlos Alcaraz win probability:  52.20%

  Raw probabilities (before averaging):
    - Novak Djokovic -> Carlos Alcaraz:  56.03%
    - Carlos Alcaraz -> Novak Djokovic:  60.44%
======================================================================
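The sample output is consistent with a simple averaging rule: the model is queried once with each player listed first, and the second result is flipped before averaging. The sketch below reproduces the numbers under that assumption. The function name symmetric_probability is hypothetical, and the rule is inferred from the output rather than taken from the source code.

```python
def symmetric_probability(p_a_as_p1, p_b_as_p1):
    """Average the two orderings into one win probability for player A.

    p_a_as_p1: model output for A when A is listed first.
    p_b_as_p1: model output for B when B is listed first.
    """
    return (p_a_as_p1 + (1.0 - p_b_as_p1)) / 2.0

# Raw probabilities from the sample output above.
p_djokovic = symmetric_probability(0.5603, 0.6044)
p_alcaraz = 1.0 - p_djokovic
print(f"Djokovic: {p_djokovic:.4f}, Alcaraz: {p_alcaraz:.4f}")
```

Averaging both orderings guarantees that the two reported probabilities sum to 100%, even when the raw model outputs for the two orderings disagree.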

3. `evaluate` - Model Evaluation

Evaluate model performance on holdout data with cross-validation metrics.

python3 atp.py evaluate [--artifact PATH] [--json]

Output includes:

Holdout metrics (Accuracy, AUC-ROC, Log Loss, Brier Score)
10-fold temporal cross-validation results
Performance interpretation

4. `list` - List Top Players

Display top players ranked by total match wins.

python3 atp.py list [--top N] [--players PATH]

Example:

python3 atp.py list --top 10

Output:

Rank  Player                           Record       Win%     Surfaces
----------------------------------------------------------------------
1     Roger Federer                    1250W-260L   82.8%  Carpet:44, Clay:227, Grass:194, Hard:785
2     Novak Djokovic                   1139W-224L   83.6%  Carpet:9, Clay:291, Grass:121, Hard:718
3     Rafael Nadal                     1091W-235L   82.3%  Carpet:2, Clay:488, Grass:76, Hard:525
4     Andy Murray                      748W-269L   73.5%  Carpet:8, Clay:110, Grass:120, Hard:510
5     David Ferrer                     740W-379L   66.1%  Carpet:9, Clay:337, Grass:44, Hard:350

5. `player` - Player Statistics

View detailed statistics for a specific player.

python3 atp.py player "Player Name" [--players PATH]

Example:

python3 atp.py player "Rafael Nadal"

Output:

PLAYER PROFILE: Rafael Nadal

Overall Record: 1091W-235L (82.3%)

Surface Breakdown:
  • Carpet  :   2W-6  L (  25.0%)
  • Clay    : 488W-53 L (  90.2%)
  • Grass   :  76W-21 L (  78.4%)
  • Hard    : 525W-155L (  77.2%)

Recent Form (last 10): W L L W W W W L W L

6. `h2h` - Head-to-Head Analysis

Analyze head-to-head record between two players.

python3 atp.py h2h "Player A" "Player B" [--players PATH]

Example:

python3 atp.py h2h "Roger Federer" "Rafael Nadal"

Output:

HEAD-TO-HEAD: Roger Federer vs Rafael Nadal

Overall Record:
  Roger Federer: 17W
  Rafael Nadal: 24W
  Total: 41 matches

Win Percentage:
  Roger Federer: 41.5%
  Rafael Nadal: 58.5%

Surface Breakdown:
  Hard     : 12W-9L (57.1% for Roger Federer)
  Clay     : 2W-14L (12.5% for Roger Federer)
  Grass    : 3W-1L (75.0% for Roger Federer)

7. `batch` - Batch Predictions

Predict multiple matches from a CSV file.

python3 atp.py batch matches.csv [OPTIONS]

CSV Format:

player1,player2,surface
Novak Djokovic,Rafael Nadal,Clay
Roger Federer,Andy Murray,Grass
Carlos Alcaraz,Jannik Sinner,Hard
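When generating the input file programmatically, it can help to validate it first. The following sketch parses the format shown above with Python's csv module and rejects unknown surfaces. The helper read_matches is hypothetical and does not reflect how the batch command itself parses the file.

```python
import csv
import io

VALID_SURFACES = {"Hard", "Clay", "Grass"}

def read_matches(handle):
    """Parse rows of player1,player2,surface and reject unknown surfaces."""
    matches = []
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        surface = row["surface"].strip().capitalize()
        if surface not in VALID_SURFACES:
            raise ValueError(f"line {line_no}: unknown surface {row['surface']!r}")
        matches.append((row["player1"].strip(), row["player2"].strip(), surface))
    return matches

sample = io.StringIO(
    "player1,player2,surface\n"
    "Novak Djokovic,Rafael Nadal,Clay\n"
    "Roger Federer,Andy Murray,Grass\n"
    "Carlos Alcaraz,Jannik Sinner,Hard\n"
)
matches = read_matches(sample)
print(matches)
```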

8. `importance` - Feature Importance

Display XGBoost feature importance rankings.

python3 atp.py importance [--top N] [--artifact PATH]

Example:

python3 atp.py importance --top 10

Output:

TOP 10 MOST IMPORTANT FEATURES

Rank  Feature                                  Importance     
----------------------------------------------------------------------
1     points_diff                              0.225201 ██████████████████████
2     points_ratio                             0.216739 █████████████████████
3     elo_exp_p1                               0.126393 ████████████
4     elo_diff                                 0.116776 ███████████
5     rank_ratio                               0.095208 █████████
6     rank_diff                                0.016479 █
7     consistency_diff                         0.016412 █
8     recent_diff                              0.013811 █
9     elo_p2_before                            0.009583 
10    winrate_diff                             0.008949

9. `shap` - SHAP Analysis

Compute SHAP values for model interpretability.

python3 atp.py shap [--top N] [--max-samples N] [--artifact PATH] [--json]

Purpose: Analyze feature contributions using SHAP (SHapley Additive exPlanations) values, which attribute each individual prediction to its input features rather than giving only a global ranking.

10. `select-features` - Feature Selection

Automatically select important features using SHAP values.

python3 atp.py select-features [--threshold FLOAT] [--max-features N] [--artifact PATH]

Purpose: Identify and remove low-impact features to reduce overfitting and improve generalization.

🔬 Feature Engineering

The model uses 40+ engineered features across 8 categories:

Category	Features	Description
ELO Ratings	4	Player ELO scores, differences, and expected probabilities
Win Rates	6	Global and surface-specific win percentages
ATP Rankings	8	Current rankings, points, differences, and ratios
Recent Form	3	Performance in last N matches (with optional time decay)
Momentum	4	Performance trends and consistency (standard deviation)
Serve Statistics	6	Aces, double faults, break points saved
Head-to-Head	2	Historical matchup records
Surface	3	One-hot encoded (Hard, Clay, Grass)

Key Features:

elo_diff: Difference in ELO ratings between players
points_diff: Difference in ATP ranking points
rank_ratio: Ratio of ATP rankings (handles unranked players)
recent_diff: Recent form difference (last 30 matches)
surf_winrate_diff: Surface-specific win rate difference
h2h_p1_over_p2_prior: Head-to-head win rate

Important: All features are computed incrementally during temporal validation to prevent data leakage.
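To illustrate what incremental computation means in practice, the sketch below attaches each player's prior win rate to a match before that match's result is recorded. The helper add_prior_winrate, the neutral prior of 0.5, and the winner-minus-loser orientation are all assumptions made for this example; the actual logic lives in atp_core/features.py.

```python
from collections import defaultdict

def add_prior_winrate(matches):
    """Attach each player's win rate *before* the match, then update the tallies.

    `matches` must be sorted oldest first; each item is (winner, loser).
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    rows = []
    for winner, loser in matches:
        # 1) Read features from history only (0.5 is an assumed neutral prior).
        w_rate = wins[winner] / played[winner] if played[winner] else 0.5
        l_rate = wins[loser] / played[loser] if played[loser] else 0.5
        rows.append({"winner": winner, "loser": loser, "winrate_diff": w_rate - l_rate})
        # 2) Only now record the result, so it cannot leak into its own features.
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return rows

rows = add_prior_winrate([("A", "B"), ("A", "B"), ("B", "A")])
print([round(r["winrate_diff"], 2) for r in rows])
```

Swapping steps 1 and 2 would let the outcome of a match influence its own features, which is exactly the leakage that temporal validation is meant to rule out.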

📈 ELO Rating System

Overview

The ELO rating system calculates relative skill levels of players. Each player starts with a base rating of 1500, and ratings update after each match based on expected vs. actual outcomes.

Formula

expected_prob = 1 / (1 + 10^((elo_B - elo_A) / 400))
new_elo = old_elo + K * (actual_result - expected_prob)
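The two formulas translate directly into code. The following is a minimal sketch, with actual_result equal to 1 for the winner and 0 for the loser. The function names are illustrative and are not the API of atp_core/elo.py.

```python
def expected_score(elo_a, elo_b):
    """Probability that player A beats player B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def update_elo(elo_winner, elo_loser, k=20.0):
    """Return the (winner, loser) ratings after one match."""
    p_win = expected_score(elo_winner, elo_loser)
    delta = k * (1.0 - p_win)  # actual_result is 1 for the winner, 0 for the loser
    return elo_winner + delta, elo_loser - delta

p = expected_score(1600, 1500)
new_winner, new_loser = update_elo(1600, 1500, k=20.0)
print(round(p, 3), round(new_winner, 1), round(new_loser, 1))
```

The update is zero-sum: whatever the winner gains, the loser drops, and an upset (a low expected probability for the winner) moves both ratings further than an expected result.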

Configuration

Base Rating: 1500 (for new players)
K-Factor: 20.0 (default, controls rating volatility)
Surface-Specific: Optional (separate ELO for Hard/Clay/Grass)

K-Factor Selection

The K-factor controls how much ratings change after each match:

K=10 (Conservative): More stable, slower adaptation
K=20 (Balanced): Standard for most sports - recommended
K=32 (Aggressive): Faster adaptation to form changes

Usage:

# Standard K-factor
python3 atp.py train --data-dir data --elo-k 20.0

# Conservative ratings
python3 atp.py train --data-dir data --elo-k 10.0

# Aggressive ratings
python3 atp.py train --data-dir data --elo-k 32.0

Example Calculation

# Before match
Djokovic: 2100 ELO (Hard)
Alcaraz:  1950 ELO (Hard)

# Expected probability
P(Djokovic wins) = 1 / (1 + 10^((1950-2100)/400)) = 0.703

# If Djokovic wins:
new_elo_djokovic = 2100 + 20 * (1 - 0.703) = 2105.9
new_elo_alcaraz = 1950 + 20 * (0 - 0.297) = 1944.1

📁 Dataset Information

Data Source

ATP match data is obtained from Jeff Sackmann's public repository:

🔗 github.com/JeffSackmann/tennis_atp

Dataset Contents

ATP matches from 1968 to present (tour-level, challenger, futures)
Historical rankings from 1973
Detailed match stats (from 1991 for tour-level, 2008 for challenger)
Player biographical data (name, birth date, country, height, handedness)

CSV File Structure

Each atp_matches_YYYY.csv file contains:

tourney_id, tourney_name, surface, draw_size, tourney_level, tourney_date,
winner_id, winner_name, winner_rank, winner_rank_points, winner_age, winner_ht,
loser_id, loser_name, loser_rank, loser_rank_points, loser_age, loser_ht,
score, best_of, round, minutes,
w_ace, w_df, w_svpt, w_1stIn, w_1stWon, w_2ndWon, w_SvGms, w_bpSaved, w_bpFaced,
l_ace, l_df, l_svpt, l_1stIn, l_1stWon, l_2ndWon, l_SvGms, l_bpSaved, l_bpFaced

See matches_data_dictionary.txt in the original repository for complete field descriptions.
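As a rough sketch of working with these files, the snippet below reads match rows with Python's csv module and sorts them chronologically, which any temporal feature computation needs. The helper load_matches and the inline sample rows are illustrative assumptions; only the column names come from the list above.

```python
import csv
import io
from datetime import datetime

def load_matches(handle):
    """Read match rows and return them sorted oldest first by tourney_date."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        # tourney_date is stored as YYYYMMDD in the Sackmann files.
        row["tourney_date"] = datetime.strptime(row["tourney_date"], "%Y%m%d").date()
    rows.sort(key=lambda r: r["tourney_date"])
    return rows

# Tiny inline stand-in for an atp_matches_YYYY.csv file (values are made up).
sample = io.StringIO(
    "tourney_date,surface,winner_name,loser_name\n"
    "20240115,Hard,Player A,Player B\n"
    "20240101,Hard,Player C,Player D\n"
)
rows = load_matches(sample)
print([(r["tourney_date"].isoformat(), r["winner_name"]) for r in rows])
```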

Download Instructions

# Method 1: Direct download (recommended for specific years)
mkdir -p data
for year in {2000..2024}; do
  wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_${year}.csv
done

# Method 2: Clone entire repository (all data)
mkdir -p data
git clone https://github.com/JeffSackmann/tennis_atp.git temp_data
# Move only the tour-level files (atp_matches_YYYY.csv)
mv temp_data/atp_matches_[0-9][0-9][0-9][0-9].csv data/
rm -rf temp_data

Data License

ATP match data is © Jeff Sackmann / Tennis Abstract and is licensed under CC BY-NC-SA 4.0.

Terms:

✅ Attribution required: Must cite Jeff Sackmann / Tennis Abstract
⚠️ Non-commercial use only: Cannot use data for commercial purposes
✅ ShareAlike: Modifications must be shared under the same license

🚀 Advanced Features

1. Time Decay Weighting

Time decay gives more weight to recent matches when calculating player statistics.

Formula: weight = decay_factor ^ days_ago

Decay Factor Examples:

0.90: Aggressive - strong focus on very recent form (half-life: 6.6 days)
0.95: Recommended - balanced recency bias (half-life: 13.5 days)
0.98: Conservative - slower adaptation (half-life: 34.3 days)
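The half-lives quoted above follow from solving decay_factor ^ t = 0.5 for t. A short sketch that reproduces them is shown below; the function names are illustrative and not part of the project.

```python
import math

def decay_weight(decay_factor, days_ago):
    """Weight of a match played `days_ago` days before the prediction date."""
    return decay_factor ** days_ago

def half_life_days(decay_factor):
    """Days until a match's weight drops to 0.5: solve decay^t = 0.5 for t."""
    return math.log(0.5) / math.log(decay_factor)

for factor in (0.90, 0.95, 0.98):
    print(factor, round(half_life_days(factor), 1), round(decay_weight(factor, 30), 3))
```

Note how quickly the weights fall off under a daily decay: even the conservative factor of 0.98 leaves a match from 30 days ago with only a little over half its original weight.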

Usage:

python3 atp.py train --data-dir data --time-decay 0.95

Impact: Recent matches are weighted more heavily in features such as recent_form, winrate, and consistency.

2. Surface-Specific Models

Train separate XGBoost models for each court surface (Hard, Clay, Grass).

Benefits:

Captures surface-specific patterns
Better accuracy for surface specialists
Improved predictions for surface-specific tournaments

Usage:

python3 atp.py train --data-dir data --surface-specific --optimize

3. Stacking Ensemble

Meta-learning approach combining multiple XGBoost models with different hyperparameters.

Architecture:

Base Model 1: Conservative (depth=5, lr=0.02, high regularization)
Base Model 2: Balanced (depth=7, lr=0.03, moderate)
Base Model 3: Aggressive (depth=9, lr=0.01, low regularization)
Meta-Learner: LogisticRegression with class_weight='balanced'

Usage:

python3 atp.py train --stacking --optimize --use-gpu

Expected Benefits:

+1-2% accuracy improvement
Better calibration and confidence estimates
Reduced variance through model averaging

4. SHAP-Based Feature Selection

Automatic feature pruning using SHAP (SHapley Additive exPlanations) values.

Algorithm:

Train model on full feature set
Compute SHAP values for sample of test data
Calculate mean absolute SHAP value per feature
Remove features below threshold (default: 0.01)
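The last two steps of the algorithm can be sketched in plain Python, assuming the SHAP values have already been computed (for example with the shap library) and are available as one row per sample. The helper select_features and the toy values are assumptions for illustration only.

```python
def select_features(shap_values, feature_names, threshold=0.01):
    """Keep features whose mean absolute SHAP value reaches the threshold.

    shap_values: list of rows, one per sample, each with one value per feature.
    """
    n_samples = len(shap_values)
    kept = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_samples
        if mean_abs >= threshold:
            kept.append((name, mean_abs))
    # Most influential features first.
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Made-up SHAP values for three samples and three features.
names = ["elo_diff", "points_diff", "noise_feature"]
values = [[0.30, -0.50, 0.001], [-0.20, 0.40, -0.002], [0.10, -0.60, 0.003]]
selected = select_features(values, names, threshold=0.01)
print([name for name, _ in selected])
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and be pruned by mistake.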

Usage:

python3 atp.py select-features --threshold 0.01 --max-samples 1000

Benefits:

Reduces overfitting by removing noise features
Faster inference with fewer features
Improved model interpretability

💻 GPU Acceleration

This project supports NVIDIA GPU acceleration for faster training using CUDA.

Prerequisites

NVIDIA GPU with CUDA support (compute capability 3.5+)
NVIDIA Driver installed (verify with nvidia-smi)
XGBoost 3.x with CUDA support (included in requirements.txt)

Verify GPU Support

# Check NVIDIA GPU
nvidia-smi

# Test XGBoost GPU support
python3 -c "import xgboost as xgb; print(f'CUDA: {xgb.build_info()[\"USE_CUDA\"]}')"

Using GPU

# Enable GPU with --use-gpu flag
python3 atp.py train --data-dir data --use-gpu --optimize

Note: GPU acceleration is optional. Training also runs on CPU only, just more slowly.

🧪 Testing

Running Tests

# Activate virtual environment
source venv/bin/activate

# Run all tests
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ -v

# Test with coverage
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ --cov=atp_core --cov-report=html

# Run specific test files
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_elo_engine.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_prepare_feature.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_model.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_utils.py -v

Test Coverage

✅ ELO rating system tests
✅ Feature extraction and time decay tests
✅ Model training and SHAP analysis tests
✅ Utility function tests
✅ Expected: >80% coverage on core modules

Note: PYTHONPATH=$PWD:$PYTHONPATH is required to allow tests to import atp_core and atp modules.

⚙️ Configuration

Environment Variables

# Paths
export DATA_DIR="data"
export MODEL_DIR="models"
export LOG_LEVEL="INFO"

# ELO
export ELO_K_FACTOR="20.0"
export ELO_PER_SURFACE="true"

# XGBoost
export XGBOOST_LEARNING_RATE="0.02"
export XGBOOST_MAX_DEPTH="7"
export USE_GPU="false"

# Features
export RECENT_N_MATCHES="30"
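How atp_core/config.py actually consumes these variables is not shown here. One plausible pattern, sketched below as an assumption, is to read each variable with a typed fallback default. The helpers env_float and env_bool are hypothetical names.

```python
import os

def env_float(name, default):
    """Read a float from the environment, falling back to a default."""
    return float(os.environ.get(name, default))

def env_bool(name, default):
    """Treat 'true', '1', and 'yes' (any case) as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes"}

# Variable names match the exports above; defaults mirror the documented ones.
settings = {
    "data_dir": os.environ.get("DATA_DIR", "data"),
    "elo_k": env_float("ELO_K_FACTOR", 20.0),
    "elo_per_surface": env_bool("ELO_PER_SURFACE", False),
    "recent_n": int(os.environ.get("RECENT_N_MATCHES", 30)),
    "use_gpu": env_bool("USE_GPU", False),
}
print(settings)
```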

Configuration File

Modify default parameters in atp_core/config.py:

# Default configuration constants
DEFAULT_ELO_K = 20.0
DEFAULT_BASE_ELO = 1500
DEFAULT_RECENT_N = 30
DEFAULT_TIME_DECAY = 0.0

🐛 Troubleshooting

Problem: "No CSV files found"

# Verify data directory
ls data/atp_matches_*.csv

# Download data
mkdir -p data
wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2024.csv

Problem: "Model artifact not found"

# Train model first
python3 atp.py train --data-dir data

Problem: "Player not found"

# Use fuzzy matching (automatic)
python3 atp.py predict "Fedrer" "Nadal" Hard  # Finds "Federer"

# List available players
python3 atp.py list --top 100
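The fuzzy matching algorithm used by the CLI is not documented here. As an illustration of the general idea, Python's standard difflib module can resolve a misspelled name against a list of known players. The helper resolve_player and the 0.6 cutoff are assumptions for this sketch.

```python
import difflib

def resolve_player(query, known_players, cutoff=0.6):
    """Return the closest known player name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, known_players, n=1, cutoff=cutoff)
    return matches[0] if matches else None

players = ["Roger Federer", "Rafael Nadal", "Novak Djokovic"]
print(resolve_player("Fedrer", players))
print(resolve_player("Zzzz", players))
```

If a name still cannot be resolved, a lower cutoff widens the search at the cost of more false matches.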

Problem: GPU not detected

# Verify NVIDIA driver
nvidia-smi

# Check CUDA in XGBoost
python3 -c "import xgboost as xgb; print(xgb.build_info())"

# Reinstall XGBoost with CUDA support
pip uninstall xgboost
pip install xgboost --no-cache-dir

Problem: Import errors

# Ensure virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

# For tests, use PYTHONPATH
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ -v

📚 References

Dataset

Sackmann, Jeff. ATP Tennis Rankings, Results, and Stats. GitHub repository. github.com/JeffSackmann/tennis_atp. Licensed under CC BY-NC-SA 4.0.
Tennis Abstract by Jeff Sackmann
ATP Official Website

Algorithms & Libraries

🙏 Acknowledgments

This project would not be possible without:

Jeff Sackmann (Tennis Abstract) — For maintaining the comprehensive ATP tennis dataset for over a decade. His dedication to open tennis data has enabled countless research projects and applications. Repository: JeffSackmann/tennis_atp
XGBoost Development Team — For creating and maintaining an exceptional gradient boosting library that powers the prediction engine
Scikit-learn Contributors — For the robust machine learning framework and tools used throughout this project
SHAP Development Team — For the interpretability tools that make model decisions transparent
Tennis Analytics Community — Including researchers and practitioners on Kaggle and elsewhere who have shared insights and best practices in sports prediction

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License & Citation

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) - see the LICENSE file for details.

Citation

If you use this project in research or academic work, please cite both the software and the ATP dataset.

See CITATION.md for detailed citation formats (BibTeX, APA, IEEE, Chicago).

Quick Citation:

ulpati. (2025). ATP Tennis Match Predictor: Machine learning system for match 
prediction (Version 1.0) [Computer software]. GitHub. 
https://github.com/ulpati/atp-tennis-predictor

Sackmann, J. (2025). ATP tennis rankings, results, and stats [Data set]. 
Tennis Abstract. https://github.com/JeffSackmann/tennis_atp

👤 Author: ulpati

⭐ If you find this project useful, please consider starring the repository!

💬 Questions? Open an issue at github.com/ulpati/atp-tennis-predictor/issues
