ulpati/atp-tennis-predictor

🎾 ATP Tennis Match Predictor

Python 3.8+ | XGBoost | License: CC BY-NC-SA 4.0

Machine learning system for predicting ATP tennis match outcomes using XGBoost, ELO ratings, and advanced feature engineering. Surface-specific models achieve 80.71-84.28% accuracy with 0.918-0.937 AUC-ROC on temporal holdout data (2000-2024).

🎯 Overview

This project implements a tennis match prediction system with the following capabilities:

Key Features

  • Machine Learning: XGBoost classifier with Bayesian hyperparameter optimization
  • ELO Rating System: Dynamic player ratings (global and surface-specific)
  • Feature Engineering: 40+ features including win rates, form, rankings, H2H records, and serve statistics
  • Temporal Validation: Time-series cross-validation to prevent data leakage
  • Surface-Specific Models: Separate models for Hard, Clay, and Grass courts
  • Model Interpretability: SHAP analysis and feature importance visualization
  • Symmetric Predictions: Bidirectional predictions for improved robustness
  • GPU Acceleration: CUDA support for faster training
  • Command-Line Interface: 10 commands for training, prediction, evaluation, and analysis
  • Comprehensive Testing: Full test suite with pytest (>80% coverage)

Use Cases

  • Tennis match outcome prediction with probability scores
  • Player performance analysis and statistics
  • Head-to-head matchup analysis
  • Feature importance investigation for tennis analytics
  • Academic research in sports prediction and machine learning
  • Tournament forecasting with batch predictions

📊 Performance Metrics

Note: The system uses three surface-specific models (Hard, Clay, Grass) trained independently for optimal performance on each surface type.

Holdout Evaluation (Hard Surface Model)

| Metric | Value | Interpretation |
| --- | --- | --- |
| Accuracy | 84.08% | Excellent for sports prediction |
| AUC-ROC | 0.937 | Excellent discrimination capability |
| Log Loss | 0.497 | Strong probabilistic calibration |
| Brier Score | 0.156 | High probability reliability |
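
For readers unfamiliar with these metrics, the following is a minimal sketch of how accuracy, log loss, and Brier score are conventionally computed from predicted win probabilities. The function name `holdout_metrics` and the toy inputs are illustrative assumptions, not the project's evaluation code, and AUC-ROC is left out for brevity.

```python
import math

def holdout_metrics(y_true, p_pred, eps=1e-15):
    """Return accuracy, log loss, and Brier score for binary predictions."""
    n = len(y_true)
    correct = 0
    log_loss_sum = 0.0
    brier_sum = 0.0
    for y, p in zip(y_true, p_pred):
        # Accuracy uses a 0.5 decision threshold.
        correct += int((p >= 0.5) == bool(y))
        # Clip so log() never sees exactly 0 or 1.
        q = min(max(p, eps), 1.0 - eps)
        log_loss_sum += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
        # Brier score is the mean squared error of the probability.
        brier_sum += (p - y) ** 2
    return {
        "accuracy": correct / n,
        "log_loss": log_loss_sum / n,
        "brier": brier_sum / n,
    }

# Toy labels and probabilities, invented for illustration only.
metrics = holdout_metrics([1, 0, 1, 1], [0.9, 0.2, 0.6, 0.4])
print(metrics)
```

Lower is better for both log loss and Brier score; a model that always predicts 0.5 scores about 0.693 and 0.25 respectively.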

Cross-Validation Results (10-Fold Temporal - Hard Surface)

  • Average AUC: 0.942 ± 0.008 (range: 0.931-0.956)
  • Average Accuracy: 85.06% ± 1.15%
  • Best Fold: 86.91% accuracy (Fold 6)
  • Log Loss: 0.491 ± 0.010
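
The exact fold layout is not described here. As a rough sketch, a temporal cross-validation scheme can be built from expanding windows in which every training index precedes every test index. The helper `expanding_window_splits` below is hypothetical and only illustrates the idea on chronologically sorted data.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) with training always before testing."""
    fold_size = n_samples // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = fold * fold_size
        # The last fold absorbs any remainder so no sample is dropped.
        test_end = n_samples if fold == n_folds else train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)

# Rows are assumed to be sorted by match date before splitting.
splits = list(expanding_window_splits(n_samples=12, n_folds=3))
for train_idx, test_idx in splits:
    print(list(train_idx), "->", list(test_idx))
```

Because the model never trains on matches played after the ones it is tested on, the fold scores approximate how it would perform on future matches.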

Surface-Specific Performance

| Surface | AUC-ROC | Accuracy | Log Loss | Holdout Matches |
| --- | --- | --- | --- | --- |
| Hard | 0.937 | 84.08% | 0.497 | 8,174 |
| Clay | 0.918 | 80.71% | 0.431 | 4,848 |
| Grass | 0.936 | 84.28% | 0.331 | 1,539 |

🚀 Quick Start

Prerequisites

  • Python: 3.8 or higher
  • Operating System: Linux, macOS, Windows (with WSL)
  • Hardware: CPU (minimum), NVIDIA GPU with CUDA (optional, for acceleration)

Installation

# 1. Clone the repository
git clone https://github.com/ulpati/atp-tennis-predictor.git
cd atp-tennis-predictor

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Verify installation
python3 atp.py --help

Download ATP Data

# Download match data for specific years (example: 2020-2024)
mkdir -p data
for year in {2020..2024}; do
  wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_${year}.csv
done

Data Source: Jeff Sackmann's ATP Tennis Repository (CC BY-NC-SA 4.0)

Basic Usage

# Train a model (standard)
python3 atp.py train --data-dir data

# Train with optimization and surface-specific ELO
python3 atp.py train --data-dir data --optimize --elo-per-surface

# Predict a match
python3 atp.py predict "Novak Djokovic" "Carlos Alcaraz" Hard

# Evaluate model performance
python3 atp.py evaluate

# List top players
python3 atp.py list --top 20

# View player statistics
python3 atp.py player "Rafael Nadal"

# Head-to-head analysis
python3 atp.py h2h "Roger Federer" "Rafael Nadal"

🗂️ Project Structure

atp-tennis-predictor/
├── atp.py                      # Main CLI application
├── atp_core/                   # Core library modules
│   ├── __init__.py             # Package initialization
│   ├── config.py               # Configuration constants
│   ├── elo.py                  # ELO rating engine
│   ├── features.py             # Feature extraction logic
│   ├── model.py                # Training and prediction
│   ├── shap_analysis.py        # SHAP interpretability
│   ├── utils.py                # Utility functions
│   └── exceptions.py           # Custom exceptions
├── data/                       # ATP match CSVs (gitignored)
│   └── atp_matches_YYYY.csv    # Annual match data
├── models/                     # Trained models (gitignored)
│   ├── artifact_xgb_*.joblib   # XGBoost models
│   ├── players_stats_*.joblib  # Player statistics
│   └── elo_dict_*.json         # ELO ratings
├── tests/                      # Test suite
│   ├── test_elo_engine.py      # ELO system tests
│   ├── test_prepare_feature.py # Feature extraction tests
│   ├── test_model.py           # Model training tests
│   └── test_utils.py           # Utility function tests
├── requirements.txt            # Python dependencies
├── LICENSE                     # CC BY-NC-SA 4.0 license
├── CITATION.md                 # Citation guidelines
└── README.md                   # This file

⚙️ CLI Commands

The system provides 10 commands accessible via python3 atp.py <command>:

1. train - Train Prediction Model

Train an XGBoost model with optional hyperparameter optimization.

python3 atp.py train --data-dir data [OPTIONS]

Options:

| Option | Description | Default |
| --- | --- | --- |
| --data-dir PATH | Directory containing ATP CSV files | data/ |
| --optimize | Enable Bayesian hyperparameter optimization | False |
| --use-gpu | Use NVIDIA GPU for acceleration | False |
| --elo-per-surface | Train separate ELO ratings per surface | False |
| --elo-k FLOAT | ELO K-factor (rating volatility) | 20.0 |
| --recent-n INT | Number of recent matches for form calculation | 30 |
| --surface-specific | Train separate models per surface (Hard/Clay/Grass) | False |
| --time-decay FLOAT | Exponential time decay factor (0=disabled, 0.95=recommended) | 0.0 |
| --stacking | Train stacking ensemble (XGBoost + LogisticRegression) | False |

Examples:

# Basic training
python3 atp.py train --data-dir data

# Recommended: surface-specific ELO
python3 atp.py train --data-dir data --elo-per-surface

# Advanced: optimization + surface-specific + time decay
python3 atp.py train --data-dir data --optimize --elo-per-surface --time-decay 0.95

# Full training with GPU
python3 atp.py train --data-dir data --optimize --surface-specific --use-gpu --time-decay 0.95

2. predict - Single Match Prediction

Predict the outcome of a match between two players on a specific surface.

python3 atp.py predict "Player A" "Player B" Surface [OPTIONS]

Arguments:

  • Player A: First player name (fuzzy matching supported)
  • Player B: Second player name (fuzzy matching supported)
  • Surface: Court surface (Hard, Clay, or Grass)

Options:

  • --artifact PATH: Path to trained model (auto-detects latest if not specified)
  • --players PATH: Path to player stats file (auto-detects if not specified)
  • --json: Output results in JSON format

Example:

python3 atp.py predict "Novak Djokovic" "Carlos Alcaraz" Hard

Output:

======================================================================
  MATCH PREDICTION: Novak Djokovic vs Carlos Alcaraz
  Surface: Hard | Confidence: ❓ LOW
======================================================================

  ✓ Predicted Winner: Carlos Alcaraz
  • Novak Djokovic win probability:  47.80%
  • Carlos Alcaraz win probability:  52.20%

  Raw probabilities (before averaging):
    - Novak Djokovic -> Carlos Alcaraz:  56.03%
    - Carlos Alcaraz -> Novak Djokovic:  60.44%
======================================================================
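
The sample output does not say how the two raw probabilities are combined. One reading that reproduces the printed numbers is to score the match in both player orderings and average the two estimates of the same player's chance. The sketch below assumes that interpretation; `symmetric_probability` is a hypothetical helper, not a function from `atp_core`.

```python
def symmetric_probability(p_ab, p_ba):
    """Average two orderings of the same match.

    p_ab: model's P(A wins) when A is passed as player 1.
    p_ba: model's P(B wins) when B is passed as player 1.
    """
    # 1 - p_ba is the second ordering's estimate of P(A wins).
    p_a = (p_ab + (1.0 - p_ba)) / 2.0
    return p_a, 1.0 - p_a

# Raw probabilities taken from the sample output above.
p_djokovic, p_alcaraz = symmetric_probability(0.5603, 0.6044)
print(f"Djokovic: {p_djokovic:.2%}, Alcaraz: {p_alcaraz:.2%}")
```

Averaging this way guarantees the two final probabilities sum to 1 and that swapping the player order on the command line cannot change the predicted winner.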

3. evaluate - Model Evaluation

Evaluate model performance on holdout data with cross-validation metrics.

python3 atp.py evaluate [--artifact PATH] [--json]

Output includes:

  • Holdout metrics (Accuracy, AUC-ROC, Log Loss, Brier Score)
  • 10-fold temporal cross-validation results
  • Performance interpretation

4. list - List Top Players

Display top players ranked by total match wins.

python3 atp.py list [--top N] [--players PATH]

Example:

python3 atp.py list --top 10

Output:

Rank  Player                           Record       Win%     Surfaces
----------------------------------------------------------------------
1     Roger Federer                    1250W-260L   82.8%  Carpet:44, Clay:227, Grass:194, Hard:785
2     Novak Djokovic                   1139W-224L   83.6%  Carpet:9, Clay:291, Grass:121, Hard:718
3     Rafael Nadal                     1091W-235L   82.3%  Carpet:2, Clay:488, Grass:76, Hard:525
4     Andy Murray                      748W-269L   73.5%  Carpet:8, Clay:110, Grass:120, Hard:510
5     David Ferrer                     740W-379L   66.1%  Carpet:9, Clay:337, Grass:44, Hard:350

5. player - Player Statistics

View detailed statistics for a specific player.

python3 atp.py player "Player Name" [--players PATH]

Example:

python3 atp.py player "Rafael Nadal"

Output:

PLAYER PROFILE: Rafael Nadal

Overall Record: 1091W-235L (82.3%)

Surface Breakdown:
  • Carpet  :   2W-6  L (  25.0%)
  • Clay    : 488W-53 L (  90.2%)
  • Grass   :  76W-21 L (  78.4%)
  • Hard    : 525W-155L (  77.2%)

Recent Form (last 10): W L L W W W W L W L

6. h2h - Head-to-Head Analysis

Analyze head-to-head record between two players.

python3 atp.py h2h "Player A" "Player B" [--players PATH]

Example:

python3 atp.py h2h "Roger Federer" "Rafael Nadal"

Output:

HEAD-TO-HEAD: Roger Federer vs Rafael Nadal

Overall Record:
  Roger Federer: 17W
  Rafael Nadal: 24W
  Total: 41 matches

Win Percentage:
  Roger Federer: 41.5%
  Rafael Nadal: 58.5%

Surface Breakdown:
  Hard     : 12W-9L (57.1% for Roger Federer)
  Clay     : 2W-14L (12.5% for Roger Federer)
  Grass    : 3W-1L (75.0% for Roger Federer)

7. batch - Batch Predictions

Predict multiple matches from a CSV file.

python3 atp.py batch matches.csv [OPTIONS]

CSV Format:

player1,player2,surface
Novak Djokovic,Rafael Nadal,Clay
Roger Federer,Andy Murray,Grass
Carlos Alcaraz,Jannik Sinner,Hard
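
If the fixture list comes from another program, the input file can be generated rather than typed by hand. This is a small sketch using only the standard library. The `build_batch_csv` helper is hypothetical, and restricting surfaces to Hard, Clay, and Grass is an assumption based on the surfaces the `predict` command documents.

```python
import csv
import io

VALID_SURFACES = {"Hard", "Clay", "Grass"}

def build_batch_csv(matches):
    """Return CSV text with the player1,player2,surface header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["player1", "player2", "surface"])
    for player1, player2, surface in matches:
        # Fail early instead of producing a file the CLI may reject.
        if surface not in VALID_SURFACES:
            raise ValueError(f"Unsupported surface: {surface!r}")
        writer.writerow([player1, player2, surface])
    return buffer.getvalue()

csv_text = build_batch_csv([
    ("Novak Djokovic", "Rafael Nadal", "Clay"),
    ("Carlos Alcaraz", "Jannik Sinner", "Hard"),
])
print(csv_text)
```

Writing `csv_text` to `matches.csv` gives a file in the format shown above.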

8. importance - Feature Importance

Display XGBoost feature importance rankings.

python3 atp.py importance [--top N] [--artifact PATH]

Example:

python3 atp.py importance --top 10

Output:

TOP 10 MOST IMPORTANT FEATURES

Rank  Feature                                  Importance     
----------------------------------------------------------------------
1     points_diff                              0.225201 ██████████████████████
2     points_ratio                             0.216739 █████████████████████
3     elo_exp_p1                               0.126393 ████████████
4     elo_diff                                 0.116776 ███████████
5     rank_ratio                               0.095208 █████████
6     rank_diff                                0.016479 █
7     consistency_diff                         0.016412 █
8     recent_diff                              0.013811 █
9     elo_p2_before                            0.009583 
10    winrate_diff                             0.008949 

9. shap - SHAP Analysis

Compute SHAP values for model interpretability.

python3 atp.py shap [--top N] [--max-samples N] [--artifact PATH] [--json]

Purpose: Analyze feature contributions using SHAP (SHapley Additive exPlanations) values for more accurate attribution.


10. select-features - Feature Selection

Automatically select important features using SHAP values.

python3 atp.py select-features [--threshold FLOAT] [--max-features N] [--artifact PATH]

Purpose: Identify and remove low-impact features to reduce overfitting and improve generalization.


🔬 Feature Engineering

The model uses 40+ engineered features across 8 categories:

| Category | Features | Description |
| --- | --- | --- |
| ELO Ratings | 4 | Player ELO scores, differences, and expected probabilities |
| Win Rates | 6 | Global and surface-specific win percentages |
| ATP Rankings | 8 | Current rankings, points, differences, and ratios |
| Recent Form | 3 | Performance in last N matches (with optional time decay) |
| Momentum | 4 | Performance trends and consistency (standard deviation) |
| Serve Statistics | 6 | Aces, double faults, break points saved |
| Head-to-Head | 2 | Historical matchup records |
| Surface | 3 | One-hot encoded (Hard, Clay, Grass) |

Key Features:

  • elo_diff: Difference in ELO ratings between players
  • points_diff: Difference in ATP ranking points
  • rank_ratio: Ratio of ATP rankings (handles unranked players)
  • recent_diff: Recent form difference (last 30 matches)
  • surf_winrate_diff: Surface-specific win rate difference
  • h2h_p1_over_p2_prior: Head-to-head win rate

Important: All features are computed incrementally during temporal validation to prevent data leakage.
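
To make "computed incrementally" concrete, here is a minimal sketch of the pattern: read each player's statistics as they stood before the match, and only then update them with the result. The function `prior_win_rates` and the neutral 0.5 default for players without history are assumptions for illustration, not the logic in `atp_core/features.py`.

```python
from collections import defaultdict

def prior_win_rates(matches):
    """Return (winner_rate, loser_rate) as known before each match.

    matches: iterable of (winner, loser) tuples in chronological order.
    """
    wins = defaultdict(int)
    played = defaultdict(int)

    def rate(player):
        # Players with no history get a neutral 0.5 (an assumption).
        return wins[player] / played[player] if played[player] else 0.5

    rows = []
    for winner, loser in matches:
        # 1) Read features from past matches only.
        rows.append((rate(winner), rate(loser)))
        # 2) Only then fold the current result into the running totals.
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return rows

history = [("A", "B"), ("A", "C"), ("B", "A")]
rows = prior_win_rates(history)
print(rows)  # [(0.5, 0.5), (1.0, 0.5), (0.0, 1.0)]
```

Swapping steps 1 and 2 would let each row see its own outcome, which is exactly the leakage that temporal validation is meant to rule out.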


📈 ELO Rating System

Overview

The ELO rating system calculates relative skill levels of players. Each player starts with a base rating of 1500, and ratings update after each match based on expected vs. actual outcomes.

Formula

expected_prob = 1 / (1 + 10^((elo_B - elo_A) / 400))
new_elo = old_elo + K * (actual_result - expected_prob)
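
The two formulas translate directly into code. The sketch below is a standalone illustration rather than the contents of `atp_core/elo.py`. The function names and the 1600/1500 ratings are hypothetical, and the loop shows how the K-factor scales the size of the update.

```python
def expected_score(elo_a, elo_b):
    """Probability that player A beats player B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def update_elo(elo_winner, elo_loser, k=20.0):
    """Return the (winner, loser) ratings after one match."""
    exp_winner = expected_score(elo_winner, elo_loser)
    new_winner = elo_winner + k * (1.0 - exp_winner)
    # The loser's expected score is 1 - exp_winner and the actual result is 0.
    new_loser = elo_loser + k * (0.0 - (1.0 - exp_winner))
    return new_winner, new_loser

# Hypothetical ratings, chosen only for illustration.
for k in (10.0, 20.0, 32.0):
    winner, loser = update_elo(1600.0, 1500.0, k=k)
    print(f"K={k:>4}: winner {winner:.1f}, loser {loser:.1f}")
```

With the same K applied to both players, the points gained by the winner equal the points lost by the loser, so the total rating in the pool stays constant.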

Configuration

  • Base Rating: 1500 (for new players)
  • K-Factor: 20.0 (default, controls rating volatility)
  • Surface-Specific: Optional (separate ELO for Hard/Clay/Grass)

K-Factor Selection

The K-factor controls how much ratings change after each match:

  • K=10 (Conservative): More stable, slower adaptation
  • K=20 (Balanced): The project default and the recommended starting point
  • K=32 (Aggressive): Faster adaptation to form changes

Usage:

# Standard K-factor
python3 atp.py train --data-dir data --elo-k 20.0

# Conservative ratings
python3 atp.py train --data-dir data --elo-k 10.0

# Aggressive ratings
python3 atp.py train --data-dir data --elo-k 32.0

Example Calculation

# Before match
Djokovic: 2100 ELO (Hard)
Alcaraz:  1950 ELO (Hard)

# Expected probability
P(Djokovic wins) = 1 / (1 + 10^((1950-2100)/400)) = 0.703

# If Djokovic wins:
new_elo_djokovic = 2100 + 20 * (1 - 0.703) = 2105.9
new_elo_alcaraz = 1950 + 20 * (0 - 0.297) = 1944.1

📁 Dataset Information

Data Source

ATP match data is obtained from Jeff Sackmann's public repository:

🔗 github.com/JeffSackmann/tennis_atp

Dataset Contents

  • ATP matches from 1968 to present (tour-level, challenger, futures)
  • Historical rankings from 1973
  • Detailed match stats (from 1991 for tour-level, 2008 for challenger)
  • Player biographical data (name, birth date, country, height, handedness)

CSV File Structure

Each atp_matches_YYYY.csv file contains:

tourney_id, tourney_name, surface, draw_size, tourney_level, tourney_date,
winner_id, winner_name, winner_rank, winner_rank_points, winner_age, winner_ht,
loser_id, loser_name, loser_rank, loser_rank_points, loser_age, loser_ht,
score, best_of, round, minutes,
w_ace, w_df, w_svpt, w_1stIn, w_1stWon, w_2ndWon, w_SvGms, w_bpSaved, w_bpFaced,
l_ace, l_df, l_svpt, l_1stIn, l_1stWon, l_2ndWon, l_SvGms, l_bpSaved, l_bpFaced

See matches_data_dictionary.txt in the original repository for complete field descriptions.

Download Instructions

# Method 1: Direct download (recommended for specific years)
mkdir -p data
for year in {2000..2024}; do
  wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_${year}.csv
done

# Method 2: Clone entire repository (all data)
git clone https://github.com/JeffSackmann/tennis_atp.git temp_data
mv temp_data/atp_matches_*.csv data/
rm -rf temp_data

Data License

ATP match data is © Jeff Sackmann / Tennis Abstract and is licensed under CC BY-NC-SA 4.0.

Terms:

  • ✅ Attribution required: Must cite Jeff Sackmann / Tennis Abstract
  • ⚠️ Non-commercial use only: Cannot use data for commercial purposes
  • ✅ ShareAlike: Modifications must be shared under the same license

🚀 Advanced Features

1. Time Decay Weighting

Time decay gives more weight to recent matches when calculating player statistics.

Formula: weight = decay_factor ^ days_ago

Decay Factor Examples:

  • 0.90: Aggressive - strong focus on very recent form (half-life: 6.6 days)
  • 0.95: Recommended - balanced recency bias (half-life: 13.5 days)
  • 0.98: Conservative - slower adaptation (half-life: 34.3 days)
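
The half-lives listed above follow from solving decay_factor ^ days = 0.5 for days. As a quick check, this short sketch recomputes them; the function names are illustrative and not part of the project's API.

```python
import math

def decay_weight(decay_factor, days_ago):
    """Weight of a match played `days_ago` days before the reference date."""
    return decay_factor ** days_ago

def half_life_days(decay_factor):
    """Days until a match's weight drops to 0.5."""
    return math.log(0.5) / math.log(decay_factor)

for factor in (0.90, 0.95, 0.98):
    print(f"{factor:.2f}: half-life {half_life_days(factor):.1f} days, "
          f"30-day weight {decay_weight(factor, 30):.3f}")
```

The same function can be used to pick a decay factor for a target half-life: a factor of 0.5 ** (1 / h) halves the weight every h days.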

Usage:

python3 atp.py train --data-dir data --time-decay 0.95

Impact: Recent matches weighted more heavily in features like recent_form, winrate, and consistency.


2. Surface-Specific Models

Train separate XGBoost models for each court surface (Hard, Clay, Grass).

Benefits:

  • Captures surface-specific patterns
  • Better accuracy for surface specialists
  • Improved predictions for surface-specific tournaments

Usage:

python3 atp.py train --data-dir data --surface-specific --optimize

3. Stacking Ensemble

Meta-learning approach combining multiple XGBoost models with different hyperparameters.

Architecture:

  • Base Model 1: Conservative (depth=5, lr=0.02, high regularization)
  • Base Model 2: Balanced (depth=7, lr=0.03, moderate)
  • Base Model 3: Aggressive (depth=9, lr=0.01, low regularization)
  • Meta-Learner: LogisticRegression with class_weight='balanced'

Usage:

python3 atp.py train --stacking --optimize --use-gpu

Expected Benefits:

  • +1-2% accuracy improvement
  • Better calibration and confidence estimates
  • Reduced variance through model averaging

4. SHAP-Based Feature Selection

Automatic feature pruning using SHAP (SHapley Additive exPlanations) values.

Algorithm:

  1. Train model on full feature set
  2. Compute SHAP values for sample of test data
  3. Calculate mean absolute SHAP value per feature
  4. Remove features below threshold (default: 0.01)
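
Steps 3 and 4 can be sketched without any ML dependencies, assuming the SHAP values have already been computed as one row per sample. The helper `select_features`, the toy values, and the `noise_feature` name are hypothetical; only `elo_diff` and `rank_diff` are feature names from this project.

```python
def select_features(shap_values, feature_names, threshold=0.01):
    """Keep features whose mean absolute SHAP value meets the threshold.

    shap_values: list of rows, one per sample, one value per feature.
    """
    n_samples = len(shap_values)
    importance = {}
    for j, name in enumerate(feature_names):
        # Mean absolute contribution of feature j across all samples.
        importance[name] = sum(abs(row[j]) for row in shap_values) / n_samples
    kept = [name for name in feature_names if importance[name] >= threshold]
    return kept, importance

# Made-up SHAP values for three samples and three features.
toy_shap = [
    [0.40, -0.002, 0.05],
    [-0.30, 0.001, -0.03],
    [0.20, 0.003, 0.01],
]
kept, importance = select_features(
    toy_shap, ["elo_diff", "noise_feature", "rank_diff"]
)
print(kept)  # ['elo_diff', 'rank_diff']
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions would otherwise cancel out and look unimportant.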

Usage:

python3 atp.py select-features --threshold 0.01 --max-samples 1000

Benefits:

  • Reduces overfitting by removing noise features
  • Faster inference with fewer features
  • Improved model interpretability

💻 GPU Acceleration

This project supports NVIDIA GPU acceleration for faster training using CUDA.

Prerequisites

  1. NVIDIA GPU with CUDA support (compute capability 3.5+)
  2. NVIDIA Driver installed (verify with nvidia-smi)
  3. XGBoost 3.x with CUDA support (included in requirements.txt)

Verify GPU Support

# Check NVIDIA GPU
nvidia-smi

# Test XGBoost GPU support
python3 -c "import xgboost as xgb; print(f'CUDA: {xgb.build_info()[\"USE_CUDA\"]}')"

Using GPU

# Enable GPU with --use-gpu flag
python3 atp.py train --data-dir data --use-gpu --optimize

Note: GPU acceleration is optional. Training also runs on CPU alone, only more slowly.


🧪 Testing

Running Tests

# Activate virtual environment
source venv/bin/activate

# Run all tests
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ -v

# Test with coverage
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ --cov=atp_core --cov-report=html

# Run specific test files
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_elo_engine.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_prepare_feature.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_model.py -v
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/test_utils.py -v

Test Coverage

  • ✅ ELO rating system tests
  • ✅ Feature extraction and time decay tests
  • ✅ Model training and SHAP analysis tests
  • ✅ Utility function tests
  • ✅ Expected: >80% coverage on core modules

Note: PYTHONPATH=$PWD:$PYTHONPATH is required to allow tests to import atp_core and atp modules.


⚙️ Configuration

Environment Variables

# Paths
export DATA_DIR="data"
export MODEL_DIR="models"
export LOG_LEVEL="INFO"

# ELO
export ELO_K_FACTOR="20.0"
export ELO_PER_SURFACE="true"

# XGBoost
export XGBOOST_LEARNING_RATE="0.02"
export XGBOOST_MAX_DEPTH="7"
export USE_GPU="false"

# Features
export RECENT_N_MATCHES="30"

Configuration File

Modify default parameters in atp_core/config.py:

# Default configuration constants
DEFAULT_ELO_K = 20.0
DEFAULT_BASE_ELO = 1500
DEFAULT_RECENT_N = 30
DEFAULT_TIME_DECAY = 0.0
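
How `atp_core/config.py` reads the environment variables listed earlier is not shown in this README. A common pattern, sketched here under that caveat, is to look each one up with a fallback to the documented default. The `load_settings` function and its key names are hypothetical, and the `false` default for `ELO_PER_SURFACE` is assumed from the CLI default.

```python
import os

def load_settings(environ=None):
    """Read settings from environment variables, falling back to defaults."""
    env = os.environ if environ is None else environ
    return {
        "data_dir": env.get("DATA_DIR", "data"),
        "model_dir": env.get("MODEL_DIR", "models"),
        "elo_k": float(env.get("ELO_K_FACTOR", "20.0")),
        # Environment values are strings, so booleans are parsed explicitly.
        "elo_per_surface": env.get("ELO_PER_SURFACE", "false").lower() == "true",
        "recent_n": int(env.get("RECENT_N_MATCHES", "30")),
        "use_gpu": env.get("USE_GPU", "false").lower() == "true",
    }

# Passing a dict instead of os.environ keeps the example reproducible.
settings = load_settings({"ELO_K_FACTOR": "32", "ELO_PER_SURFACE": "true"})
print(settings)
```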

πŸ› Troubleshooting

Problem: "No CSV files found"

# Verify data directory
ls data/atp_matches_*.csv

# Download data
mkdir -p data
wget -P data https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2024.csv

Problem: "Model artifact not found"

# Train model first
python3 atp.py train --data-dir data

Problem: "Player not found"

# Use fuzzy matching (automatic)
python3 atp.py predict "Fedrer" "Nadal" Hard  # Finds "Federer"

# List available players
python3 atp.py list --top 100

Problem: GPU not detected

# Verify NVIDIA driver
nvidia-smi

# Check CUDA in XGBoost
python3 -c "import xgboost as xgb; print(xgb.build_info())"

# Reinstall XGBoost with CUDA support
pip uninstall xgboost
pip install xgboost --no-cache-dir

Problem: Import errors

# Ensure virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

# For tests, use PYTHONPATH
PYTHONPATH=$PWD:$PYTHONPATH pytest tests/ -v

🙏 Acknowledgments

This project would not be possible without:

  • Jeff Sackmann (Tennis Abstract): for maintaining the comprehensive ATP tennis dataset for over a decade. His dedication to open tennis data has enabled countless research projects and applications. Repository: JeffSackmann/tennis_atp

  • XGBoost Development Team: for creating and maintaining an exceptional gradient boosting library that powers the prediction engine

  • Scikit-learn Contributors: for the robust machine learning framework and tools used throughout this project

  • SHAP Development Team: for the interpretability tools that make model decisions transparent

  • Tennis Analytics Community: including researchers and practitioners on Kaggle and elsewhere who have shared insights and best practices in sports prediction


🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.


📄 License & Citation

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) - see the LICENSE file for details.

Citation

If you use this project in research or academic work, please cite both the software and the ATP dataset.

See CITATION.md for detailed citation formats (BibTeX, APA, IEEE, Chicago).

Quick Citation:

ulpati. (2025). ATP Tennis Match Predictor: Machine learning system for match 
prediction (Version 1.0) [Computer software]. GitHub. 
https://github.com/ulpati/atp-tennis-predictor

Sackmann, J. (2025). ATP tennis rankings, results, and stats [Data set]. 
Tennis Abstract. https://github.com/JeffSackmann/tennis_atp

👤 Author: ulpati

⭐ If you find this project useful, please consider starring the repository!

💬 Questions? Open an issue at github.com/ulpati/atp-tennis-predictor/issues
