GPU Thermal Regulation and Prediction

A machine learning and physics-informed system designed to monitor, predict, and regulate GPU temperature for efficient thermal management, ensuring optimal performance and preventing throttling or hardware damage.

📖 Overview

This project implements a Physics-Informed Neural Network (PINN) to model and predict GPU thermal behavior using real-time telemetry such as power draw, fan speed, utilization, and clock frequency.
By combining thermodynamic equations with recurrent neural networks (LSTM), the system learns temperature evolution patterns and dynamically estimates physical parameters such as heat transfer coefficient and thermal capacitance.
The ultimate goal is to enable energy-efficient cooling control in high-performance GPUs used in data centers, gaming, and compute-intensive workloads.
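As a rough illustration of the thermodynamic side, a lumped RC model treats the GPU as a single thermal mass obeying C * dT/dt = P - h * (T - T_amb), where C is the thermal capacitance and h the heat transfer coefficient. The sketch below integrates that equation with forward Euler. The function name, the parameter values, and the 1-second time step are illustrative assumptions, not values taken from this project.

```python
def simulate_rc(power_w, t_amb, t0, c_th, h, dt=1.0):
    """Forward-Euler integration of C*dT/dt = P - h*(T - T_amb).

    power_w: power draw per time step (W)
    t_amb:   ambient temperature (deg C)
    t0:      initial GPU temperature (deg C)
    c_th:    thermal capacitance (J/K), assumed value
    h:       heat transfer coefficient (W/K), assumed value
    """
    temps = [t0]
    for p in power_w:
        t = temps[-1]
        dT_dt = (p - h * (t - t_amb)) / c_th  # net heat flow divided by capacitance
        temps.append(t + dt * dT_dt)
    return temps


# Constant 150 W load for 10 minutes, starting at ambient temperature.
trace = simulate_rc([150.0] * 600, t_amb=25.0, t0=25.0, c_th=100.0, h=3.0)

# At steady state dT/dt = 0, so T = T_amb + P / h.
steady_state = 25.0 + 150.0 / 3.0
```

With these assumed values the time constant C / h is about 33 seconds, so the trace settles close to steady_state well before the end of the run.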

✨ Features

Real-Time Monitoring: Continuously tracks GPU telemetry (temperature, fan speed, power draw, and utilization).
Physics-Guided Learning: Integrates thermodynamic constraints into the neural network’s loss function for physically consistent predictions.
Future Temperature Prediction: Predicts GPU temperature 1 second ahead to anticipate and prevent overheating.
Custom Fan Control (Planned): Automatically regulates fan or pump speed to maintain target temperature with minimal energy waste.
Data Logging: Stores telemetry and prediction logs in structured CSV or database format for further analysis.
Visualization Tools: Provides real-time plots of temperature trends, model predictions, and error metrics.
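Custom fan control is listed above as planned, so the following is only a sketch of how a 1-second-ahead prediction could drive it, not the project's implementation. It uses a clamped proportional rule; the function name, gain, base duty, and duty limits are all hypothetical.

```python
def fan_duty(predicted_temp, target_temp, base_duty=30.0, gain=5.0,
             min_duty=20.0, max_duty=100.0):
    """Map a predicted temperature to a fan duty cycle in percent.

    Proportional rule: raise the duty by `gain` percent for every degree
    the prediction exceeds the target, then clamp to the allowed range.
    All default values are illustrative assumptions.
    """
    duty = base_duty + gain * (predicted_temp - target_temp)
    return max(min_duty, min(max_duty, duty))


# Predicted 80 deg C against a 70 deg C target.
duty_hot = fan_duty(80.0, 70.0)
# Predicted 60 deg C: the rule would go below the floor, so it is clamped.
duty_cool = fan_duty(60.0, 70.0)
```

Acting on the predicted rather than the measured temperature lets the fan ramp up before the overshoot happens, which is the motivation for pairing the predictor with a controller.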

🔧 Installation

Prerequisites

Python 3.8 or higher
NVIDIA GPU drivers (if applicable)
CUDA Toolkit (for GPU acceleration)
TensorFlow / Keras
NumPy, Pandas, Matplotlib

Setup

Clone the repository:
    git clone https://github.com/vtyagi26/Gpu-temperature-control-prediction.git
    cd Gpu-temperature-control-prediction
Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
Install dependencies:
    pip install -r requirements.txt

🚀 Usage

1. Run the Model Training

Train the Physics-Informed Neural Network (PINN) on synthetic or collected telemetry data:
    python model.py

This will:

Load the dataset from the configured path.
Train the model with the physics-based loss function.
Save the trained model as gpu_temp_model.h5 for later use.
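An LSTM consumes fixed-length sequences, so the telemetry has to be cut into sliding windows before training. The sketch below shows one way to do that, pairing each window with the temperature a fixed number of steps after it. The function name, window length, and data layout are assumptions for illustration; the project's own preprocessing may differ.

```python
def make_windows(rows, temps, window, horizon=1):
    """Build (sequence, target) pairs for one-step-ahead prediction.

    rows:    per-time-step feature lists, e.g. [power, fan, util, clock, ambient]
    temps:   GPU temperature at each time step
    window:  number of past time steps fed to the model
    horizon: how many steps after the window's last row the target lies
    """
    X, y = [], []
    for start in range(len(rows) - window - horizon + 1):
        end = start + window
        X.append(rows[start:end])            # rows start .. end-1
        y.append(temps[end + horizon - 1])   # temperature `horizon` steps later
    return X, y


# Toy data: 10 time steps with a single feature each.
rows = [[i] for i in range(10)]
temps = list(range(100, 110))
X, y = make_windows(rows, temps, window=4)
```

With telemetry sampled once per second, horizon=1 corresponds to the 1-second-ahead prediction described in the features list.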

2. Visualize Predictions

Use the notebook or script to visualize predicted vs actual GPU temperature:
    python plot_results.py

The script will generate real-time comparison graphs of model output and true values.
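The error metrics reported for this model (MAE and R²) can be computed from the same predicted and actual series that are plotted. The following is a minimal standard-library sketch of both metrics; the function names and sample values are illustrative and are not taken from the project's scripts.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up temperatures in deg C, for demonstration only.
actual = [60.0, 62.0, 64.0, 66.0]
predicted = [60.5, 61.5, 64.5, 65.5]
mae_value = mean_absolute_error(actual, predicted)
r2_value = r2_score(actual, predicted)
```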

3. Monitor GPU Temperature (Optional)

To only log GPU parameters in real-time:
    python monitor.py --interval 1

This command captures telemetry data every second using nvidia-smi or equivalent APIs.
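For readers curious how such a reading can be obtained, here is a hedged sketch that queries nvidia-smi in CSV mode and parses one line of output. It is not the project's monitor script; the helper names and the chosen field list are assumptions, and read_gpu requires an NVIDIA driver with nvidia-smi on the PATH.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`; this selection is an assumption.
FIELDS = ["temperature.gpu", "power.draw", "fan.speed", "utilization.gpu", "clocks.sm"]


def parse_line(line):
    """Turn one CSV line from nvidia-smi into a dict of floats.

    Values that are not numeric (for example "[N/A]" on GPUs without a
    fan sensor) are stored as None.
    """
    values = [v.strip() for v in line.split(",")]
    reading = {}
    for name, raw in zip(FIELDS, values):
        try:
            reading[name] = float(raw)
        except ValueError:
            reading[name] = None
    return reading


def read_gpu(index=0):
    """Run nvidia-smi once for the given GPU index and parse the result."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "--id=" + str(index),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_line(result.stdout.strip().splitlines()[0])


# Parsing works without a GPU; this sample line is made up.
sample = parse_line("62, 143.21, [N/A], 87, 1725")
```

Calling read_gpu() in a loop with a one-second sleep would approximate the logging interval described above.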

📊 Model Details

Architecture: LSTM-based recurrent neural network with dense post-processing layers.
Loss Function: Mean Squared Error (MSE) plus a physics regularization term that penalizes deviations from the RC thermal equation.
Target Variable: GPU core temperature (°C)
Input Features: Power, Fan Speed, Utilization, Clock Speed, Ambient Temperature.
Performance: Achieved Mean Absolute Error (MAE) = 0.45°C, R² = 0.986 on synthetic GPU workload simulations.
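To make the combined loss concrete, the sketch below adds an MSE term to the mean squared residual of the discretized RC equation, C * (T_next - T_prev) / dt = P - h * (T_prev - T_amb). It is written in plain Python for readability, so the same arithmetic would have to be expressed with tensors in a Keras training loop. The function name, the weight lam, and the fixed c_th and h values are assumptions; in the project these physical parameters are estimated by the model.

```python
def physics_informed_loss(t_pred, t_true, t_prev, power, t_amb,
                          c_th, h, dt=1.0, lam=0.1):
    """MSE plus a weighted penalty on the RC-equation residual.

    t_pred: predicted next-step temperatures
    t_true: measured next-step temperatures
    t_prev: temperatures at the current step
    power:  power draw at the current step (W)
    t_amb:  ambient temperature at the current step
    lam:    weight of the physics term (hypothetical value)
    """
    n = len(t_true)
    mse = sum((p - y) ** 2 for p, y in zip(t_pred, t_true)) / n
    # Residual is zero when the predicted step obeys the RC equation exactly.
    residuals = [
        c_th * (tp - t0) / dt - (pw - h * (t0 - ta))
        for tp, t0, pw, ta in zip(t_pred, t_prev, power, t_amb)
    ]
    physics = sum(r * r for r in residuals) / n
    return mse + lam * physics


# A prediction that matches both the data and the RC step gives zero loss.
consistent = physics_informed_loss([50.75], [50.75], [50.0], [150.0], [25.0],
                                   c_th=100.0, h=3.0)
# A prediction one degree too high is penalized by both terms.
inconsistent = physics_informed_loss([51.75], [50.75], [50.0], [150.0], [25.0],
                                     c_th=100.0, h=3.0)
```

Because the residual is expressed in watts, its squared value can dwarf the MSE term, which is why a weighting factor such as lam is typically needed to balance the two.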

🧠 Research Focus

This work lays the foundation for thermal-aware GPU control systems that integrate physics with machine learning for energy-efficient operation.
Future extensions include:

Real-time dynamic control of fan/pump speed.
Integration with reinforcement learning for adaptive cooling.
Application to data center-scale GPU clusters.

📄 License

This project is licensed under the MIT License.

👨‍💻 Author

Vaibhav Tyagi
