This guide provides comprehensive documentation for training and evaluating LSTM/GRU models for energy consumption forecasting using the train_lstm.py script.
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are recurrent neural network architectures designed for sequence modelling. Unlike the more complex Temporal Fusion Transformer (TFT), these models offer:
- Simplicity: Easier to understand and debug
- Faster training: Typically trains faster than attention-based models
- Lower memory requirements: More efficient for large datasets
- Strong baseline performance: Often achieves competitive results with less complexity
The implementation includes:
- Configurable depth: 1-4 LSTM/GRU layers (default: 2)
- Hidden size: 64-512 units per layer (default: 128)
- Dropout: Regularization between layers (default: 0.2)
- Sequence-to-vector: consumes a lookback window of past values and predicts the full forecast horizon in one shot (see the sketch after this list)
- Adam optimiser: With learning rate scheduling
- Early stopping: Prevents overfitting
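As a rough illustration of this sequence-to-vector design, here is a minimal PyTorch module in the same spirit. The class name and constructor signature are assumptions for illustration, not the exact class defined in train_lstm.py:

```python
import torch.nn as nn

class RecurrentForecaster(nn.Module):
    """Illustrative LSTM/GRU sequence-to-vector forecaster (hypothetical class name)."""

    def __init__(self, input_size, hidden_size=128, num_layers=2,
                 dropout=0.2, horizon=24, use_gru=False):
        super().__init__()
        rnn_cls = nn.GRU if use_gru else nn.LSTM
        # Dropout in nn.LSTM/nn.GRU is applied between stacked layers only,
        # which is why it is disabled for a single-layer model.
        self.rnn = rnn_cls(input_size, hidden_size, num_layers,
                           batch_first=True,
                           dropout=dropout if num_layers > 1 else 0.0)
        self.head = nn.Linear(hidden_size, horizon)  # one output per forecast step

    def forward(self, x):                # x: (batch, lookback, n_features)
        out, _ = self.rnn(x)             # out: (batch, lookback, hidden_size)
        return self.head(out[:, -1, :])  # map the last hidden state to the horizon
```

The last time step's hidden state summarises the lookback window, and a single linear head emits all forecast-horizon values at once.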
The model uses the following engineered features (a pandas sketch of how they can be derived follows the lists):
Temporal Features:
- Hour of day (0-23)
- Day of week (0-6)
- Month (1-12)
- Day of month
- Quarter (1-4)
Lag Features:
- lag_1: Previous hour's value
- lag_24: Same hour yesterday
- lag_168: Same hour last week
Rolling Statistics:
- rolling_mean_24: 24-hour moving average
- rolling_std_24: 24-hour moving standard deviation
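As a rough sketch of how such features can be built with pandas (the helper name and DataFrame layout are assumptions; train_lstm.py may organise this differently):

```python
import pandas as pd

def engineer_features(df: pd.DataFrame, target: str = "PJME_MW") -> pd.DataFrame:
    """Illustrative feature engineering for a DataFrame with a DatetimeIndex."""
    out = df.copy()

    # Temporal features
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    out["day_of_month"] = out.index.day
    out["quarter"] = out.index.quarter

    # Lag features
    out["lag_1"] = out[target].shift(1)
    out["lag_24"] = out[target].shift(24)
    out["lag_168"] = out[target].shift(168)

    # Rolling statistics
    out["rolling_mean_24"] = out[target].rolling(24).mean()
    out["rolling_std_24"] = out[target].rolling(24).std()

    # Lags and rolling windows leave NaNs at the start of the series
    return out.dropna()
```

Dropping the rows where the longest lag (168 hours) is undefined is consistent with the row counts in the example output below (145,366 raw rows vs. 145,198 after feature engineering).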
Train an LSTM model with default settings:
```bash
python scripts/train_lstm.py --mode train_test --epochs 50
```

This will:
- Load and preprocess the data
- Create train/validation/test splits (70/15/15)
- Train the LSTM model for 50 epochs
- Evaluate on the test set
- Save the model checkpoint to `checkpoints/lstm_best_PJME_MW.pt`
- Generate training history and prediction plots in `figures/`
GRU models are often faster and perform similarly to LSTM:
```bash
python scripts/train_lstm.py --mode train_test --use_gru --epochs 50
```

Train with custom hyperparameters:
```bash
python scripts/train_lstm.py \
    --mode train_test \
    --epochs 100 \
    --batch_size 128 \
    --hidden_size 256 \
    --num_layers 3 \
    --dropout 0.3 \
    --learning_rate 0.0005 \
    --lookback 168 \
    --forecast_horizon 24
```

| Argument | Type | Default | Description |
|---|---|---|---|
| `--mode` | str | `train_test` | Mode of operation: `train`, `test`, or `train_test` |
| `--data_path` | str | `composite_energy_data.csv` | Path to the dataset |
| `--region` | str | `PJME_MW` | Energy region to forecast |
| `--checkpoint_path` | str | None | Path to checkpoint for testing |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--use_gru` | flag | False | Use GRU instead of LSTM |
| `--hidden_size` | int | 128 | Hidden size for LSTM/GRU layers |
| `--num_layers` | int | 2 | Number of recurrent layers |
| `--dropout` | float | 0.2 | Dropout rate between layers |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--epochs` | int | 50 | Maximum number of training epochs |
| `--batch_size` | int | 64 | Batch size for training |
| `--learning_rate` | float | 0.001 | Initial learning rate |
| `--patience` | int | 10 | Early stopping patience (epochs) |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--lookback` | int | 168 | Lookback window size (hours) |
| `--forecast_horizon` | int | 24 | Forecast horizon (hours) |
Train a model without testing:
```bash
python scripts/train_lstm.py --mode train --epochs 50
```

Test a previously trained model:
```bash
python scripts/train_lstm.py \
    --mode test \
    --checkpoint_path checkpoints/lstm_best_PJME_MW.pt
```

Train models for different energy regions:
```bash
# AEP region
python scripts/train_lstm.py --region AEP_MW --epochs 50

# DAYTON region
python scripts/train_lstm.py --region DAYTON_MW --epochs 50
```

Small, fast model (for quick experiments):
```bash
python scripts/train_lstm.py \
    --hidden_size 64 \
    --num_layers 1 \
    --epochs 30
```

Large, powerful model (for best performance):
```bash
python scripts/train_lstm.py \
    --hidden_size 512 \
    --num_layers 4 \
    --dropout 0.3 \
    --epochs 100 \
    --batch_size 32
```

Long-term forecasting:
```bash
python scripts/train_lstm.py \
    --lookback 336 \
    --forecast_horizon 72 \
    --epochs 75
```

Example output from a training run:

```
================================================================================
Training LSTM Model
================================================================================
[1/5] Loading data from composite_energy_data.csv...
Loaded 145,366 rows for PJME_MW
[2/5] Engineering features...
Final dataset: 145,198 rows
[3/5] Creating datasets...
Train samples: 101,591
Val samples: 21,733
Test samples: 21,733
[4/5] Creating LSTM model...
Model created: 206,872 parameters
[5/5] Training model...
Epoch 1/50 | Train Loss: 0.2341 | Val Loss: 0.2156
Epoch 2/50 | Train Loss: 0.1987 | Val Loss: 0.1923
...
Training complete!
Best validation loss: 0.1542
Model saved to: checkpoints/lstm_best_PJME_MW.pt
Training history plot saved to: figures/lstm_training_history.png
================================================================================
Testing Model
================================================================================
Test Results:
MSE: 1234.56
RMSE: 35.14
MAE: 27.89
MAPE: 2.34%
Testing complete!
Predictions plot saved to: figures/lstm_predictions.png
```
Saved in checkpoints/:
- `lstm_best_{region}.pt` or `gru_best_{region}.pt`: Contains model weights, optimiser state, scalers, and training history
Saved in figures/:
Training History (lstm_training_history.png or gru_training_history.png):
- Training and validation loss curves over epochs
- Helps diagnose overfitting or underfitting
Predictions (lstm_predictions.png or gru_predictions.png):
- Top panel: Time series of predictions vs actual (last 500 points)
- Bottom panel: Scatter plot of actual vs predicted with metrics
The saved checkpoint contains:
```python
{
'model_state_dict': {...}, # Model weights
'optimiser_state_dict': {...}, # Optimiser state
'epoch': 42, # Final epoch number
'val_loss': 0.1542, # Best validation loss
'config': {...}, # All command-line arguments
'scaler_X': StandardScaler(), # Feature scaler
'scaler_y': StandardScaler(), # Target scaler
'train_losses': [...], # Training loss history
'val_losses': [...] # Validation loss history
}
```
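A saved checkpoint can be reloaded outside the script along these lines (a sketch assuming the keys above; the model must be rebuilt with the same hyperparameters before its weights are restored):

```python
import torch

# weights_only=False lets the pickled StandardScaler objects load as well
# (newer PyTorch versions default to weights_only=True).
ckpt = torch.load("checkpoints/lstm_best_PJME_MW.pt",
                  map_location="cpu", weights_only=False)

config = ckpt["config"]                                # training-time arguments
scaler_X, scaler_y = ckpt["scaler_X"], ckpt["scaler_y"]
print(f"best val loss {ckpt['val_loss']:.4f} at epoch {ckpt['epoch']}")

# Rebuild the model with the same config, then:
# model.load_state_dict(ckpt["model_state_dict"])
# model.eval()
```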
With default hyperparameters (50 epochs, 168-hour lookback):
- RMSE: ~800-1200 MW
- MAE: ~600-900 MW
- MAPE: ~2-4%
- Training time: 5-15 minutes (CPU), 1-3 minutes (GPU)
| Model | Complexity | Training Time | Typical RMSE |
|---|---|---|---|
| Linear Regression | Low | Seconds | ~1500 MW |
| XGBoost/LightGBM | Medium | Minutes | ~900 MW |
| LSTM/GRU | Medium-High | 5-15 min | ~800-1200 MW |
| TFT | High | 30-60 min | ~700-1000 MW |
Begin with a small model and short training:
```bash
python scripts/train_lstm.py --hidden_size 64 --num_layers 1 --epochs 10
```

Watch the training/validation loss gap (see the sketch after this list):
- Good: Train and val losses decrease together
- Overfitting: Train loss decreases, val loss increases
- Solution: Increase dropout, reduce model size, or stop early
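The loss histories stored in the checkpoint make it easy to re-plot this gap after a run (the script already saves such a plot, but re-plotting is handy for side-by-side comparisons); a minimal sketch, assuming the checkpoint keys described above:

```python
import torch
import matplotlib.pyplot as plt

# Full unpickling so the non-tensor entries (scalers, histories) load too
ckpt = torch.load("checkpoints/lstm_best_PJME_MW.pt",
                  map_location="cpu", weights_only=False)

plt.plot(ckpt["train_losses"], label="train")
plt.plot(ckpt["val_losses"], label="validation")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("Diverging curves indicate overfitting")
plt.show()
```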
- Try GRU first: Often performs similarly with faster training
- Use LSTM if: You have very long sequences or complex patterns
- Smaller batches (32-64): Better generalisation, slower training
- Larger batches (128-256): Faster training, may need higher learning rate
- Too high: Training unstable, loss oscillates
- Too low: Slow convergence
- Default (0.001): Good starting point for most cases
Lookback window:
- 168 hours (1 week): Captures weekly patterns
- 336 hours (2 weeks): Better for irregular patterns
- Trade-off: Longer = more context but slower training
Forecast horizon:
- 24 hours (1 day): Standard short-term forecasting
- 72 hours (3 days): Medium-term forecasting
- Longer horizons are generally harder to predict
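To make the lookback/horizon trade-off concrete, windowed samples are typically built along these lines (an illustrative sketch, not necessarily the exact dataset code in train_lstm.py):

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray,
                 lookback: int = 168, horizon: int = 24):
    """Slice aligned (n_steps, ...) arrays into (lookback -> horizon) training pairs."""
    X, y = [], []
    for start in range(len(features) - lookback - horizon + 1):
        X.append(features[start:start + lookback])                     # one week of inputs
        y.append(target[start + lookback:start + lookback + horizon])  # next day of load
    return np.stack(X), np.stack(y)
```

Doubling the lookback roughly doubles the size of each input window (and the per-step compute), which is the slowdown mentioned above.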
Symptoms: Validation loss stays constant or decreases very slowly
Solutions:
- Increase learning rate: `--learning_rate 0.005`
- Increase model capacity: `--hidden_size 256 --num_layers 3`
- Check data preprocessing and feature engineering
Symptoms: Train loss much lower than validation loss
Solutions:
- Increase dropout: `--dropout 0.3` or `--dropout 0.4`
- Reduce model size: `--hidden_size 64 --num_layers 1`
- Reduce training epochs or use early stopping (automatic)
- Get more training data
Symptoms: Training progresses very slowly
Solutions:
- Increase batch size: `--batch_size 128`
- Use GRU instead of LSTM: `--use_gru`
- Reduce sequence length: `--lookback 72`
- Use GPU if available (automatic detection)
Symptoms: Out-of-memory errors during training
Solutions:
- Reduce batch size: `--batch_size 32`
- Reduce model size: `--hidden_size 64`
- Reduce sequence length: `--lookback 72`
Train multiple models and average predictions:
```bash
# Train 3 models with different configurations
python scripts/train_lstm.py --hidden_size 128 --num_layers 2 --epochs 50
python scripts/train_lstm.py --hidden_size 256 --num_layers 2 --epochs 50
python scripts/train_lstm.py --use_gru --hidden_size 192 --num_layers 3 --epochs 50

# Load and average predictions in Python
```
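One simple way to combine the runs is to save each model's test-set predictions and average them element-wise; a sketch, where the file names are hypothetical placeholders for wherever you store the per-model predictions (already inverse-transformed to MW):

```python
import numpy as np

# Each array has shape (n_test_windows, horizon) and comes from one trained model
# run over the same test windows. File names below are hypothetical.
predictions = [
    np.load("preds_lstm_h128.npy"),
    np.load("preds_lstm_h256.npy"),
    np.load("preds_gru_h192.npy"),
]

ensemble = np.mean(predictions, axis=0)   # element-wise average across models
```

Averaging tends to cancel out uncorrelated errors, which is why mixed LSTM/GRU ensembles often edge out any single run.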
Train on one region, test on another:

```bash
# Requires custom script modification
```

The generate_figures.py script can automatically generate visualisations from trained models:

```bash
python scripts/generate_figures.py
```

This will create all analysis figures plus LSTM/GRU training and prediction plots if checkpoints exist.
For issues or questions:
- Check this guide for common solutions
- Review the TFT_GUIDE.md for general time-series tips
- Examine the generated figures for diagnostic information
- Open an issue on the project repository