Commits
35 commits
5476776
Merge pull request #1 from Modern-Compilers-Lab/configurable-pretraining
BrouthenKamel Oct 14, 2024
9bea684
feat: adding tags to select data from full dataset
Mar 19, 2025
2fe899e
cleanup & feat: adding num of hits from the dataset
Mar 19, 2025
0266a60
feat: adding the pretraining of the GAT network method
Mar 19, 2025
7f2abed
cleanup: removing duplicated imports
Mar 19, 2025
49afcf1
cleanup: adding a generic config.yaml file
Mar 19, 2025
cbb487a
fix: fixing pretrained_model_path param
Apr 6, 2025
28d6676
feat: adding dropout and cleanup
Apr 6, 2025
c390c99
fix: fixing rollouts, changing get by wait to not wait for all worker…
Jun 17, 2025
c853a79
cleanup for ablation
Jun 17, 2025
e5da517
feat: adding README file and updating requirements
Jun 17, 2025
06f85f4
fix: making Tiramisu optional when only running the pretraining
Jun 18, 2025
8e2f93d
working on a slurm job script to streamlile future jobs
MusaToTheMoon Jun 25, 2025
0a1111a
finalized testing_job.sh; reusable script for future slurm jobs
MusaToTheMoon Jun 26, 2025
48eb0df
changed pretrain_GAT_network_with_schedules.py to work without needin…
MusaToTheMoon Jul 2, 2025
26cfdab
listed all modifications at the end of the file for doc purposes
MusaToTheMoon Jul 2, 2025
e191cb0
Merge branch 'ablation_study' of github.com:MusaToTheMoon/RL_Pretrain…
MusaToTheMoon Jul 2, 2025
07b302f
made ray import optional
MusaToTheMoon Jul 2, 2025
fcfccd4
Merge branch 'ablation_study' of github.com:MusaToTheMoon/RL_Pretrain…
MusaToTheMoon Jul 2, 2025
473f766
commented out some more imports not used unless a new dataset is created
MusaToTheMoon Jul 2, 2025
b88f243
Merge branch 'ablation_study' of github.com:MusaToTheMoon/RL_Pretrain…
MusaToTheMoon Jul 2, 2025
150bcd4
added some additional logging to pretrain_GAT_network_with_schedules.py
MusaToTheMoon Jul 2, 2025
646e601
remove large dataset .pkl file
MusaToTheMoon Jul 2, 2025
b84a2c5
resolve merge conflicts
MusaToTheMoon Jul 2, 2025
c2776ab
add time logging and some other changes to the pretrain GAT script
MusaToTheMoon Jul 3, 2025
84a0974
update pretraining scripts
MusaToTheMoon Jan 25, 2026
ffff259
updated neural networks, added GIN, GCN, global Pooling, SAG Pooling
MusaToTheMoon Jan 25, 2026
bae9d05
updated setup related files
MusaToTheMoon Jan 25, 2026
8e18ebf
add bash scripts to run pretrain; add bash scripts and wandb yaml con…
MusaToTheMoon Jan 25, 2026
86884b5
update .gitignore
MusaToTheMoon Jan 25, 2026
60f4cdd
adding results from sweep runs; not adding saved model weights becaus…
MusaToTheMoon Jan 25, 2026
5cfe9da
update sweep results, remove old sweep results
MusaToTheMoon Jan 25, 2026
011dc1d
update README
MusaToTheMoon Jan 25, 2026
2e97e21
Merge branch 'ablation_study' into ablation_study
MusaToTheMoon Mar 17, 2026
87f68d3
update readme
MusaToTheMoon Mar 17, 2026
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,9 @@ __pycache__/
experiment_dir
mlruns
dataset
.env

*.ipynb
# *.json
*.sh
# *.sh
*.pkl
141 changes: 18 additions & 123 deletions README.md
@@ -1,132 +1,27 @@
# GAT Model Ablation Study
## Introduction

This repository contains the code and data for conducting an ablation study on the Graph Attention Network (GAT) model used in the Tiramisu RL project. The model architecture is implemented in `agent/policy_value_nn.py` and the pretraining script is in `pretrain_GAT_network_with_schedules.py`.
- This repo contains code to run experiments with different GNN models on the PEARL pretraining task.

## Model Architecture
- The neural_nets module implements GAT, GIN, and GCN. GIN and GCN are modular, allowing the pooling mechanism to be swapped; GAT is the original implementation used in the paper. Pooling options are defined in pooling.py, with working implementations for Global Pooling and SAG Pooling.

The GAT model consists of the following key components:
- Pretraining scripts live in the repo root, named pretrain_[MODEL_TYPE]_network.py. The scripts are largely duplicates with minor per-architecture training differences. Most up-to-date versions:
- GCN: pretrain_GCN_network.py
- GIN: pretrain_GCN_sag_network.py, pretrain_GIN_network.py
- GAT: pretrain_GAT_network_with_dataloader.py

1. **Input Layer**: Takes graph-structured data with input size of 260 features
2. **GAT Layers**:
- Two GATv2Conv layers with configurable hidden size and number of attention heads
- Each GAT layer is followed by a linear layer and SELU activation
- Uses both mean and max pooling for graph-level features
3. **Policy Head (π)**:
- Three-layer MLP with SELU activations
- Outputs action probabilities
4. **Value Head (v)**:
- Three-layer MLP with SELU activations
- Outputs state value estimates
- Bash scripts for running pretraining and W&B sweeps are in the playground/ subdirectory. W&B is used for hyperparameter tuning sweeps. MLflow is used for individual runs. This can be changed in the pretraining scripts as needed.

Key hyperparameters:
- `input_size`: 260 (default)
- `hidden_size`: 64 (default)
- `num_heads`: 4 (default)
- `num_outputs`: 32 (default)
- `dropout_prob`: 0.1 (default)
- As currently configured:
- results/ stores .out and .err from HPC runs
- sweep_results/ stores .out and .err from W&B sweeps on the HPC
- saved_weights/ stores model weights
- wandb/ contains run metadata for wandb sweeps
- mlruns/ contains run metadata for runs not using W&B

## Dataset
This mostly covers what’s needed to add new architectures, update or create pretraining scripts, and run hyperparameter sweeps. README_OLD.md contains some general information related to the repo, but some of the information in it might be outdated and inaccurate.

The pretraining dataset is stored in `pretrain_dataset_12.5k_fixed_duplicates.pkl`. It contains 12,500 function samples, each with its schedules, which yields more than 12.5K training examples for the GAT model.
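For illustration, the load pattern for such a pickle can be sketched as follows. The entry schema shown is an assumption (the real file's structure is defined by the pretraining scripts), and a stand-in list replaces the real 12.5k dataset:

```python
import os
import pickle
import tempfile

# Stand-in for the real pretrain_dataset_12.5k_fixed_duplicates.pkl;
# the (function, schedules) schema here is an assumption.
entries = [{"function": "fn_0001", "schedules": ["S0", "S1", "S2"]}]

# Write and re-read the pickle the same way a pretraining script would.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(entries, f)
    path = f.name

with open(path, "rb") as f:
    loaded = pickle.load(f)
os.unlink(path)

# Each function contributes one sample per schedule, which is why the
# dataset yields more training examples than its 12,500 functions.
num_examples = sum(len(e["schedules"]) for e in loaded)
print(num_examples)  # 3
```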

## Guidelines

## Configuration File

The `config/config.yaml` file contains several important parameters that can be modified for the ablation study. Here are the key sections and parameters to consider:

### Tiramisu and Environment Configuration
```yaml
tiramisu:
  tiramisu_path: "/path/to/your/tiramisu" # Update with your Tiramisu installation path
  workspace: "/path/to/your/workspace_rollouts/" # Directory for Tiramisu binary files
  experiment_dir: "/path/to/your/experiment_dir/" # Directory for experiment results
  logs_dir: "/path/to/your/logs/" # Directory for logs

env_vars:
  CXX: "/path/to/your/g++" # Path to your g++ compiler
  TIRAMISU_ROOT: "/path/to/your/tiramisu" # Should match tiramisu_path
  CONDA_ENV: "/path/to/your/conda/env" # Path to your conda environment
  LD_LIBRARY_PATH: "LD_LIBRARY_PATH=/path/to/your/lib:${TIRAMISU_ROOT}/3rdParty/Halide/install/lib64:${TIRAMISU_ROOT}/3rdParty/llvm/build/lib:${TIRAMISU_ROOT}/3rdParty/isl/build/lib:$LD_LIBRARY_PATH"
```

### Dataset Configuration
```yaml
dataset:
  dataset_format: PICKLE
  cpps_path: /path/to/your/dataset/cpps_12.5k.pkl # Path to your CPPs file
  dataset_path: /path/to/your/dataset/schedules_full_v2_100k.pkl # Path to your schedules file
  pretrained_model_path: /path/to/your/pretrained_model.pt # Path to your pretrained model
  save_path: /path/to/your/experiment_dir/save/ # Directory for updated dataset
  models_save_path: /path/to/your/experiment_dir/models/ # Directory for saved models
  results_save_path: /path/to/your/experiment_dir/results/ # Directory for results
  evaluation_save_path: /path/to/your/experiment_dir/evaluation/ # Directory for evaluation results
  shuffle: True
  seed: 133
  saving_frequency: 1000
  is_benchmark: False
```

### Pretraining Configuration
```yaml
pretrain:
  embed_access_matrices: True
  embedding_type: concat_final_hidden_cell_state # Options:
  # - final_hidden_state
  # - final_cell_state
  # - concat_final_hidden_cell_state
  # - mean_pooling_output
  # - max_pooling_output
  # - flattened_output
```
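The `embedding_type` options above suggest that the access matrices are encoded by a recurrent layer whose hidden/cell states or pooled outputs can be selected. A hypothetical dispatch over these options, with assumed tensor shapes, might look like:

```python
import torch

# Hypothetical mapping from embedding_type to an embedding vector; the
# option names match the config comments, but shapes and the surrounding
# encoder are assumptions for illustration.
def select_embedding(output, h_n, c_n, embedding_type):
    # output: [seq_len, hidden]; h_n, c_n: [hidden] (final hidden/cell state)
    if embedding_type == "final_hidden_state":
        return h_n
    if embedding_type == "final_cell_state":
        return c_n
    if embedding_type == "concat_final_hidden_cell_state":
        return torch.cat([h_n, c_n])
    if embedding_type == "mean_pooling_output":
        return output.mean(dim=0)
    if embedding_type == "max_pooling_output":
        return output.max(dim=0).values
    if embedding_type == "flattened_output":
        return output.flatten()
    raise ValueError(f"unknown embedding_type: {embedding_type}")

out = torch.randn(6, 16)
h, c = torch.randn(16), torch.randn(16)
print(select_embedding(out, h, c, "concat_final_hidden_cell_state").shape)  # torch.Size([32])
```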

### Training Hyperparameters
```yaml
hyperparameters:
  num_updates: 15000
  batch_size: 512
  mini_batch_size: 64
  num_epochs: 4
  clip_epsilon: 0.3
  gamma: 0.99
  lambdaa: 0.95
  value_coeff: 2
  entropy_coeff_start: 0.1
  entropy_coeff_finish: 0
  max_grad_norm: 10
  lr: 0.0001
  start_lr: 0.0001
  final_lr: 0.0001
  weight_decay: 0.0001
```
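A pretraining script would typically read these sections with PyYAML. A minimal sketch of the load pattern, using an inline fragment in place of `config/config.yaml`:

```python
import yaml  # PyYAML

# Inline stand-in for config/config.yaml; key names follow the fragments above.
cfg_text = """
hyperparameters:
  batch_size: 512
  mini_batch_size: 64
  lr: 0.0001
"""

cfg = yaml.safe_load(cfg_text)
print(cfg["hyperparameters"]["batch_size"])  # 512
```

In a real script the fragment would be replaced by `yaml.safe_load(open("config/config.yaml"))`.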

## Getting Started

1. Install Tiramisu Compiler using [this guide](https://docs.google.com/document/d/1fCnPNd37BByYpAAw5c0Y5Mnswcn9wSKqt2nvb8oBwZY/edit?tab=t.0) (this can be skipped if only the pretraining with existing data is performed)

2. Set up a new conda environment and install the required packages:

```bash
# Create and activate a new conda environment
conda create -n gat-study python=3.9
conda activate gat-study

# Install PyTorch with CUDA support
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install other required packages
pip install -r requirements.txt
```

The requirements.txt file includes:
- torch and related packages for deep learning
- torch-geometric for graph neural networks
- numpy and scipy for numerical computations
- pandas for data manipulation
- scikit-learn for machine learning utilities
- matplotlib for visualization
- tqdm for progress bars
- pytorch-lightning for training utilities
- mlflow for experiment tracking
## License

This project is released under the same license as the original Modern-Compilers-Lab/GNN_RL_Pretrain repository.

Modifications in this fork are provided under the same terms.
136 changes: 136 additions & 0 deletions README_OLD.md
@@ -0,0 +1,136 @@
Comment on this readme: this readme file is old and largely irrelevant to this repo. Check out the newer README.md for more accurate information on how to replicate the experiments in this repo.

Date comment added: Sun Jan 25th

# GAT Model Ablation Study

This repository contains the code and data for conducting an ablation study on the Graph Attention Network (GAT) model used in the Tiramisu RL project. The model architecture is implemented in `agent/policy_value_nn.py` and the pretraining script is in `pretrain_GAT_network_with_schedules.py`.

## Model Architecture

The GAT model consists of the following key components:

1. **Input Layer**: Takes graph-structured data with input size of 260 features
2. **GAT Layers**:
- Two GATv2Conv layers with configurable hidden size and number of attention heads
- Each GAT layer is followed by a linear layer and SELU activation
- Uses both mean and max pooling for graph-level features
3. **Policy Head (π)**:
- Three-layer MLP with SELU activations
- Outputs action probabilities
4. **Value Head (v)**:
- Three-layer MLP with SELU activations
- Outputs state value estimates

Key hyperparameters:
- `input_size`: 260 (default)
- `hidden_size`: 64 (default)
- `num_heads`: 4 (default)
- `num_outputs`: 32 (default)
- `dropout_prob`: 0.1 (default)
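The head structure described above can be sketched in plain PyTorch. This is a simplified illustration, not the repo's implementation: a stand-in linear layer replaces the two `GATv2Conv` layers (which come from torch_geometric in the real code), while the mean/max pooling and the three-layer SELU heads follow the description and defaults listed:

```python
import torch
import torch.nn as nn

class PolicyValueSketch(nn.Module):
    """Simplified stand-in for the GAT policy/value network described above."""

    def __init__(self, input_size=260, hidden_size=64, num_outputs=32):
        super().__init__()
        # Stand-in for the two GATv2Conv + linear + SELU blocks.
        self.embed = nn.Sequential(nn.Linear(input_size, hidden_size), nn.SELU())
        # Concatenated mean + max pooling doubles the feature width.
        head_in = 2 * hidden_size
        self.policy = nn.Sequential(  # π head: three-layer MLP with SELU
            nn.Linear(head_in, hidden_size), nn.SELU(),
            nn.Linear(hidden_size, hidden_size), nn.SELU(),
            nn.Linear(hidden_size, num_outputs),
        )
        self.value = nn.Sequential(  # v head: three-layer MLP with SELU
            nn.Linear(head_in, hidden_size), nn.SELU(),
            nn.Linear(hidden_size, hidden_size), nn.SELU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, node_feats):  # node_feats: [num_nodes, input_size]
        h = self.embed(node_feats)
        # Graph-level features from mean and max pooling over nodes.
        pooled = torch.cat([h.mean(dim=0), h.max(dim=0).values])
        return torch.softmax(self.policy(pooled), dim=-1), self.value(pooled)

model = PolicyValueSketch()
probs, value = model(torch.randn(5, 260))
print(probs.shape, value.shape)  # torch.Size([32]) torch.Size([1])
```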

## Dataset

The pretraining dataset is stored in `pretrain_dataset_12.5k_fixed_duplicates.pkl`. It contains 12,500 function samples, each with its schedules, which yields more than 12.5K training examples for the GAT model.

## Guidelines

## Configuration File

The `config/config.yaml` file contains several important parameters that can be modified for the ablation study. Here are the key sections and parameters to consider:

### Tiramisu and Environment Configuration
```yaml
tiramisu:
  tiramisu_path: "/path/to/your/tiramisu" # Update with your Tiramisu installation path
  workspace: "/path/to/your/workspace_rollouts/" # Directory for Tiramisu binary files
  experiment_dir: "/path/to/your/experiment_dir/" # Directory for experiment results
  logs_dir: "/path/to/your/logs/" # Directory for logs

env_vars:
  CXX: "/path/to/your/g++" # Path to your g++ compiler
  TIRAMISU_ROOT: "/path/to/your/tiramisu" # Should match tiramisu_path
  CONDA_ENV: "/path/to/your/conda/env" # Path to your conda environment
  LD_LIBRARY_PATH: "LD_LIBRARY_PATH=/path/to/your/lib:${TIRAMISU_ROOT}/3rdParty/Halide/install/lib64:${TIRAMISU_ROOT}/3rdParty/llvm/build/lib:${TIRAMISU_ROOT}/3rdParty/isl/build/lib:$LD_LIBRARY_PATH"
```

### Dataset Configuration
```yaml
dataset:
  dataset_format: PICKLE
  cpps_path: /path/to/your/dataset/cpps_12.5k.pkl # Path to your CPPs file
  dataset_path: /path/to/your/dataset/schedules_full_v2_100k.pkl # Path to your schedules file
  pretrained_model_path: /path/to/your/pretrained_model.pt # Path to your pretrained model
  save_path: /path/to/your/experiment_dir/save/ # Directory for updated dataset
  models_save_path: /path/to/your/experiment_dir/models/ # Directory for saved models
  results_save_path: /path/to/your/experiment_dir/results/ # Directory for results
  evaluation_save_path: /path/to/your/experiment_dir/evaluation/ # Directory for evaluation results
  shuffle: True
  seed: 133
  saving_frequency: 1000
  is_benchmark: False
```

### Pretraining Configuration
```yaml
pretrain:
  embed_access_matrices: True
  embedding_type: concat_final_hidden_cell_state # Options:
  # - final_hidden_state
  # - final_cell_state
  # - concat_final_hidden_cell_state
  # - mean_pooling_output
  # - max_pooling_output
  # - flattened_output
```

### Training Hyperparameters
```yaml
hyperparameters:
  num_updates: 15000
  batch_size: 512
  mini_batch_size: 64
  num_epochs: 4
  clip_epsilon: 0.3
  gamma: 0.99
  lambdaa: 0.95
  value_coeff: 2
  entropy_coeff_start: 0.1
  entropy_coeff_finish: 0
  max_grad_norm: 10
  lr: 0.0001
  start_lr: 0.0001
  final_lr: 0.0001
  weight_decay: 0.0001
```

## Getting Started

1. Install Tiramisu Compiler using [this guide](https://docs.google.com/document/d/1fCnPNd37BByYpAAw5c0Y5Mnswcn9wSKqt2nvb8oBwZY/edit?tab=t.0) (this can be skipped if only the pretraining with existing data is performed)

2. Set up a new conda environment and install the required packages:

```bash
# Create and activate a new conda environment
conda create -n gat-study python=3.9
conda activate gat-study

# Install PyTorch with CUDA support
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install other required packages
pip install -r requirements.txt
```

The requirements.txt file includes:
- torch and related packages for deep learning
- torch-geometric for graph neural networks
- numpy and scipy for numerical computations
- pandas for data manipulation
- scikit-learn for machine learning utilities
- matplotlib for visualization
- tqdm for progress bars
- pytorch-lightning for training utilities
- mlflow for experiment tracking


29 changes: 29 additions & 0 deletions neural_nets/__init__.py
@@ -0,0 +1,29 @@
"""
This package contains different implementations of neural networks for the Agent in the PEARL RL-based autoscheduler.

The models are compared on the pretraining task of predicting execution times.

Models:
- GAT: this is the original Graph Attention Network implementation
- GCN: Graph Convolutional Network implementation
- GIN: Graph Isomorphism Network implementation

Pooling layers used:
- GlobalPooling: global mean/max pooling
- SAGPoolLayer: Self-Attention Graph Pooling
- DiffPoolLayer: Differentiable Pooling -- NEEDS TO BE DEBUGGED

All models are designed for the same task:
- Input: Program dependency graphs with 175-dimensional node features
- Output: Policy (action probabilities) and value (execution time estimate)
- Parameter count: ~207k-217k parameters for fair comparison
"""

__version__ = "1.0.0"
__author__ = "RL Pretrain Team"

from .gat_model import GAT
from .gcn_model import GCN
from .gin_model import GIN

__all__ = ["GAT", "GCN", "GIN"]
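The global mean/max pooling named in the docstring can be sketched in plain PyTorch (in the real code, torch_geometric's `global_mean_pool`/`global_max_pool` play this role; the loop-based version below is only an illustration). `batch` maps each node to the id of the graph it belongs to:

```python
import torch

def global_mean_max_pool(x, batch, num_graphs):
    """Concatenate per-graph mean and max pooling of node features.

    x: [num_nodes, feat], batch: [num_nodes] graph ids in [0, num_graphs).
    Returns [num_graphs, 2 * feat].
    """
    means, maxs = [], []
    for g in range(num_graphs):
        nodes = x[batch == g]            # select this graph's nodes
        means.append(nodes.mean(dim=0))  # mean over nodes -> [feat]
        maxs.append(nodes.max(dim=0).values)
    return torch.cat([torch.stack(means), torch.stack(maxs)], dim=1)

x = torch.randn(7, 4)                      # 7 nodes, 4 features each
batch = torch.tensor([0, 0, 0, 1, 1, 2, 2])  # 3 graphs
pooled = global_mean_max_pool(x, batch, 3)
print(pooled.shape)  # torch.Size([3, 8])
```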