# Constrained Policy Gradient algorithms for power grid management using Grid2Op
This repository is based on MagicRL. It has been adapted and extended to focus on constrained RL for power grid operation via Grid2Op.
This project implements policy gradient methods -- including the constrained variant CPGPE -- applied to the Learning to Run a Power Network (L2RPN) environments. The agent learns to operate a power grid by selecting discrete topology actions (e.g., connecting/disconnecting lines), while optionally satisfying safety constraints on line loading, voltage deviations, and other operational metrics.
- Installation
- Quick Start
- Running Experiments
- Algorithms
- Policies
- Grid2Op Environments
- Cost Functions for Constrained RL
- Project Structure
- Diagnostic Baselines
- Documentation
- License
## Installation

Python 3.10+ is required.

```bash
conda create --name constrained_grid2op python=3.10
conda activate constrained_grid2op
pip install -r requirements.txt
pip install numba
```

The first time you use a Grid2Op scenario, it will be downloaded automatically. You can also pre-download it:

```python
import grid2op

grid2op.make("l2rpn_case14_sandbox", test=True)
```

## Quick Start

Run a short training session with CPGPE on the default Grid2Op environment:
```bash
python run.py --alg cpgpe --ite 10 --batch 20 --horizon 100 --var 0.1 --costs 1 --cost_type mv
```

Results are saved automatically in the `experiments/` directory.
## Running Experiments

All experiments are launched through `run.py`. Results (JSON logs and best policy parameters) are saved under the `--dir` directory, organized by experiment configuration and trial number.

### General Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--dir` | str | `experiments/` | Directory where results are saved |
| `--ite` | int | `100` | Number of training iterations |
| `--batch` | int | `100` | Number of trajectories per iteration |
| `--horizon` | int | `100` | Episode length (max timesteps) |
| `--gamma` | float | `1` | Discount factor |
| `--n_trials` | int | `1` | Number of independent runs |
| `--n_workers` | int | `1` | Number of parallel workers for trajectory sampling |
| `--clip` | int | `1` | Whether to clip actions (`0` or `1`) |
### Algorithm Parameters

| Parameter | Type | Default | Choices | Description |
|---|---|---|---|---|
| `--alg` | str | `pgpe` | `pgpe`, `pg`, `dpg`, `cpgpe` | Algorithm to use |
| `--var` | float | `1` | -- | Exploration variance (σ²) for parameter perturbation |
| `--lr` | float | `0.001` | -- | Learning rate |
| `--lr_strategy` | str | `adam` | `adam`, `constant` | Learning rate schedule |
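As a point of reference for what `--var` and `--lr` control, below is a minimal, illustrative sketch of a PGPE-style update: policy parameters are perturbed with Gaussian noise of variance σ², each perturbation is rolled out, and the mean of the parameter distribution is moved along a Monte Carlo gradient estimate. This is a simplification, not the implementation in this repository; `evaluate_return` is a hypothetical callable that runs one episode with the given parameters and returns its episodic return.

```python
import numpy as np

def pgpe_step(mu, sigma2, evaluate_return, batch_size=50, lr=1e-3):
    """One illustrative PGPE update of the parameter-distribution mean."""
    sigma = np.sqrt(sigma2)
    # Sample perturbed policy parameters theta_i ~ N(mu, sigma^2 I)
    thetas = mu + sigma * np.random.randn(batch_size, mu.size)
    # Roll out each perturbation and record its episodic return
    returns = np.array([evaluate_return(theta) for theta in thetas])
    baseline = returns.mean()  # simple baseline for variance reduction
    # Monte Carlo estimate of grad_mu E[J]: E[(R - b) * (theta - mu) / sigma^2]
    grad_mu = ((returns - baseline)[:, None] * (thetas - mu)).mean(axis=0) / sigma2
    return mu + lr * grad_mu  # constant-step variant of the --lr_strategy choices
```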
### Policy Parameters

| Parameter | Type | Default | Choices | Description |
|---|---|---|---|---|
| `--pol` | str | `nn_softmax` | `linear`, `nn`, `big_nn`, `nn_softmax` | Policy architecture |

- `nn_softmax` (recommended for Grid2Op): Neural network with softmax output for discrete actions. Initialized with a bias towards the "do nothing" action (action 0) for stable learning.
- `nn`: Neural network with continuous output.
- `big_nn`: Larger neural network (4 hidden layers).
- `linear`: Linear policy.
### Environment Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `--grid2op_env` | str | `l2rpn_case14_sandbox` | Grid2Op scenario name (see available environments below) |
### Cost Parameters

| Parameter | Type | Default | Choices | Description |
|---|---|---|---|---|
| `--costs` | int | `0` | `0`, `1` | Enable cost-aware environment (`1` = enabled) |
| `--cost_type` | str | `tc` | `tc`, `cvar`, `mv`, `chance` | Cost aggregation type |

Cost types:

- `tc` -- Trajectory cost: penalizes the cumulative cost along the trajectory.
- `mv` -- Mean-variance: penalizes both expected cost and its variance (risk-sensitive).
- `cvar` -- Conditional Value at Risk: penalizes the tail of the cost distribution.
- `chance` -- Chance constraint: penalizes the probability of exceeding a threshold.
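The four cost types above are different ways of turning per-trajectory costs into a single constraint statistic. The sketch below shows one plausible formulation in numpy; the threshold and CVaR level are illustrative, and the exact definitions used by CPGPE in this repository may differ.

```python
import numpy as np

def aggregate_costs(traj_costs, cost_type, threshold=1.0, alpha=0.95):
    """Illustrative aggregation of cumulative costs (one value per trajectory)."""
    c = np.asarray(traj_costs, dtype=float)
    if cost_type == "tc":      # trajectory cost: expected cumulative cost
        return c.mean()
    if cost_type == "mv":      # mean-variance: expected cost plus a variance penalty
        return c.mean() + c.var()
    if cost_type == "cvar":    # CVaR: mean of the worst (1 - alpha) fraction of costs
        tail_start = np.quantile(c, alpha)
        return c[c >= tail_start].mean()
    if cost_type == "chance":  # chance constraint: probability of exceeding a threshold
        return (c > threshold).mean()
    raise ValueError(f"unknown cost_type: {cost_type}")
```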
### Example Commands

PGPE (unconstrained) on the sandbox environment:

```bash
python run.py --alg pgpe --pol nn_softmax --ite 100 --batch 50 --horizon 100 --var 0.1 --lr 0.001
```

CPGPE (constrained) with mean-variance cost:

```bash
python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 100 --horizon 100 \
    --var 0.1 --lr 0.001 --costs 1 --cost_type mv
```

CPGPE on the WCCI 2020 environment (larger grid):

```bash
python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 100 --horizon 200 \
    --var 0.1 --lr 0.001 --costs 1 --cost_type tc --grid2op_env l2rpn_wcci_2020
```

Multiple trials with parallelism:

```bash
python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 50 --horizon 100 \
    --var 0.1 --costs 1 --cost_type mv --n_trials 5 --n_workers 4
```

## Policies

| Policy | Class | Suitable For |
|---|---|---|
| `nn_softmax` | `policies.NNSoftmax` | Discrete action spaces (Grid2Op). Outputs action probabilities via softmax. |
| `nn` | `policies.NeuralNetworkPolicy` | Continuous action spaces. |
| `big_nn` | `policies.NeuralNetworkPolicy` | Continuous action spaces (larger network). |
| `linear` | `policies.OldLinearPolicy` | Linear parametrization. |
When using `nn_softmax`, the output layer bias is initialized to strongly prefer the "do nothing" action (action index 0). This is critical for Grid2Op, where unnecessary interventions often cause cascading failures. The bias is part of the learnable parameters, so the agent can learn to take other actions when beneficial.
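The sketch below illustrates the idea with a deliberately tiny, self-contained policy (not the repository's `policies.NNSoftmax`): the output bias for action 0 starts large, so the initial softmax concentrates probability on "do nothing" while remaining a learnable parameter.

```python
import numpy as np

class TinySoftmaxPolicy:
    """Toy softmax policy with a "do nothing" bias; sizes are illustrative."""

    def __init__(self, obs_dim, n_actions, do_nothing_bias=5.0):
        self.W = 0.01 * np.random.randn(obs_dim, n_actions)
        self.b = np.zeros(n_actions)
        self.b[0] = do_nothing_bias  # action 0 starts with most of the probability mass

    def action_probs(self, obs):
        logits = obs @ self.W + self.b
        logits -= logits.max()  # for numerical stability
        exp = np.exp(logits)
        return exp / exp.sum()

policy = TinySoftmaxPolicy(obs_dim=100, n_actions=20)
print(policy.action_probs(np.zeros(100)))  # probability concentrated on action 0 at init
```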
## Grid2Op Environments

The following L2RPN environments are supported (sorted by grid size):

| Environment | Substations | Lines | Features |
|---|---|---|---|
| `l2rpn_case14_sandbox` | 14 | 20 | Smallest grid, no maintenance or opponent. Good for development. |
| `l2rpn_wcci_2020` | 36 | 59 | Maintenance events, redispatching. Medium difficulty. |
| `l2rpn_wcci_2022` | 36 | -- | Newer version of the WCCI scenario. |
| `l2rpn_neurips_2020_track1_small` | 36 | -- | Maintenance, opponent attacks, redispatching. Small dataset. |
| `l2rpn_neurips_2020_track1_large` | 36 | -- | Same grid as above, larger dataset. |
| `l2rpn_neurips_2020_track2_small` | 118 | -- | Largest grid. Small dataset. |
| `l2rpn_neurips_2020_track2_large` | 118 | -- | Largest grid. Large dataset. |

Note: "small"/"large" in the NeurIPS environments refers to the chronics dataset size, not the grid size.
## Cost Functions for Constrained RL

When `--costs 1` is passed, the environment computes per-step cost signals that the CPGPE algorithm uses to enforce constraints. The available cost functions are configured in the `cost_config` dictionary inside `run.py`:
| Cost Function | Description |
|---|---|
| `rho_max` | Maximum line loading excess above threshold |
| `rho_violations_count` | Number of overloaded lines |
| `rho_violations_sum` | Sum of line loading above threshold (default) |
| `rho_violations_quadratic` | Quadratic penalty for loading violations |
| `disconnections` | Number of disconnected lines |
| `overflow_duration` | Total overflow timesteps across all lines |
| `voltage_deviation` | Maximum voltage deviation from acceptable bounds |
| `curtailment` | Total renewable energy curtailment (MW) |
| `redispatch` | Total absolute redispatch (MW) |
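Several of these metrics can be derived directly from the per-step Grid2Op observation. The sketch below shows one plausible formulation for a few of them, using the standard observation attributes `obs.rho`, `obs.line_status`, and `obs.timestep_overflow`; the exact formulas used in this repository are documented in `docs/grid2op_attributes.md`.

```python
import numpy as np

def step_costs(obs, rho_threshold=0.8):
    """Illustrative per-step cost metrics computed from a Grid2Op observation."""
    excess = np.maximum(obs.rho - rho_threshold, 0.0)  # per-line loading above threshold
    return {
        "rho_max": float(excess.max()),
        "rho_violations_count": int((obs.rho > rho_threshold).sum()),
        "rho_violations_sum": float(excess.sum()),
        "rho_violations_quadratic": float((excess ** 2).sum()),
        "disconnections": int((~obs.line_status).sum()),
        "overflow_duration": int(obs.timestep_overflow.sum()),
    }
```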
Default configuration uses `rho_violations_sum` with a threshold of 0.8 (80% line loading). To customize, edit the `cost_config` in `run.py`:

```python
"cost_config": {
    "costs": ["rho_violations_sum", "disconnections"],  # Multiple costs
    "rho_threshold": 0.8,
    "voltage_bounds": (0.95, 1.05),
    "weights": [1.0, 0.5]
}
```

For a detailed reference of Grid2Op observation attributes and cost metric definitions, see `docs/grid2op_attributes.md`.
## Diagnostic Baselines

The `test_baselines.py` script evaluates simple baseline policies to establish performance bounds:

```bash
python test_baselines.py
```

This runs three policies and reports statistics:
- Do Nothing: Always takes action 0 (no intervention). Typically the strongest baseline on simple scenarios.
- Random: Uniformly random actions. Usually causes rapid grid failure.
- Biased Random: Randomly selects actions with a strong preference for "do nothing".
The output includes mean/std of rewards, costs, episode lengths, and the percentage of episodes that survive the full horizon.
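For comparison, a "do nothing" baseline takes only a few lines with the standard Grid2Op API. The sketch below is a minimal version of such an evaluation loop, not the exact logic of `test_baselines.py`:

```python
import grid2op

env = grid2op.make("l2rpn_case14_sandbox", test=True)
do_nothing = env.action_space({})  # empty action dictionary = no intervention

returns, lengths = [], []
for _ in range(10):
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, info = env.step(do_nothing)
        total += reward
        steps += 1
    returns.append(total)
    lengths.append(steps)

print("mean return:", sum(returns) / len(returns))
print("mean episode length:", sum(lengths) / len(lengths))
```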
## Documentation

- `docs/grid2op_attributes.md` -- Comprehensive reference for Grid2Op observation attributes, cost metric formulas, and recommended configurations for CPGPE.
- Grid2Op Documentation
- L2RPN Competition