ConstrainedGrid2op

Constrained Policy Gradient algorithms for power grid management using Grid2Op.

This repository is based on MagicRL. It has been adapted and extended to focus on constrained RL for power grid operation via Grid2Op.

This project implements policy gradient methods -- including the constrained variant CPGPE -- applied to the Learning to Run a Power Network (L2RPN) environments. The agent learns to operate a power grid by selecting discrete topology actions (e.g., connecting/disconnecting lines), while optionally satisfying safety constraints on line loading, voltage deviations, and other operational metrics.
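For illustration, the low-level interaction with Grid2Op that the policies operate on looks roughly like this (a minimal sketch using the default environment; the line-disconnect action is only an example of a discrete topology intervention, not something the trained agent necessarily does):

import grid2op

# create the default L2RPN environment (test=True uses a small bundled dataset)
env = grid2op.make("l2rpn_case14_sandbox", test=True)
obs = env.reset()

# an empty action dict is Grid2Op's "do nothing"; topology actions set line status or bus assignments
do_nothing = env.action_space({})
disconnect_line_3 = env.action_space({"set_line_status": [(3, -1)]})

obs, reward, done, info = env.step(do_nothing)
print(reward, done, obs.rho.max())  # obs.rho holds the per-line loading ratios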

Table of Contents

  • Installation
  • Quick Start
  • Running Experiments
  • Policies
  • Grid2Op Environments
  • Cost Functions for Constrained RL
  • Diagnostic Baselines
  • Documentation

Installation

1. Create a Conda environment

Python 3.10+ is required.

conda create --name constrained_grid2op python=3.10
conda activate constrained_grid2op

2. Install dependencies

pip install -r requirements.txt

3. (Optional) Install Numba for faster Grid2Op simulations

pip install numba

4. Download a Grid2Op environment

The first time you use a Grid2Op scenario, it will be downloaded automatically. You can also pre-download it:

import grid2op
grid2op.make("l2rpn_case14_sandbox", test=True)

Quick Start

Run a short training session with CPGPE on the default Grid2Op environment:

python run.py --alg cpgpe --ite 10 --batch 20 --horizon 100 --var 0.1 --costs 1 --cost_type mv

Results are saved automatically in the experiments/ directory.


Running Experiments

All experiments are launched through run.py. Results (JSON logs and best policy parameters) are saved under the --dir directory, organized by experiment configuration and trial number.

Command-Line Parameters

General

Parameter    Type   Default       Description
--dir        str    experiments/  Directory where results are saved
--ite        int    100           Number of training iterations
--batch      int    100           Number of trajectories per iteration
--horizon    int    100           Episode length (max timesteps)
--gamma      float  1             Discount factor
--n_trials   int    1             Number of independent runs
--n_workers  int    1             Number of parallel workers for trajectory sampling
--clip       int    1             Whether to clip actions (0 or 1)

Algorithm

Parameter      Type   Default  Choices               Description
--alg          str    pgpe     pgpe, pg, dpg, cpgpe  Algorithm to use
--var          float  1        --                    Exploration variance (σ²) for parameter perturbation
--lr           float  0.001    --                    Learning rate
--lr_strategy  str    adam     adam, constant        Learning rate schedule
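
For context, --var and --lr enter a PGPE-style update: PGPE perturbs the policy parameters rather than the actions, and follows a gradient estimate on the parameter sampling distribution. A schematic of the mean update (illustrative only; the repository's implementation, baseline choice, and handling of σ may differ):

import numpy as np

def pgpe_mean_update(mu, sigma2, lr, sample_return, batch=50):
    # sample_return(theta) is assumed to roll out the policy with parameters theta and return its return
    thetas = mu + np.sqrt(sigma2) * np.random.randn(batch, mu.size)  # parameter perturbation with variance sigma2
    returns = np.array([sample_return(theta) for theta in thetas])
    baseline = returns.mean()
    # log-likelihood gradient of N(mu, sigma2) with respect to mu is (theta - mu) / sigma2
    grad_mu = ((returns - baseline)[:, None] * (thetas - mu)).mean(axis=0) / sigma2
    return mu + lr * grad_mu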

Policy

Parameter  Type  Default     Choices                         Description
--pol      str   nn_softmax  linear, nn, big_nn, nn_softmax  Policy architecture
  • nn_softmax (recommended for Grid2Op): Neural network with softmax output for discrete actions. Initialized with a bias towards the "do nothing" action (action 0) for stable learning.
  • nn: Neural network with continuous output.
  • big_nn: Larger neural network (4 hidden layers).
  • linear: Linear policy.

Grid2Op Environment

Parameter      Type  Default               Description
--grid2op_env  str   l2rpn_case14_sandbox  Grid2Op scenario name (see Grid2Op Environments below)

Constrained RL (CPGPE)

Parameter    Type  Default  Choices               Description
--costs      int   0        0, 1                  Enable cost-aware environment (1 = enabled)
--cost_type  str   tc       tc, cvar, mv, chance  Cost aggregation type

Cost types:

  • tc -- Trajectory cost: penalizes the cumulative cost along the trajectory.
  • mv -- Mean-variance: penalizes both expected cost and its variance (risk-sensitive).
  • cvar -- Conditional Value at Risk: penalizes the tail of the cost distribution.
  • chance -- Chance constraint: penalizes the probability of exceeding a threshold.
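
Sketched over a batch of per-trajectory cumulative costs, these aggregations look roughly as follows (illustrative only; the risk level, threshold, and exact formulas used by CPGPE in this repository may differ):

import numpy as np

def aggregate_costs(traj_costs, cost_type="tc", alpha=0.05, threshold=1.0, risk_weight=1.0):
    # traj_costs: one cumulative cost per sampled trajectory
    traj_costs = np.asarray(traj_costs, dtype=float)
    if cost_type == "tc":       # expected trajectory cost
        return traj_costs.mean()
    if cost_type == "mv":       # mean-variance: expectation plus a variance penalty
        return traj_costs.mean() + risk_weight * traj_costs.var()
    if cost_type == "cvar":     # CVaR: average cost of the worst alpha-fraction of trajectories
        tail_start = np.quantile(traj_costs, 1.0 - alpha)
        return traj_costs[traj_costs >= tail_start].mean()
    if cost_type == "chance":   # chance constraint: probability of exceeding a threshold
        return float((traj_costs > threshold).mean())
    raise ValueError(f"unknown cost_type: {cost_type}")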

Example Commands

PGPE (unconstrained) on the sandbox environment:

python run.py --alg pgpe --pol nn_softmax --ite 100 --batch 50 --horizon 100 --var 0.1 --lr 0.001

CPGPE (constrained) with mean-variance cost:

python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 100 --horizon 100 \
    --var 0.1 --lr 0.001 --costs 1 --cost_type mv

CPGPE on the WCCI 2020 environment (larger grid):

python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 100 --horizon 200 \
    --var 0.1 --lr 0.001 --costs 1 --cost_type tc --grid2op_env l2rpn_wcci_2020

Multiple trials with parallelism:

python run.py --alg cpgpe --pol nn_softmax --ite 100 --batch 50 --horizon 100 \
    --var 0.1 --costs 1 --cost_type mv --n_trials 5 --n_workers 4

Policies

Policy      Class                         Suitable For
nn_softmax  policies.NNSoftmax            Discrete action spaces (Grid2Op). Outputs action probabilities via softmax.
nn          policies.NeuralNetworkPolicy  Continuous action spaces.
big_nn      policies.NeuralNetworkPolicy  Continuous action spaces (larger network).
linear      policies.OldLinearPolicy      Linear parametrization.

Action Bias Initialization

When using nn_softmax, the output layer bias is initialized to strongly prefer the "do nothing" action (action index 0). This is critical for Grid2Op, where unnecessary interventions often cause cascading failures. The bias is part of the learnable parameters, so the agent can learn to take other actions when beneficial.
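
A minimal PyTorch sketch of this idea (layer sizes and the bias value are placeholders, not the repository's actual NNSoftmax code):

import torch
import torch.nn as nn

n_obs, n_actions = 100, 50   # placeholder dimensions

policy = nn.Sequential(
    nn.Linear(n_obs, 64),
    nn.Tanh(),
    nn.Linear(64, n_actions),
)

with torch.no_grad():
    policy[-1].bias.zero_()
    policy[-1].bias[0] = 5.0   # large logit for action 0, so the softmax strongly prefers "do nothing"

probs = torch.softmax(policy(torch.zeros(n_obs)), dim=-1)
# probs[0] is close to 1 at initialization, yet the bias stays trainable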


Grid2Op Environments

The following L2RPN environments are supported (sorted by grid size):

Environment                      Substations  Lines  Features
l2rpn_case14_sandbox             14           20     Smallest grid, no maintenance or opponent. Good for development.
l2rpn_wcci_2020                  36           59     Maintenance events, redispatching. Medium difficulty.
l2rpn_wcci_2022                  36           --     Newer version of the WCCI scenario.
l2rpn_neurips_2020_track1_small  36           --     Maintenance, opponent attacks, redispatching. Small dataset.
l2rpn_neurips_2020_track1_large  36           --     Same grid as above, larger dataset.
l2rpn_neurips_2020_track2_small  118          --     Largest grid. Small dataset.
l2rpn_neurips_2020_track2_large  118          --     Largest grid. Large dataset.

Note: "small/large" in the NeurIPS environments refers to the chronics dataset size, not the grid size.


Cost Functions for Constrained RL

When --costs 1 is passed, the environment computes per-step cost signals that the CPGPE algorithm uses to enforce constraints. The available cost functions are configured in the cost_config dictionary inside run.py:

Cost Function             Description
rho_max                   Maximum line loading excess above threshold
rho_violations_count      Number of overloaded lines
rho_violations_sum        Sum of line loading above threshold (default)
rho_violations_quadratic  Quadratic penalty for loading violations
disconnections            Number of disconnected lines
overflow_duration         Total overflow timesteps across all lines
voltage_deviation         Maximum voltage deviation from acceptable bounds
curtailment               Total renewable energy curtailment (MW)
redispatch                Total absolute redispatch (MW)

The default configuration uses rho_violations_sum with a threshold of 0.8 (80% line loading). To customize, edit cost_config in run.py:

"cost_config": {
    "costs": ["rho_violations_sum", "disconnections"],  # Multiple costs
    "rho_threshold": 0.8,
    "voltage_bounds": (0.95, 1.05),
    "weights": [1.0, 0.5]
}
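
For intuition, the rho-based metrics are functions of the per-line loading array obs.rho of a Grid2Op observation; a rough sketch is below (the repository's exact implementations may differ):

import numpy as np

def rho_costs(obs, rho_threshold=0.8):
    # obs.rho: per-line loading ratio; obs.line_status: per-line connection status (bool)
    excess = np.maximum(obs.rho - rho_threshold, 0.0)
    return {
        "rho_max": float(excess.max()),                           # largest loading excess above threshold
        "rho_violations_count": float((obs.rho > rho_threshold).sum()),
        "rho_violations_sum": float(excess.sum()),                # default cost in this repository
        "rho_violations_quadratic": float((excess ** 2).sum()),
        "disconnections": float((~obs.line_status).sum()),
    }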

For a detailed reference of Grid2Op observation attributes and cost metric definitions, see docs/grid2op_attributes.md.


Diagnostic Baselines

The test_baselines.py script evaluates simple baseline policies to establish performance bounds:

python test_baselines.py

This runs three policies and reports statistics:

  • Do Nothing: Always takes action 0 (no intervention). Typically the strongest baseline on simple scenarios.
  • Random: Uniformly random actions. Usually causes rapid grid failure.
  • Biased Random: Randomly selects actions with a strong preference for "do nothing".

The output includes mean/std of rewards, costs, episode lengths, and the percentage of episodes that survive the full horizon.
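
The "Do Nothing" numbers can be reproduced with a few lines of plain Grid2Op (a standalone sketch, not test_baselines.py itself; the episode count and horizon are arbitrary):

import grid2op
import numpy as np

env = grid2op.make("l2rpn_case14_sandbox", test=True)
do_nothing = env.action_space({})

returns, lengths = [], []
for _ in range(5):                        # a handful of evaluation episodes
    obs, done, total, t = env.reset(), False, 0.0, 0
    while not done and t < 100:           # horizon of 100 steps, as in the examples above
        obs, reward, done, info = env.step(do_nothing)
        total += reward
        t += 1
    returns.append(total)
    lengths.append(t)

print(f"return {np.mean(returns):.2f} +/- {np.std(returns):.2f}, mean length {np.mean(lengths):.1f}")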


Documentation

Additional reference material, including Grid2Op observation attributes and the cost metric definitions, lives in the docs/ directory (see docs/grid2op_attributes.md).