The power-grid-model is designed to handle calculations at scale.
It remains performant, even when doing calculations with one or a combination of the following extremes
(non-exhaustive):
- Large grids
- Batch calculations with many scenarios
- Many changes in the grid in each scenario
To achieve that high performance, several optimizations are made. To use those optimizations to the fullest, we recommend following the guidelines below.
This guide focuses on system-level performance optimization (batching, caching, parallelization).
For algorithm-level details such as calculation method selection, see the [Calculations](calculations.md) documentation.
Many of our optimizations assume input data validity and rely on the fact that the provided grid is reasonably close to realistic. Non-convergence, underdetermined equations or other unexpected behavior may therefore be encountered when the data is not realistic.
To keep the PGM performant, checks on hard physical bounds are offloaded to a separate tool, i.e., the data validator. However, these checks can be prohibitively expensive and application at scale in production environments is therefore not recommended when performance matters. Instead, we recommend using the data validator specifically for debugging purposes.
Some combinations of input data are not forbidden by physics, but still pose unrealistic conditions, e.g., a source with
a very low short-circuit power.
These cases may result in unexpected behavior of the calculation core.
Vagueness and case-dependence make it hard to check what can be considered "unrealistic", and the
[data validator](data-validator.md) will therefore not catch such cases.
We recommend that users provide reasonably realistic scenarios to prevent these edge cases from happening.
The data format of input, output and update data can have a big effect on memory and computational cost.
Row-based data (created, e.g., using {py:class}`power_grid_model.initialize_array` in Python) constructs input/update
data with all attributes for a given dataset type.
However, many component attributes are optional.
If your use case does not depend on these attributes, a lot of data is needlessly created and initialized.
If you are running on a system where memory is the bottleneck, using a columnar data format may reduce the memory
footprint.
This may or may not induce a slight computational overhead during calculations.
Some simulations might require a Cartesian product of scenarios of two batch datasets.
This can be done by passing them to `update_data` as a list,
i.e., a list[{py:class}`BatchDataset <power_grid_model.data_types.BatchDataset>`]
(see the Power Flow Example).
This gets treated as a Cartesian product of the provided datasets and the combination of scenarios gets handled
internally.
Hence, there is no need to allocate memory for the full N1 * N2 * ... scenarios for a Cartesian product of datasets with
scenario sizes N1, N2, ....
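
The saving can be illustrated with a minimal, library-independent sketch. It does not call the power-grid-model; the batch sizes and the helper `cartesian_scenarios` are hypothetical, and the sketch assumes that the last dataset in the list varies fastest (the inner dimension).

```python
from itertools import product


def cartesian_scenarios(*batch_sizes):
    """Yield one tuple of per-dataset scenario indices per combined scenario.

    Assumption: the last dataset varies fastest (innermost dimension).
    """
    return product(*(range(n) for n in batch_sizes))


# Hypothetical batch sizes N1 and N2.
n_outages, n_time_steps = 3, 4

combined = list(cartesian_scenarios(n_outages, n_time_steps))
stored = n_outages + n_time_steps    # scenarios the user actually provides
expanded = n_outages * n_time_steps  # scenarios a single flat batch would need

print(f"{stored} stored scenarios describe {expanded} combined scenarios")
print(combined[:5])
```

With 3 and 4 scenarios the difference is small, but it grows multiplicatively: 100 outages combined with 8760 time steps need 8860 stored scenarios instead of 876000.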
For more performance guidelines, see also below.
For most use cases, only certain output values are relevant.
For example, if you are only interested in line loading, outputting all other components and attributes results in
unnecessary overhead.
The output data may be a significant, if not the dominant, contributor to memory load, particularly when running large
batch calculations.
We therefore recommend restricting the output data in production environments to only the components and attributes
that are actually used.
In Python, it is possible to do so by using the `output_component_types` keyword argument in the `calculate_*` functions
(like {py:class}`power_grid_model.PowerGridModel.calculate_power_flow`).
Most databases store their data in a columnar data format. Copying, reserving unused memory, and cache misses can lead to unnecessary memory usage and computational overhead. With the introduction of columnar data input to PGM, integrating with databases using this format becomes easier, more natural, and more efficient.
Depending on the details of the batch, a number of performance optimizations are possible:
- Topology construction, in particular, may significantly impact the computation time of a scenario.
- The way the batch data set is provided to the model can also affect the performance.
Topology construction is an expensive step in calculations.
Fortunately, the topology can be cached when there are no structural changes to the power grid itself.
For the power-grid-model, this is the case when there are no changes to the statuses (`from_status`, `to_status`, `status`,
etc.) of the following components:
- Branches: Lines, Links, Transformers
- Branch3: Three-winding transformers
- Appliances: Sources
In particular, the topology is cached in the following way:
- If none of the provided batch scenarios change the status of branches and sources, the model will re-use the pre-built internal graph/matrices for each calculation. Time-series load profile calculation is a typical use case.
- If some batch scenarios are changing the switching status of branches and sources, the topology changes and is thus reconstructed before and after each scenario that does so. N-1 check is a typical use case.
As such, the following rule-of-thumb holds:
Scenarios that change the same status attributes the same way should be fed to the power-grid-model together as much as
possible.
In practice, this means:
- In use cases that require many different parameter calculations for only a small set of different topologies, it is
  recommended to split the calculation in separate batches - one for each topology - to optimize performance.
- Otherwise, it is recommended to sort the scenarios by topology to minimize the number of reconstructions.
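
As a rough illustration of the first recommendation, the sketch below groups scenarios that share the same status values, so that each group can be submitted as its own batch. It is plain Python and does not use the power-grid-model API; the scenario layout, the component IDs, and the helpers `topology_key` and `split_by_topology` are hypothetical.

```python
from collections import defaultdict

# Hypothetical scenarios: each maps a branch id to its status (1 = closed, 0 = open)
# and carries a load scaling factor that does not affect the topology.
scenarios = [
    {"status": {4: 1, 5: 1}, "load_scale": 0.8},
    {"status": {4: 0, 5: 1}, "load_scale": 0.8},
    {"status": {4: 1, 5: 1}, "load_scale": 1.0},
    {"status": {4: 0, 5: 1}, "load_scale": 1.0},
    {"status": {4: 1, 5: 1}, "load_scale": 1.2},
]


def topology_key(scenario):
    """Return a hashable signature of all status values in a scenario."""
    return tuple(sorted(scenario["status"].items()))


def split_by_topology(scenarios):
    """Group scenario indices so that every group shares a single topology."""
    groups = defaultdict(list)
    for index, scenario in enumerate(scenarios):
        groups[topology_key(scenario)].append(index)
    return dict(groups)


batches = split_by_topology(scenarios)
for key, indices in batches.items():
    print(key, indices)
```

Concatenating the groups in order gives a scenario ordering that is sorted by topology, which corresponds to the second recommendation.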
In the Calculations documentation, the distinction is made between independent and dependent batches. Both types of batches allow for different performance optimizations. To ensure that the right choice is always made, the following rule-of-thumb may be used:
Sparsity of sampling should be reflected by sparsity in the batch update parameters and vice versa.
To elaborate:
- Dependent batches are useful for a sparse sampling for many different components, e.g. for N-1 checks.
- Independent batches are useful for a dense sampling of a small subset of components, e.g. time series power flow
  calculation.
Multi-dimensional batch calculations help reduce the update data volume, but they can also improve caching. It is recommended to put the variations that do not change topology in the innermost dimension. That is, if you want to run a combination of both N-1 and time-series calculations, you should put time-series in the inner dimension (last dataset in the list).
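
The effect of the dimension order can be estimated with a small sketch that counts how often the topology differs between consecutive scenarios. This is only a simplified proxy for the number of topology reconstructions, not a measurement of the calculation core; the outage names, the number of time steps, and the helper `count_topology_changes` are hypothetical.

```python
from itertools import product


def count_topology_changes(scenarios, topology_of):
    """Count how often two consecutive scenarios differ in topology."""
    topologies = [topology_of(scenario) for scenario in scenarios]
    return sum(a != b for a, b in zip(topologies, topologies[1:]))


outages = ["line_1_out", "line_2_out", "line_3_out"]  # hypothetical N-1 cases
time_steps = range(24)                                 # hypothetical load profile

# The last factor of product() varies fastest, i.e. it is the inner dimension.
time_inner = list(product(outages, time_steps))
outage_inner = list(product(time_steps, outages))

changes_time_inner = count_topology_changes(time_inner, lambda s: s[0])
changes_outage_inner = count_topology_changes(outage_inner, lambda s: s[1])
print(changes_time_inner, changes_outage_inner)  # prints "2 71"
```

Both orderings cover the same 72 scenarios, but with the time series in the inner dimension the topology changes only when moving to the next outage.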
If the host system supports it, parallel computation is an easy way to gain performance. As mentioned in the Calculations documentation, letting the power-grid-model determine the number of threads is recommended.
Every iteration of power flow or state estimation involves solving a large system of sparse linear equations, i.e.,
AX=b in matrix form.
Computationally, this is a very expensive step.
One major component of this step is the factorization of the A matrix.
In certain calculation methods, this A matrix and its factorization remain unchanged over iterations and batches
(only in specific cases).
This makes it possible to reuse the factorization, skip this step and improve performance.
Prefactorization over batches is possible when the switching status or specified power values of loads/generators or the
source reference voltage are modified.
It is not possible when the topology or grid parameters are modified, i.e., when switching branches, shunts or sources, or
when changing transformer tap positions.
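
The idea of reusing a factorization can be sketched with a small dense LU decomposition. This is an illustration only: the power-grid-model works with sparse matrices, and its solver is not reproduced here. The matrix, the right-hand sides, and the helpers `lu_factorize` and `lu_solve` are hypothetical, and the sketch assumes that no pivoting is needed (the example matrix is diagonally dominant).

```python
def lu_factorize(a):
    """Doolittle LU decomposition without pivoting: returns (lower, upper)."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve A x = b using an existing factorization A = lower * upper."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


# Hypothetical matrix that stays the same for every scenario.
a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
# Hypothetical right-hand sides, one per scenario.
right_hand_sides = [[3.0, 2.0, 3.0], [4.0, -1.0, 0.0]]

lower, upper = lu_factorize(a)  # expensive step, done once
solutions = [lu_solve(lower, upper, b) for b in right_hand_sides]  # cheap per scenario
print(solutions)
```

Only the substitution steps are repeated per scenario. As soon as an entry of the matrix changes, for example because a branch is switched, the factorization has to be recomputed.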