Skip to content

Python hacks for industrial ecology

Stefan Pauliuk edited this page Sep 3, 2019 · 12 revisions

Here is a list of productivity and accuracy-boosting Python commands and functions for various types of IE research. Please add your own!

General commands

Element-wise division of two arrays, where the result is automatically set to 0 if the divisor has a 0 as an entry.

The equation C = A/B, where A, B, and C are n-dimensional arrays of the same shape, / denotes the element-wise division, and the constraint that for all zero entries of B a zero should be returned works as follows:

C = np.divide(A, B, out=np.zeros_like(B), where=B!=0)

np.einsum - Element-wise multiplication, summation, and reordering of dimensions. White magic!

The numpy implementation of the Einstein sum convention is a very powerful and versatile function that can save a lot of time and hassle. It works for any number of dimensions and can accept and return any order of indices. Example: Suppose you have an in-use stock of passenger vehicles S broken down by year t, age-cohort c, region r, and vehicle segment: S = S(t,c,r,s). You want to calculate the total energy consumption of operating the vehicle fleet E with a full breakdown on time, region, and segment but not for age-cohorts (E=E(t,r,s)), and your additional information is the annual kilometrage K as function of time and region (K = K(t,r)), and the age-cohort and segment-specific energy consumption e = e(c,s). The mathematical equations is $$ E(t,r,s) = Sum_c S(t,c,r,s)*K(t,r)*e(c,s) $$ Now, program this in Excel! Using np.einsum in Python, the result is just

E = np.einsum('trcs,tr,cs->trs',S,K,e)

You want an age-cohort but no segment breakdown E_a(t,c,r)? No problem:

E_a = np.einsum('trcs,tr,cs->tcr',S,K,e)

Note how the summation string was changed. You want the result to have the shape rtc instead? No problem:

E_a = np.einsum('trcs,tr,cs->rtc',S,K,e)

Etc. etc. np.einsum is "white magic" because it automagically performs a number of intricate steps, including roll dim, shift dim, transpose, and it is quite easy to use as a mis-specification of the summation string will quickly result in an error.

More tips for using np.einsum can be found here: https://stackoverflow.com/questions/26089893/understanding-numpys-einsum

A set of MFA-related examples can be found here: https://github.com/IndEcol/ODYM

An IO-related example can be found here: http://www.teaching.industrialecology.uni-freiburg.de/Content/IEooc_Application3_Software1.ipynb

np.reshape - convert 2D tables (matrices) with hierarchical multi-indizes to n-D arrays and vice versa. Black magic!

Let's say you have an MRIO matrix Z_2D of inter-industry flows (Z-matrix) in the typical setting: rows and columns both have an hierarchical index with the regions as the outer and the industries as the inner index. With Nr = 30 regions and Ni = 100 sectors in each, the shape of the matrix is 30x100 X 30x100, which is 3000 X 3000. Now, we want to aggregate this matrix from 30 to 12 regions and from 100 to 85 products. The easiest way to do that is to multiply each dimension with an aggregation matrix (projection matrix in mathematical terms), but how to do that if the dimension is part of the hierarchical or multi-index? The answer is: reshape first! In our example, we have:

Z_4D = np.reshape(Z_2D,(Nr,Ni,Nr,Ni))

This function call works because Nr x Ni equals the number of rows and columns of the matrix, and it is programmed to reshape so that the multi-index order equals the sequence of the dimension sizes (here: Nr and Ni)).

np.reshape is "black magic" because the reshape function works across a number of different dimensions and as long as the products of the old dimension size fits the new dimension size, it will not complain. The user carries great responsibility for applying this function correctly! It is very easy to get a multi-index hierarchy wrong and receive nonsense results without a complaint or noticing it right away. Programming of test cases is recommended.

The array Z_4D can now be aggregated for the regional and industry dimension, yielding Z_4D_agg with shape (12,85,12,85). We can now reshape it back to matrix form with hierarchical index to build and A-matrix of technical coefficients. The command for that is:

Z_2D_agg = np.reshape(Z_4D_agg,(12*85,12*85))

An IO-related example and more explanatoins can be found here: http://www.teaching.industrialecology.uni-freiburg.de/Content/IEooc_Application3_Software1.ipynb

Combining np.einsum and np.reshape:

Both functions can be combined, for example, when aggregating MRIO tables:

a) Reshape MRIO table from 2D with multi-level row and column indices to 4D array with separate (region x industry) x (region x industry) indices.

b) Aggregate regions and industries by multiplying each dimension with is respective aggregation matrix, using np.einsum.

c) Reshape back from 4D to 2D, now with reduced size, and carry on with matrix inversion for the Leontief model.

An IO-related example can be found here: http://www.teaching.industrialecology.uni-freiburg.de/Content/IEooc_Application3_Software1.ipynb

Handling large multi-dimensional datasets with pandas dataframes:

Python pandas dataframe manipulation can be a hassle to learn, and whether it is preferable over numpy for performing calculations is debatable. But pandas does have several unique features, some of which come in very handy when you need to organise data.

Using pandas to generate hierarchical indices

Supposed you want to export a 7987x7987 EXIOBASE Z matrix with 49 regions x 163 products row and column index to a csv file. Instead of double-for-looping to generate the index labels, try out the following:

# Given: EXIOBASE_Z as Z matrix as np.array, RegionList as list of 49 region labels, and SectorList as list of 163 sector labels.

# Create multiindex for rows:

RowIndex = pd.MultiIndex.from_product([RegionList,SectorList], names=('region_production', 'product'))

ColIndex = pd.MultiIndex.from_product([RegionList,SectorList], names=('region_consumption', 'sector'))

# Create dataframe:

EXIOBASE_Z_DF = pd.DataFrame(EXIOBASE_Z, index=RowIndex, columns=ColIndex) # in MEUR

# Store dateframe in csv format

EXIOBASE_Z_DF.to_csv('C:\\Users\\MyPath\\...\\EXIOBASE_Z.csv', sep=',')

The multi-index creation method can be used for other purposes as well.

Clone this wiki locally