# NumPy: numeric arrays and calculations

## NumPy Package

The [NumPy](http://numpy.scipy.org/) module is the basic toolset for advanced mathematical calculations in Python, especially in scientific applications (so-called _numerical calculations_, such as matrix multiplication and addition, diagonalization and inversion, integration, equation solving, etc.). It provides specialized data types, operations and functions that are not available in a standard Python installation. Another package, [SciPy](http://scipy.org/), builds on the tools provided by NumPy to offer more complex and diverse scientific algorithms.

We will only provide an introduction to NumPy here. Describing all the features of the NumPy library would be a huge undertaking and would serve little purpose, since the original documentation is available at <http://docs.scipy.org/doc/numpy/reference/>.

The most important data type, on which NumPy and the packages built on it rely, is the `ndarray` class, often referred to simply as `array`. `array` objects can be treated as universal containers for data in the form of _matrices_ (i.e. _vectors_ or _arrays_). Compared to the standard Python sequence types (`list`, `tuple`), they differ in a few ways:

1. the objects stored in an `array` must all be of the same type;
2. `array` objects have a fixed size; functions that return an array of a different size (such as `numpy.append`) create a new object and leave the original array unchanged;
3. `array` objects come with a rich set of functions that operate on all the data stored in the object and are specially optimized for processing large amounts of data. How this works is shown below.

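A short sketch of the first two points, assuming a standard NumPy installation (the values are arbitrary): mixing integers and a float in one list gives a single floating-point `dtype`, and `numpy.append` returns a new, larger array instead of growing the original.

```python
import numpy as np

# All elements share one dtype: the integers are converted to floating point
a = np.array([1, 2, 3.5])
print(a.dtype)         # float64

# "Growing" an array creates a new object; the original keeps its size
b = np.append(a, 4.0)
print(a.size, b.size)  # 3 4
print(b is a)          # False
```
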
## Creating arrays

The easiest way to create a NumPy array is to call the `array` function with a list of numbers as its argument. If, instead of a list of numbers, we pass a list containing other lists (a so-called _nested list_), we get a multidimensional array. For example, a list of lists (two levels of nesting) gives a two-dimensional array (_matrix_).

For example:

```python
import numpy as np  # np is a popular alias for numpy
A = np.array([1, 3, 7, 2, 8])
B = np.array([[1, 2, 3], [4, 5, 6]])
print(A, end='\n\n')
print(B, end='\n\n')
print(B.transpose())
```

The result will be:

```
[1 3 7 2 8]

[[1 2 3]
 [4 5 6]]

[[1 4]
 [2 5]
 [3 6]]
```

Another way to create an array is the `arange` function, which works like the built-in `range` but returns a NumPy array instead of a `range` object, and accepts fractional parameters, not just integers.

Its arguments are the same as for `range`:

1. start value [optional, default 0]
2. end value (not included in the result)
3. step [optional, default 1]

```python
print(np.arange(1000000))
print(np.arange(0.1, 0.2, 0.01))
print(np.arange(0.9, 0.0, -0.1))
```

```
[ 0 1 2 ... 999997 999998 999999]
[0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19]
[0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1]
```

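To make the correspondence with `range` concrete, the sketch below uses integer arguments (chosen arbitrarily), for which both produce the same values, with the end value excluded in both cases. With fractional steps, rounding errors can make the number of elements returned by `arange` hard to predict, which is why the NumPy documentation suggests `linspace` for non-integer steps.

```python
import numpy as np

a = np.arange(2, 10, 3)        # start=2, stop=10 (excluded), step=3
print(a.tolist())              # [2, 5, 8]
print(list(range(2, 10, 3)))   # [2, 5, 8]: the same values from the built-in range
print(type(a).__name__)        # ndarray
```
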
As already mentioned, with an `array` we can perform mathematical operations on all of its elements using a single operator or function. This behavior is different from that of lists and other Python sequences. For example, to multiply all the elements of a list `L` by a number `a`, we would need a loop:

```python
a = 3  # an example multiplier
L = [1, 3, 5, 2, 3, 1]
for i, x in enumerate(L):
    L[i] = a * x
```

This can be written more succinctly using a list comprehension:

```python
a = 3
L = [1, 3, 5, 2, 3, 1]
L = [a * x for x in L]     # unlike the loop version, the name L is rebound to a new list

L = [1, 3, 5, 2, 3, 1]
L[:] = [a * x for x in L]  # this keeps the identity of the list L, just like the loop version
```

However, this is still just a shorthand for a loop. Multiplying all the elements of an array `M` by a number `a`, on the other hand, looks like this:

```python
a = 3
M = np.array([1, 3, 5, 2, 3, 1])
M = a * M
```

or, even more simply (keeping the identity of `M`):

```python
M *= a  # in place; for an integer array M, a non-integer a raises an error
```

Operations performed at once on entire arrays have many advantages. The program code is simpler and shorter, making it less error-prone. In addition, we do not have to worry about the specific implementation of a given operation: it is done for us by NumPy, which is specially optimized to make it work as quickly as possible.

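A rough way to see the speed difference is to time both approaches with the standard `timeit` module. This is only a sketch: the array size (100 000 elements) and the number of repetitions are arbitrary choices, and the measured times depend on the machine, so no particular figures are claimed here.

```python
import timeit
import numpy as np

data = list(range(100_000))   # ordinary Python list
arr = np.arange(100_000)      # NumPy array with the same values
a = 3

# Both approaches compute the same result
assert [a * x for x in data] == (a * arr).tolist()

# Time 20 repetitions of each approach
t_list = timeit.timeit(lambda: [a * x for x in data], number=20)
t_numpy = timeit.timeit(lambda: a * arr, number=20)
print(f"list comprehension: {t_list:.4f} s, NumPy: {t_numpy:.4f} s")
```
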
See also these other functions for creating arrays: [linspace](http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html), [zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), [ones](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html), [mgrid](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mgrid.html), [ogrid](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ogrid.html), [r_](http://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html).

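A few of these functions in action. This is a minimal sketch with arbitrary arguments; `tolist()` is used only to make the printed results easy to read.

```python
import numpy as np

# linspace: a given number of evenly spaced points, endpoint included by default
print(np.linspace(0.0, 1.0, 5).tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# zeros and ones: arrays of a given shape filled with 0 or 1
print(np.zeros((2, 3)).shape)             # (2, 3)
print(np.ones(3, dtype=int).tolist())     # [1, 1, 1]

# r_: concatenates slices and scalars into one array
print(np.r_[1:4, 10, 20].tolist())        # [1, 2, 3, 10, 20]

# mgrid: dense grids; X holds the row index and Y the column index of each point
X, Y = np.mgrid[0:2, 0:3]
print(X.tolist())                         # [[0, 0, 0], [1, 1, 1]]
print(Y.tolist())                         # [[0, 1, 2], [0, 1, 2]]

# ogrid: the same grid in "open" form, which broadcasts to the full shape
x, y = np.ogrid[0:2, 0:3]
print(x.shape, y.shape)                   # (2, 1) (1, 3)
```
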
## Array shape

As you may have noticed, NumPy arrays can have different numbers of dimensions:

* a one-dimensional array `A` is the equivalent of a vector; its elements `A[k]` are numbered by a single index ranging from `0` to `len(A) - 1`, as in an ordinary Python list;
* a two-dimensional array, say `M`, is the equivalent of a matrix with elements `M[k, l]`; if `k = 0, ..., K-1` and `l = 0, ..., L-1`, then it has `K * L` elements;
* in general, the _shape_ of a NumPy array is described by a tuple of non-negative integers giving the number of possible values of each index (so the length of the tuple is the number of array dimensions). This tuple can be read from the attribute `M.shape`:

```python
M = np.array([[0.61064052, 0.51970673, 0.06353282],
              [0.50159111, 0.83545043, 0.10928144]])
print(M.shape)
```

```
(2, 3)
```

The shape of a one-dimensional array is a one-element tuple `(n,)`, not a single number.

**Note:** The function `len(A)` applied to a NumPy array `A` returns only the number of possible values of the first index, not the number of elements in the array. The number of array elements is given by the attribute `A.size`.

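The difference between `len`, `size` and the shape tuple can be checked directly. The arrays below are arbitrary examples created with `numpy.zeros`; `ndim` (the number of dimensions) is an additional attribute not mentioned above.

```python
import numpy as np

v = np.zeros(5)
M = np.zeros((2, 3))
print(v.shape)   # (5,): a one-element tuple, not the number 5
print(len(M))    # 2: only the length along the first axis
print(M.size)    # 6: the total number of elements
print(M.ndim)    # 2: the number of dimensions, equal to len(M.shape)
```
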
Many functions that create new arrays take the shape of the array to be created (i.e. a tuple of natural numbers) as an argument (or as one of their arguments); for example, `numpy.zeros(shape)` and `numpy.ones(shape)` create an array of zeros or ones, respectively, of any given shape. There are also operations that return a reshaped array filled with data from an existing array:

```python
M.reshape((3, 2))
# array([[0.61064052, 0.51970673],
#        [0.06353282, 0.50159111],
#        [0.83545043, 0.10928144]])

M.reshape((6,))
# array([0.61064052, 0.51970673, 0.06353282, 0.50159111, 0.83545043, 0.10928144])
```

Instead of the latter, you can use the “flatten” operation `M.flatten()`, which always returns a copy of the data.

**Note:** The total number of elements before and after reshaping must be the same.

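When the sizes do not match, NumPy raises a `ValueError`. One of the dimensions may also be given as `-1`, in which case NumPy computes it from the total number of elements. A short sketch of both behaviours on an arbitrary six-element array:

```python
import numpy as np

A = np.arange(6)
print(A.reshape(2, -1).shape)   # (2, 3): -1 means "infer this dimension"

try:
    A.reshape(4, 2)             # asks for 8 elements, but A holds only 6
except ValueError as err:
    print("ValueError:", err)
```
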
You can also assign a new value to the `shape` attribute:

```python
M.shape = (3, 2)  # changes M itself; reshape() above left M unchanged
M
# array([[0.61064052, 0.51970673],
#        [0.06353282, 0.50159111],
#        [0.83545043, 0.10928144]])
```

This, however, changes the shape of the existing array. Here too, the number of elements must stay the same.

## Arrays as views of the data

In Python, operations on mutable types (for example lists, which can change their content while retaining their identity) either change the content of an object or create a new object whose content is based on the original. If _the same object_ appears under different names, all of these names refer to one and the same object. In NumPy things are a bit different: arrays transformed in various ways (e.g. by the `reshape()` operation) often turn out to be different views (for Python, **different objects**) of **the same data**. For example:

```python
A = np.arange(24)
print(A)
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

B = A.reshape(6, 4)
print(B)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]

A[-1] = 0
print(B)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22  0]]

print(A is B)                  # False: two different Python objects
print(np.shares_memory(A, B))  # True: they share the same data
```

We changed an element of array `A`, but the corresponding element of array `B` changed as well, although we did not perform any operation on `B` and it has a different shape than `A`. For Python these two arrays are different objects, even though they share the same data.

This behavior results (among other things) from the desire to optimize: NumPy is designed to operate on rather large arrays of data, so it tries to avoid unnecessary copying of data between arrays (which wastes memory and other system resources). However, if we need an array that is truly independent of the original, containing the same (or derived) data, it is better to copy the data explicitly (e.g. with the `numpy.copy` function). Read the documentation of the functions and methods you use carefully, especially since the rules governing whether the result is a copy of the data or a new view are not very consistent.

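A minimal sketch of the difference, using the `numpy.copy` function mentioned above on an arbitrary six-element array. It also shows one case where the rules differ: slicing with `start:stop` returns a view, while indexing with a list of indices (see indexing arrays with arrays, in the next section) returns a copy.

```python
import numpy as np

A = np.arange(6)
view = A.reshape(2, 3)            # here a view: shares data with A
copy = np.copy(A).reshape(2, 3)   # built from an explicit copy: independent of A

A[0] = 100
print(view[0, 0])   # 100: the view sees the change
print(copy[0, 0])   # 0: the copy does not

# Basic slicing gives a view, indexing with a list of indices gives a copy
s = A[1:4]
f = A[[1, 2, 3]]
A[1] = -1
print(s[0], f[0])   # -1 1
```
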
## Data extraction

<img style="float: right;" src="macierz.png">

### Single numbers

Elements (and sub-arrays) of one-dimensional arrays can be accessed in exactly the same way as for lists, i.e. by index (`data[i]`) and by slice (`data[i:j]`).

For multi-dimensional arrays, you give as many indices or slices (separated by commas) as the array has dimensions.

Access to a single item:

```python
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)
# [[1 2 3]
#  [4 5 6]]

print(A[0, 2])  # 3
```

The matrix `A` is a two-dimensional array, and its elements are numbered as follows: the first index runs along the first dimension (selects a row), the second index runs along the second dimension (selects a column).

### Sub-arrays

Access to sub-arrays:

```python
print(A[1])     # row 1
# [4 5 6]

print(A[1, :])  # row 1, all columns
# [4 5 6]

print(A[:, 1])  # all rows, column 1
# [2 5]
```

As you can see, restricting a given dimension to a single index removes that dimension: the result is an array with one dimension fewer.

```python
print(A[:, 1:])
# [[2 3]
#  [5 6]]
```

In the first dimension (rows) we take everything, while in the second dimension we take everything from index 1 to the end. In effect, we drop column 0.

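If the dimension should be kept, a slice of length one can be used instead of a single index. A short comparison on the same example array `A`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A[:, 1].shape)       # (2,): the integer index removes the second dimension
print(A[:, 1:2].shape)     # (2, 1): a one-element slice keeps it
print(A[:, 1:2].tolist())  # [[2], [5]]
```
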
### Indexing arrays with arrays

You can also use another array to select items from an array. It can be:

* an array of integers: its values are treated as indices, and the result contains the elements that would be obtained by indexing with each of them separately;
* an array of `bool` values with the same shape as the indexed data: the result contains those elements for which the corresponding value of the indexing `bool` array is `True`.

**Note:** When indexing with a `bool` array of the same shape, the result is a one-dimensional array.

Example:

```python
print(A)
# [[1 2 3]
#  [4 5 6]]

print(A > 2)
# [[False False  True]
#  [ True  True  True]]

print(A[A > 2])
# [3 4 5 6]

print(A[A % 2 == 0])
# [2 4 6]
```

More: <http://docs.scipy.org/doc/numpy/user/basics.indexing.html>

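The example above uses only boolean arrays. The sketch below (with arbitrary values) shows indexing with integer arrays, one per dimension for a two-dimensional array, and combining conditions with `&`. For integer indexing the result takes the shape of the index arrays, so it is one-dimensional here only because the index arrays are.

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50])
idx = np.array([4, 0, 0, 2])
print(v[idx].tolist())                     # [50, 10, 10, 30]: indices may repeat, in any order

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A[[0, 1, 1], [2, 0, 2]].tolist())    # [3, 4, 6]: elements (0, 2), (1, 0), (1, 2)

# Conditions are combined with & (and) or | (or); the parentheses are required
print(A[(A > 1) & (A < 5)].tolist())       # [2, 3, 4]
```
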
## Data operations in NumPy arrays

### Arithmetic

For convenient handling of the data contained in NumPy arrays, the basic arithmetic operations are extended to act on the whole contents of an array, so that (usually) no loops need to be written. For example, you can multiply an array by a number, add a number to it, and so on, and this will affect all the elements of the array:

```python
M = np.arange(24).reshape((4, 6)) * 2 + 1
print(M)
# [[ 1  3  5  7  9 11]
#  [13 15 17 19 21 23]
#  [25 27 29 31 33 35]
#  [37 39 41 43 45 47]]
```

Moreover, you can perform arithmetic operations involving two (or more) arrays:

```python
N = 1 / M
print(N)
# [[1.         0.33333333 0.2        0.14285714 0.11111111 0.09090909]
#  [0.07692308 0.06666667 0.05882353 0.05263158 0.04761905 0.04347826]
#  [0.04       0.03703704 0.03448276 0.03225806 0.03030303 0.02857143]
#  [0.02702703 0.02564103 0.02439024 0.02325581 0.02222222 0.0212766 ]]

print(N * M)
# [[1. 1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1. 1.]]
```

So arrays of compatible shapes can, for example, be added or multiplied, and these operations are performed element by element, on pairs of corresponding elements.

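“Compatible” is defined by NumPy's broadcasting rules: comparing the shapes from the last dimension backwards, each pair of sizes must be equal or one of them must be 1, and a dimension of size 1 (or a missing one) is stretched to match the other. A small sketch with arbitrary values:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])     # shape (3,): added to every row
col = np.array([[100], [200]])   # shape (2, 1): added to every column
print((M + row).tolist())        # [[10, 21, 32], [13, 24, 35]]
print((M + col).tolist())        # [[100, 101, 102], [203, 204, 205]]
```
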
**Note:** this does not follow the mathematical definitions of arithmetic operations on vectors, matrices, etc. In particular, in mathematics, [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication) does not mean multiplying elements in pairs! Matrix multiplication in the mathematical sense is performed in NumPy with the `@` operator:

```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[-4, 2], [3, -1]])

print(A * B)
# [[-4  4]
#  [ 9 -4]]

print(A @ B)
# [[2 0]
#  [0 2]]
```

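The `@` operator also covers matrix-vector products and dot products of vectors, and `numpy.dot` gives the same result for two-dimensional arrays. A brief sketch with arbitrary values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([1, -1])
print((A @ x).tolist())                       # [-1, -1]: matrix times vector
print(x @ x)                                  # 2: dot product of two vectors
print(np.array_equal(A @ A, np.dot(A, A)))    # True
```
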
### Mathematical functions

The `numpy` module also contains implementations of basic functions that appear in physical and mathematical formulas, such as `sin`, `cos`, `exp`, `log` and many others, in versions adapted to operating on array data, element by element. Even more such functions are provided by other sub-modules of the NumPy package and by the SciPy (Scientific Python) package. For example:

```python
X = np.arange(0, 2 * np.pi, 0.1)

print(np.sin(X)**2 + np.cos(X)**2)
# [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
#  1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
#  1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
```

The above calculation checks the validity of the identity sin²x + cos²x = 1 (the “trigonometric one”) within one period, with a resolution of 0.1 radians.

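Since floating-point results are rounded, the printed `1.` values may hide tiny deviations. A more robust check compares with a tolerance, for example with `numpy.allclose`. This is a sketch; the threshold `1e-12` in the last line is an arbitrary choice.

```python
import numpy as np

X = np.arange(0, 2 * np.pi, 0.1)
residual = np.sin(X)**2 + np.cos(X)**2 - 1   # should be (close to) zero everywhere
print(np.allclose(residual, 0))              # True
print(np.abs(residual).max() < 1e-12)        # True
```
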
## Why Use NumPy?

The first reason, usually the least important, is *performance*. If we are to multiply 100 elements, the speed of the operation on a single element does not matter. The same goes for the size of a single element: even with 10<sup>6</sup> elements, the overhead does not matter. Let's count: 1 000 000 times 12 bytes is 12 MB. Even a computer with only 1-4 GB of memory would use between 1.2% and 0.3% of it, so where is the problem? Only when the data occupy a space of the same order as all available memory does it begin to matter whether a single cell takes 8 or 16 bytes.

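The memory used by an array's data can be read from its attributes: `itemsize` is the size of one element in bytes and `nbytes` the size of the whole data buffer. For example, one million double-precision numbers (the default `float64` type, 8 bytes each) take 8 MB, and half that as `float32`:

```python
import numpy as np

a = np.zeros(1_000_000)        # float64 by default
print(a.itemsize)              # 8
print(a.nbytes)                # 8000000, i.e. 8 MB
b = a.astype(np.float32)       # single precision: 4 bytes per element
print(b.nbytes)                # 4000000
```
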
The second reason, which matters for the enjoyment of your work, is object-oriented and infix *notation*. The former is of course “dot notation”: access to the methods and attributes of an object. It simplifies writing code, especially in combination with TAB completion. An example of object notation:

```python
import numpy
a = numpy.array([[3, 1], [4, 2]])  # example data
a.transpose().min()
# instead of
numpy.min(numpy.transpose(a))
```

The latter (infix) is the good old “math notation”: the placement of binary operators between the objects they act on. An example of infix notation:

```python
import numpy
a, b, c = numpy.array([1, 2]), numpy.array([3, 4]), numpy.array([5, 6])  # example data
a + b * c
# instead of
numpy.add(a, numpy.multiply(b, c))
```

Of course, object and infix notation are used everywhere in Python, and it is worth mentioning that NumPy follows this convention. Nevertheless, NumPy departs from the Python interpretation of some operations. In Python, operations such as multiplying a list by a number are modelled on string operations (repetition and concatenation). In numerical computation, operations on individual elements are fundamental, so in NumPy all operators act on pairs of corresponding elements by default.

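The difference is easy to see with `*` and `+` applied to a list and to an array holding the same (arbitrary) numbers:

```python
import numpy as np

print([1, 2, 3] * 2)                                      # [1, 2, 3, 1, 2, 3]: repetition, as for strings
print((np.array([1, 2, 3]) * 2).tolist())                 # [2, 4, 6]: element-wise
print([1, 2] + [3, 4])                                    # [1, 2, 3, 4]: concatenation
print((np.array([1, 2]) + np.array([3, 4])).tolist())     # [4, 6]: element-wise
```
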
The third reason, and perhaps the most important, is the *library of numerical functions*. Because NumPy stores values as raw numbers rather than as Python objects, the data can be passed to libraries written in completely different programming languages. For example, SciPy can use the LAPACK library (Linear Algebra PACKage, written in Fortran 77). Functions written in different languages can exchange data in memory without complicated translation because all the numbers are ultimately stored in a format native to the processor.

The ability to use code written in C or Fortran allows the use of old, optimized, proven solutions.

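Two small illustrations, with arbitrary values. The first shows that the contents of an array are a plain buffer of machine numbers (here 8-byte `float64` values) that can be read back without any conversion. The second solves a linear system with `numpy.linalg.solve`, which, according to the NumPy documentation, is computed with a LAPACK routine; this is NumPy's own linear algebra module rather than SciPy's, used here only because it needs no extra package.

```python
import numpy as np

# 1. The data are stored as raw machine numbers in one contiguous block of memory
a = np.array([1.0, 2.0, 3.0])
raw = a.tobytes()                                      # 3 numbers x 8 bytes
print(len(raw))                                        # 24
print(np.frombuffer(raw, dtype=np.float64).tolist())   # [1.0, 2.0, 3.0]

# 2. Solving 3x + y = 9, x + 2y = 8 with compiled linear algebra code
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(np.allclose(x, [2.0, 3.0]))                      # True
```
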
<hr />
<p id="copyright">Published under <a class="external" rel="nofollow" href="https://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-NonCommercial-ShareAlike</a> license.<br/>
Original author Robert J. Budzyński. Source: <https://brain.fuw.edu.pl/edu/index.php/PPy3/NumPy>.</p>