EPITA 2021 MLRF practice_01-02_numpy v2021-05-17_160644 by Joseph CHAZALON
Make sure you read and understand everything, and complete all the required actions.
Required actions are preceded by the following sign:
Perform a couple checks…
# deactivate buggy jupyter completion
%config Completer.use_jedi = False
# Make sure we use Python 3
import sys
if sys.version_info.major != 3:
    print("ERROR: not using Python 3.x")
else:
    print("Great! We're using Python version %s" % sys.version)
Great! We're using Python version 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
Notice the line magic used to configure how matplotlib output is rendered.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
NumPy allows you to manipulate n-dimensional arrays (representing matrices, tensors, images…) with a very simple syntax.
Here are some examples of array creation:
# Initialize from a sequence
a1 = np.array([1, 2, 3])
a1
array([1, 2, 3])
# Arrays can have arbitrary dimensions…
a2 = np.array([[[ 0, 1], [ 2, 3], [ 4, 5]],
[[ 6, 7], [ 8, 9], [10, 11]]])
a2
array([[[ 0, 1], [ 2, 3], [ 4, 5]], [[ 6, 7], [ 8, 9], [10, 11]]])
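#…but they need to be consistent: a ragged input produces an array of Python lists
# (recent NumPy versions raise a ValueError here unless dtype=object is passed explicitly)
a3 = np.array([[[ 0, 1], [ 2, 3], [ 4, 5]],
               [[ 6, 7], [ 8, 9], [10, 11, 13]]], dtype=object)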
a3
array([[list([0, 1]), list([2, 3]), list([4, 5])], [list([6, 7]), list([8, 9]), list([10, 11, 13])]], dtype=object)
shape and dtype
Shape and content (data) type are two very important properties to check for arrays.
a1.shape, a1.dtype
((3,), dtype('int64'))
a2.shape, a2.dtype
((2, 3, 2), dtype('int64'))
a3.shape, a3.dtype
((2, 3), dtype('O'))
Do not hesitate to check:
# TODO create a couple of arrays
We recommend that you have a look at:
zeros
zeros_like
ones
full
empty
eye
test_shape = (2, 2)
np.zeros(test_shape)
array([[0., 0.], [0., 0.]])
# TODO try the other array creation routines
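As a minimal sketch of the creation routines listed above (the variable names are illustrative and not part of the exercise), each call below builds a small array so that the differences are easy to inspect:

```python
import numpy as np

test_shape = (2, 2)

zeros = np.zeros(test_shape)            # 2x2 array of 0.0 (float64 by default)
ones = np.ones(test_shape, dtype=int)   # 2x2 array of integer ones
sevens = np.full(test_shape, 7)         # 2x2 array filled with the value 7
identity = np.eye(3)                    # 3x3 identity matrix
like = np.zeros_like(sevens)            # zeros with the shape and dtype of `sevens`
scratch = np.empty(test_shape)          # uninitialized memory: contents are arbitrary

print(zeros.dtype, ones.dtype, like.dtype)
print(identity)
```

Note that `np.empty` only allocates memory, so its content must be overwritten before being read.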
A very important thing to note with NumPy is that native routines make use of optimized C code which is orders of magnitude faster than Python loops.
You should always try to avoid writing Python loops to access NumPy arrays, and you should rather try to find a native routine which does the task you are looking for.
# TODO manual initialization: complete this code
size = 1024*1024
a = np.empty(size)
# for ii in range…
#
%%timeit
# prof
size = 1024*1024
a = np.empty(size)
for ii in range(len(a)):
    a[ii] = 1
66.6 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# TODO numpy creation and optimized initialization
a = np.empty(size)
# a[?] = ?
%%timeit
# prof
a = np.empty(size)
a[:] = 1
840 µs ± 64.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
# prof
a = np.empty(size)
a.fill(1)
885 µs ± 80.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# TODO numpy optimized creation and initialization
# a = ??
%%timeit
# prof
a = np.full(size, 1)
856 µs ± 46.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
There are other very useful array creation routines to be aware of.
Among my favorites are arange and linspace.
# TODO
np.arange?
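The two routines differ mainly in how the end of the range is specified. Here is a small sketch (variable names are illustrative):

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded
evens = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8

# linspace(start, stop, num): num evenly spaced samples, stop included by default
samples = np.linspace(0, 1, 5)    # 0.0, 0.25, 0.5, 0.75, 1.0

print(evens)
print(samples)
```

A reasonable rule of thumb is to use arange when the step matters and linspace when the number of samples matters.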
It is easy to change the shape of an array, as long as the new shape is compatible with the original one.
a = np.arange(12)
a.shape
(12,)
# TODO reshape
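One possible way to explore reshaping is sketched below. "Compatible" here means that the new shape contains the same total number of elements as the original one.

```python
import numpy as np

a = np.arange(12)

m = a.reshape((3, 4))       # 3 rows, 4 columns
t = a.reshape((2, 3, 2))    # 3-dimensional array
auto = a.reshape((-1, 6))   # -1 lets NumPy infer this dimension (here: 2)

print(m.shape, t.shape, auto.shape)

# An incompatible shape raises a ValueError
try:
    a.reshape((5, 3))       # 5 * 3 = 15 elements, but `a` has 12
except ValueError as err:
    print("Cannot reshape:", err)
```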
All the power of NumPy lies in how we apply operations on arrays. We can apply operations in 3 different ways:
First by calling a method of the array like this:
a = np.arange(3)
a.max()
Second by calling a NumPy operation on the array like this:
a = np.linspace(0, 1, 10)
np.cos(a)
This second technique is more suitable for mathematical operations which are not directly available as methods, and return an array of the same shape.
Third simply by calling natural operations extended to arrays like this:
a = np.arange(0, 3)
b = np.arange(3, 6)
a + b
# TODO some operations on arrays
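The three ways of applying operations can be compared side by side on the same data. This is only a sketch with illustrative variable names:

```python
import numpy as np

a = np.arange(1, 5)     # array([1, 2, 3, 4])
b = np.full(4, 2)       # array([2, 2, 2, 2])

total = a.sum()         # 1. method of the array
roots = np.sqrt(a)      # 2. NumPy function, applied element-wise
scaled = a * b + 1      # 3. arithmetic operators, applied element-wise

print(total, roots, scaled)
```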
You can also access individual values of arrays using advanced slicing techniques:
aa = np.arange(3*2).reshape((3, 2))
aa
array([[0, 1], [2, 3], [4, 5]])
aa[0]
array([0, 1])
aa[0][1]
1
We can specify slices for each dimension.
aa[0,1]
1
aa[1:3]
array([[2, 3], [4, 5]])
aa[:,1]
array([1, 3, 5])
aa[::2,::-1]
array([[1, 0], [5, 4]])
aa
array([[0, 1], [2, 3], [4, 5]])
aa[(0,0)] # equivalent to aa[0,0]
0
aa[(1,1,2), 0] # equivalent to aa[(1, 1, 2), (0, 0, 0)] because of broadcast
# selects aa[1,0], aa[1,0], aa[2,0]
array([2, 2, 4])
We can even add a new axis on the fly:
bb = aa[:, 0, np.newaxis]
bb.shape
(3, 1)
Note that np.newaxis is actually None, so it is common to use None directly.
np.newaxis
bb = aa[:, 0, None]
bb.shape
(3, 1)
And you can create masks and apply them. This is very powerful!
aa = np.arange(10)
mask = aa > 5
mask
array([False, False, False, False, False, False, True, True, True, True])
aa[mask]
array([6, 7, 8, 9])
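Masks can be combined and can also be used to modify values in place. The following sketch illustrates both uses, together with `np.where` (the variable names are illustrative):

```python
import numpy as np

aa = np.arange(10)

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
mask = (aa > 2) & (aa % 2 == 0)
selected = aa[mask]              # values 4, 6, 8

# A mask can also be used on the left-hand side to modify values in place
clipped = aa.copy()
clipped[clipped > 5] = 5         # every value above 5 becomes 5

# np.where builds a new array by choosing between two alternatives
signs = np.where(aa < 5, -1, 1)

print(selected, clipped, signs)
```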
a = np.array([[1, 0, 2], [3, 7, 9], [1, 0, 2], [3, 7, 9], [3, 7, 9]])
a
array([[1, 0, 2], [3, 7, 9], [1, 0, 2], [3, 7, 9], [3, 7, 9]])
# TODO correct this line
a_extracted = a[:]
# prof
# v1: indexes
a_extracted = a[(0,2),1:]
a_extracted
# v2: mask (better, but flattened)
a_extracted = a[a%2 == 0]
a_extracted
array([0, 2, 0, 2])
# Here is a test to check your result
if np.all(a_extracted % 2 == 0):
print("Looks good!")
else:
print("Error.")
Looks good!
Make sure to read at least once in your life (not during this session though) the page about NumPy indexing.
Broadcasting is a very powerful concept in NumPy, and maybe its greatest strength. However, it takes time to master and even then you sometimes get surprised.
According to the official documentation:
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
It is easy to make use of broadcasting:
Let's have a look at some examples now.
First, NumPy operations are usually done element-by-element, which requires two arrays to have exactly the same shape:
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
a * b
array([2, 4, 6])
NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:
a = np.array([1,2,3])
b = 2
a * b
array([2, 4, 6])
The broadcasting applied in the previous example virtually "stretches" b to match a's shape.
This can be illustrated by the following figure:
The rule governing whether two arrays have compatible shapes for broadcasting can be expressed in a single sentence.
The Broadcasting Rule:
In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.
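To experiment with the rule without building real data, shapes can be tested directly. The sketch below relies on `np.broadcast`, which computes the resulting shape; the helper `broadcast_shape` is a hypothetical name introduced for this illustration and is not a NumPy function.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or None if they are incompatible.

    Hypothetical helper written for this illustration, not a NumPy function.
    """
    try:
        # np.broadcast only needs the shapes; np.empty avoids initializing data
        return np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape
    except ValueError:
        return None

print(broadcast_shape((4, 3), (3,)))       # trailing sizes match: 3 and 3
print(broadcast_shape((4, 1), (3,)))       # the size-1 axis is stretched to 3
print(broadcast_shape((4, 3), (4,)))       # 3 vs 4: incompatible
print(broadcast_shape((2, 1, 5), (3, 1)))  # shapes are aligned from the right
```

Shapes are compared axis by axis starting from the last one, and missing leading axes are treated as having size one.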
Here are more examples (taken from the documentation, again):
a = np.array([[ 0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]])
b = np.array([0, 1, 2])
a + b
array([[ 0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]])
Combining a two-dimensional array with a one-dimensional array results in broadcasting if the number of elements of the 1-d array matches the number of columns of the 2-d array.
However, when the trailing dimensions of the arrays are unequal, broadcasting fails because it is impossible to align the values in the rows of the 1st array with the elements of the 2nd array for element-by-element addition.
a = np.array([[ 0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]])
b = np.array([0, 1, 2, 3])
a + b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-bb889cf1b342> in <module>
      4 [30, 30, 30]])
      5 b = np.array([0, 1, 2, 3])
----> 6 a + b

ValueError: operands could not be broadcast together with shapes (4,3) (4,)
The following example shows an outer addition operation of two 1-d arrays that produces a 4x3 result, like the previous (working) example.
Here the newaxis index operator inserts a new axis into a, making it a two-dimensional 4x1 array.
a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])
a[:, np.newaxis] + b
array([[ 1., 2., 3.], [11., 12., 13.], [21., 22., 23.], [31., 32., 33.]])
The following figure illustrates the stretching of both arrays to produce the desired 4x3 output array.
# TODO display the shape of a when we add it a new axis like in the previous example
a.shape
(4,)
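One way to inspect what happens to the shapes in the outer addition is sketched below (the name `column` is illustrative):

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

column = a[:, np.newaxis]    # same data, viewed as a 4x1 column
print(a.shape)               # (4,)
print(column.shape)          # (4, 1)
print((column + b).shape)    # (4, 3)
```

Indexing with np.newaxis does not modify a itself: it returns a new view with one more axis.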
Most aggregation functions allow you to specify the axis along which the computation will be performed.
axis=0 means the first axis, axis=i means the $(i+1)$-th axis, and axis=-1 means the last axis.
This makes it possible, for example, to find the warmest month for each city (or the warmest city for each month).
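Before working on real data, the effect of the axis parameter can be checked on a tiny array. The data below is hypothetical and only serves as an illustration:

```python
import numpy as np

# Hypothetical scores: 2 rows (students) x 3 columns (exams)
scores = np.array([[10, 15, 12],
                   [18,  9, 14]])

best_per_exam = scores.max(axis=0)        # collapse rows: one value per column
best_per_student = scores.max(axis=1)     # collapse columns: one value per row
best_exam_index = scores.argmax(axis=-1)  # last axis, same as axis=1 here

print(best_per_exam, best_per_student, best_exam_index)
```

The axis passed to the function is the one that disappears from the shape of the result.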
# Some probably buggy stats
data = np.array([
# January,February,March,April,May,June,July,August,September,October,November,December
[14,14,16,18,22,25,28,29,26,23,18,15], # Ajaccio
[14,14,16,18,22,26,29,29,26,22,17,15], # Bastia
[5,7,12,15,20,24,26,26,22,17,10,5], # Bourg-Saint-Maurice
[10,11,14,17,21,25,29,28,25,19,14,10], # Carcassonne
[6,8,12,15,20,24,27,26,22,17,10,6], # Grenoble
[6,8,13,16,21,25,28,27,23,17,11,7], # Lyon
[11,13,16,19,23,27,30,30,26,21,15,12], # Marseille
[8,10,15,18,22,26,30,29,24,19,12,9], # Montelimar
[12,13,16,18,22,26,29,29,25,21,15,12], # Montpellier
[13,13,15,17,21,24,27,28,25,21,17,14], # Nice
[12,13,16,18,22,26,29,29,25,21,16,13], # Perpignan
[13,14,16,18,22,26,30,30,26,21,16,14], # Toulon
])
months = np.array(["January","February","March","April","May","June","July",
"August","September","October","November","December"])
cities = np.array(["Ajaccio", "Bastia", "Bourg-Saint-Maurice", "Carcassonne",
"Grenoble", "Lyon", "Marseille", "Montelimar", "Montpellier",
"Nice", "Perpignan", "Toulon"])
# TODO use the `argmax` operation on `data`
warmest_months = np.zeros(12, dtype=int) # FIXME replace this line
warmest_months
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
list(zip(cities, months[warmest_months]))
[('Ajaccio', 'January'), ('Bastia', 'January'), ('Bourg-Saint-Maurice', 'January'), ('Carcassonne', 'January'), ('Grenoble', 'January'), ('Lyon', 'January'), ('Marseille', 'January'), ('Montelimar', 'January'), ('Montpellier', 'January'), ('Nice', 'January'), ('Perpignan', 'January'), ('Toulon', 'January')]
# prof
list(zip(cities, months[data.argmax(axis=1)]))
[('Ajaccio', 'August'), ('Bastia', 'July'), ('Bourg-Saint-Maurice', 'July'), ('Carcassonne', 'July'), ('Grenoble', 'July'), ('Lyon', 'July'), ('Marseille', 'July'), ('Montelimar', 'July'), ('Montpellier', 'July'), ('Nice', 'August'), ('Perpignan', 'July'), ('Toulon', 'July')]
# TODO
# prof
list(zip(months, cities[data.argmax(axis=0)]))
[('January', 'Ajaccio'), ('February', 'Ajaccio'), ('March', 'Ajaccio'), ('April', 'Marseille'), ('May', 'Marseille'), ('June', 'Marseille'), ('July', 'Marseille'), ('August', 'Marseille'), ('September', 'Ajaccio'), ('October', 'Ajaccio'), ('November', 'Ajaccio'), ('December', 'Ajaccio')]
You can "glue" arrays together as long as their shapes are compatible.
a = np.arange(3*2).reshape((3,2))
b = np.arange(3*2).reshape((3,2))
np.hstack((a,b))
array([[0, 1, 0, 1], [2, 3, 2, 3], [4, 5, 4, 5]])
np.vstack((a,b))
array([[0, 1], [2, 3], [4, 5], [0, 1], [2, 3], [4, 5]])
np.stack((a,b), axis=-1)
array([[[0, 0], [1, 1]], [[2, 2], [3, 3]], [[4, 4], [5, 5]]])
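Comparing the resulting shapes is a quick way to tell these routines apart. The sketch below also uses `np.concatenate`, which is not shown above but is the general form behind hstack and vstack:

```python
import numpy as np

a = np.arange(3 * 2).reshape((3, 2))
b = np.arange(3 * 2).reshape((3, 2))

# hstack and vstack join along an existing axis
print(np.hstack((a, b)).shape)               # (3, 4)
print(np.vstack((a, b)).shape)               # (6, 2)
# concatenate is the general form: choose the existing axis explicitly
print(np.concatenate((a, b), axis=1).shape)  # (3, 4)
# stack creates a new axis, so all inputs must have exactly the same shape
print(np.stack((a, b), axis=0).shape)        # (2, 3, 2)
print(np.stack((a, b), axis=-1).shape)       # (3, 2, 2)
```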
Array indexing may return a view instead of copying the memory. In this case, changing the view changes the original array. Make sure to make a copy of the original array, or of the view's underlying data, if you do not want your modifications to affect the original array!
The simplest case is when a reference is copied (either during assignment or during a function call).
a = np.array([10, 20, 30])
b = a
b += 1
a
array([11, 21, 31])
You can use the copy() method to perform a deep copy of an array.
# TODO copy a into b instead of creating an extra reference to the same object
a = np.array([10, 20, 30])
b = a
b += 1
a
array([11, 21, 31])
# prof
a = np.array([10, 20, 30])
b = a.copy()
b += 1
a
array([10, 20, 30])
Slicing an array returns a view of it!
a = np.array([10, 20, 30])
s = a[1:]
s += 1
a
array([10, 21, 31])
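Whether an indexing operation returned a view or a copy can be checked with `np.shares_memory`. The following sketch contrasts basic slicing with indexing by a list of indices or by a mask (variable names are illustrative):

```python
import numpy as np

a = np.array([10, 20, 30, 40])

view = a[1:]            # basic slicing: a view on the same memory
fancy = a[[1, 2, 3]]    # indexing with a list of indices: a copy
masked = a[a > 10]      # indexing with a boolean mask: a copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, masked))  # False

view[0] = -1            # modifies `a` as well
fancy[0] = 99           # leaves `a` untouched
print(a)                # [10 -1 30 40]
```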
Just for the record, NumPy also contains many linear algebra and other useful routines for statistics, mathematics, random sampling, etc.
You'll discover them progressively.
You can plot data using the simple stateful plt interface.
You start by creating a figure with
plt.figure()
then you plot some data, plots are added to the current figure:
plt.plot([0, 1, 2, 3], [1, 3, 5, 7])
plt.plot([0, 1, 2, 3], [2, 4, 6, 8])
and finally you call the rendering function:
plt.show()
Here is a more complete example you will be able to reuse:
plt.figure()
plt.plot([0, 1, 2, 3], [1, 3, 5, 7], label='first')
plt.plot([0, 1, 2, 3], [2, 4, 6, 8], label='second')
plt.legend()
plt.title("First figure")
plt.ylabel('some numbers')
plt.xlim(0, 5)
plt.show()
And another one showing two images in two different subfigures.
img1 = plt.imread('img/warning.png')
img2 = plt.imread('img/stop.png')
plt.figure()
plt.subplot(1, 2, 1) # values: total number of rows, total number of columns, index (starting at 1)
plt.imshow(img1)
plt.axis('off')
plt.title("subfig1 title")
plt.subplot(1, 2, 2)
plt.imshow(img2)
plt.axis('on')
plt.show()
Another example with a histogram.
sample_img = plt.imread("img/practice_01/sample_img.png") # matplotlib's imread only supports PNG files
# This is just a numpy array!
plt.figure()
plt.subplot(1, 2, 1)
plt.imshow(sample_img)
plt.axis('off')
plt.title("Image")
plt.subplot(1,2,2)
# NumPy's ravel() returns a flattened array.
plt.hist(sample_img[..., 0].ravel(), bins=256, fc='r', ec='r', alpha=0.5)
plt.hist(sample_img[..., 1].ravel(), bins=256, fc='g', ec='g', alpha=0.5)
plt.hist(sample_img[..., 2].ravel(), bins=256, fc='b', ec='b', alpha=0.5)
plt.title("Basic color histogram")
plt.show()
There are many possible graph types, and many options to configure colors, legends, markers, to add annotations, etc. You will discover them by practicing and by looking at examples.
Let's just finish this very quick introduction to Matplotlib by pointing out useful resources:
Great! Now you're ready to move on to the next stage: Image manipulations.