Libraries and toolboxes for python#
At the start of the course we had an introduction to the Python programming language and learned some basic concepts such as:
assigning and using variables to store values
basic types (numbers, strings, lists, dictionaries) for representing data
loops (
for
) for repeating computational operationsconditionals (
if
,else
) for branching program executionfunctions (
def
) for organising code into functional blocks
While these components will form the fundamentals of any program that you will write, they are present in essentially all modern programming languages.
One of the big advantages of Python is that there are a very large number of so-called packages, which add extra functionality to the language. Python already has a very capable and complete standard library, however there are also many 3rd party packages that can handle diverse topics:
producing publication-quality plots (
matplotlib
,seaborn
,bokeh
, …)creating interactive data vizualisations (
holoviews
, …)analyzing and manipulating images (
scikit-image
,PIL
,opencv
, …)executing machine learning algorithms (
scikit-learn
,tensorflow
,theano
,…)data analysis (
pandas
)biology (
biopython
)interacting with websites (
requests
), google services (google-api-python-client
) etc.
and much, much more! Remember, google is your friend, and – for scientific applications – you can try to find a scikit
package that meets your needs
Note on nomenclature
Throughout the course (and on the web) you will come accross the terms package
and module
to refer to these extensions to the language.
In Python the terms package
and module
technically refer to different things, however they can also be used
as general terms (synonymous with “library”) to mean “extensions that provide additional functionality”.
Learning Goals#
There are two main aims for this part of the course:
using the three main libraries for scientific computing in Python (
numpy
,scipy
, andmatplotlib
) to perform simple tasks.
These three packages are the bread and butter of scientific Python. Many other packages (especially for plotting) are based on these, and a solid understanding of how to use them will pay dividends later.searching for, installing, and using libraries adapted to a particular problem domain
Often you will have a problem to solve and no idea initially as to how to go about solving it. Luckily, a lot of the time the work has already been done for you. This may take the form of a ready-to-use package that you can install and use, or it may be something more rudimentary like a code snippet.
numpy
: efficient numeric arrays#
Why do we need arrays anyway?#
We already saw that Python has a built-in list datatype that can be used to hold collections of values. Often we will want to deal with arrays of numbers (integers, reals, or complex). While in principle we could make “vectors” of numbers just as lists:
# using a Python list as a vector -- bad idea
vector = [1, 2, 3]
and “matrices” as lists of lists of numbers:
# using a list of lists as a matrix -- bad idea
matrix = [[1, 2],
[3, 4]]
in practice this is not very well suited to performing common operations one would expect to do on matrices and vectors (such as adding them together, or multiplying them).
One reason is that, in Python, adding two lists together does not do vector addition, but actually appends the second list to the first, like so:
vector_a = [1, 2, 3]
vector_b = [4, 5, 6]
print(vector_a + vector_b)
Similarly multiplication by an integer n actually appends n copies of the original list together:
print(vector_a * 3)
Another reason to avoid using lists is efficiency. If we manually tried to do vector addition with the two lists:
N = int(1E6)
vector_a = list(range(N)) # 0 ... N-1
vector_b = list(range(N, 2 * N)) # N ... 2N - 1
vector_c = [None] * N # create a list with N entries, each one equal to `None`
# naive vector addition using lists and explicit loops: slow and difficult to read!
for i in range(len(vector_a)):
vector_c[i] = vector_a[i] + vector_b[i]
this will (as we shall see) be many times less efficient than using native arrays with numpy.
Importing numpy#
We already have a number of packages installed on our system, but to use a particular package in our Python program we must first import it:
import numpy
we may now use the name numpy
in our program to access the functionality
provided by the Numpy package.
In order to save some typing we may import the package and assign
it to a shorter name instead. We will use “np
” (and you should too):
import numpy as np
Creating arrays#
Now we can use Numpy to create arrays of arbitrary dimensions:
# 1D array (vector)
a = np.array([1, 2, 3, 4])
print("vector of size:", a.shape)
# 2D array (matrix)
b = np.array([[1, 3],
[2, 4]])
print("matrix of size:", b.shape)
In the above we created a numpy array from a list
. Numpy also has functions for creating arrays directly, without having to first create a list:
# 1D array of zeros, each one is a a real number (float)
a = np.zeros((3,), dtype=float)
print("array of shape:", a.shape, "and type:", a.dtype)
# 2D array of ones, each one is an integer (int)
b = np.ones((3, 2), dtype=int)
print("array of shape:", b.shape, "and type:", b.dtype)
# 2D array being a unit matrix
c = np.identity(3, dtype=float)
print("array of shape:", c.shape, "and type:", c.dtype)
# 3D array of ones, each one is a complex number (complex)
d = np.ones((3, 2, 3), dtype=complex)
print("array of shape:", d.shape, "and type:", d.dtype)
The above example illustrates two key ways in which Numpy arrays differ from lists:
Numpy arrays have a fixed size
Numpy arrays have a fixed type
In this respect they are akin to arrays in other languages such as C.
Numpy even has functions akin to the Python built-in range
, except that they return Numpy arrays:
# step size 0.1, does not include upper bound (1)
a = np.arange(0, 1, 0.1)
print('arange:', a)
# 10 linearly spaced samples, includes lower and upper bounds
b = np.linspace(0, 1, 10)
print('linspace:', b)
Extracting array elements#
We can access elements in arrays in the same manner as with list
s:
a = np.arange(10)
print('first element:', a[0], ',last element:', a[-1])
# slicing into the array
print('first 3 elements:', a[0:3]) # slice implicitly starts from 0, so we could ommit the 0
print('last 3 elements:', a[-3:]) # slice implicitly stops at the end
print('even elements:', a[::2]) # step == 2, take every other element
print('odd elements:', a[1::2]) # step == 2, start == 1, take every other element, starting from the second one
Warning about indexing
Remember that in Python the first element in a container has index 0
. This is the same as
in C, but differs from Matlab and Fortran, where indexing starts from 1
instead.
Similarly, the last element has index N-1
in Python (although, recall that we may use negative indices to index from the end, so the last element also has index -1
).
we can access elements in arrays with rank > 1 in an equally intuitive way:
b = np.array([[1, 2],
[3, 4]])
print('first element:', b[0, 0], ',second element:', b[0, 1])
print('first column:', b[:, 0]) # ":" on its own means "get all the elements"
print('first row:', b[0, :])
see also: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#basic-slicing-and-indexing
We can also reshape arrays, as long as the underlying number of elements does not change:
b = np.array([[1, 2],
[3, 4]])
a = b.reshape((4,))
be aware, however, that a
and b
now refer to the same memory:
print(b)
a[0] = 1000 # modify entry 0 of a
print()
print(b) # a points to the same memory as b
Mini-exercises#
Make an array with integer numbers from 0 up to and including 9. Print the first 4 entries.
Make a matrix of the form
Which data type does this matrix have?
Now assign a) the first column and b) the second row each to a separate vector.
Solutions#
Show code cell content
x = np.arange(10)
print(x[:4])
Show code cell content
A = np.array([[1.5, 2],
[4, 0]])
print(A.dtype)
v = A[:,0]
print(v)
w = A[1,:]
print(w)
Basic array algebra#
An advantage of using Numpy arrays is that they natively support common algebraic operations:
a = np.arange(3)
b = np.arange(3, 6)
print('a + 2:', a + 2) # element-wise addition
print('a + b:', a + b)
print('a * 3:', a * 3) # element-wise multiplication
print('a * b:', a * b) # element-wise multiplication
print('a · b:', a @ b) # dot product
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print('M a:', M @ a) # matrix-vector multiply
print('M M:', M @ M) # matrix-matrix multiply
Note on @
The operator @
is new since Python 3.5. Alternatively, the vector-vector, matrix-vector and matrix-matrix multiplication of the example above can be written as
a.dot(b)
, M.dot(a)
or M.dot(M)
Element-wise functions#
In addition, numpy has a number of functions that will act element-wise on the elements:
a = np.arange(3)
b = np.array([[1, 2, 3],
[4, 5, 6]])
print('sin(a):', np.sin(a))
print()
print('exp(b):', np.exp(b))
see also: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs
Warning about Numpy functions
Numpy functions and algebraic operations act element-wise, that is, doing something
like M + 1
(where M
is a matrix) will add 1
to each of the elements, rather
than adding the identity matrix. Similarly, np.exp(M)
will exponentiate all the components
of M
; it does not compute the matrix exponential.
Numpy functions acting on (parts of) a full array#
There are also many functions that act on a full array:
a = np.ones((10,), dtype=int)
b = np.array([[1,1],
[2,2]], dtype=float)
print('Sum of all elements of a:', np.sum(a))
print('Average of all elements of a:', np.mean(a))
print('Sum of all elements of b:', np.sum(b))
print('Sum of all elements in a row of b:', np.sum(b, axis=1))
print('Largest element of b:', np.max(b))
print('Position of largest element in b (refers to flattened array):',
np.argmax(b))
Mini-exercises#
Make a 30x30 matrix with the value 2 on the diagonal and otherwise 0
Use numpy to compute the matrix-vector multiplication
You have three vectors
Store these three vectors as the rows of a matrix. Then use numpy element-wise operations to compute the length of each vector (for a vector \(\mathbf{a}\) the length is given as \(\sqrt{\sum_i a_i^2}\)). The result should be a vector with containing the different lengths.
Solution#
Show code cell content
a = np.identity(30) * 2
print(a)
Show code cell content
M = np.array([[1, 2],
[3, 4]])
v = np.array([1, 2])
print(M @ v)
Show code cell content
M = np.array([[1, 2, 3],
[0, 0, 1],
[1, 1, 1]])
lengths = np.sqrt(np.sum(M**2, axis=1))
print(lengths)
Saving and loading data#
Numpy includes facilities for saving arrays to disk and loading them back in again:
a = np.arange(0, 10)
np.save('my_array.npy', a)
b = np.load('my_array.npy')
# check the arrays are equal
print(np.all(a == b)) # np.all() needed, as a == b compares *element-wise*
We can also save the array in a text format (rather than a binary one) so that we can easily parse the data using other tools, or inspect it manually with a text editor:
np.savetxt('my_array.txt', a)
b = np.loadtxt('my_array.txt')
print(np.all(a == b))
see also: https://docs.scipy.org/doc/numpy/reference/routines.io.html
If you need to store and manipulate massive datasets that cannot fit into memory at once, consider looking at h5py.
A note on efficiency#
In addition to being more intuitive for manipulating numeric data, Numpy arrays are more efficient than using lists.
let us be more concrete:
N = int(1E6)
list_a = list(range(N))
list_b = [None] * N # create a list with N entries, each one equal to `None`
array_a = np.array(list_a)
array_b = np.zeros_like(array_a)
%%timeit
# using lists
for i in range(N):
list_b[i] = 2 * list_a[i] + 1
Here we use %%timeit
, a built-in magic command, to time the cell’s code execution.
%%timeit
# using arrays in a naive way -- don't do this!
for i in range(N):
array_b[i] = 2 * array_a[i] + 1
%%timeit
# using arrays properly
array_b = 2 * array_a + 1
If we use numpy arrays properly, we see a speedup of 40x compared with the naive list example.
Note however that if we do the direct naive loop with Numpy arrays then the code actually runs slower than the list example.
This should serve to illustrate that explicitly looping over Numpy arrays is a bad idea. If you find yourself doing this, stop and try to see if there is a more “numpy” way of expressing the operation.
Advanced topics#
Numpy internals https://arxiv.org/pdf/1102.1523v1.pdf
Array broadcasting https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
Fancy indexing https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
Exercises#
Exercise#
Create a zero vector of size 10, but the fifth value which is 1
Solution#
Show code cell content
v = np.zeros((10,))
v[4] = 1 # zero-based indexing!
v
Solution#
Show code cell content
v = np.array([1, 2, 3, 4])
v[::-1]
Exercise#
Create a 8x8 matrix and fill it with a checkerboard pattern (alternating 1’s and 0’s)
Solution#
Show code cell content
m = np.zeros((8, 8))
m[::2, 1::2] = 1
m[1::2, ::2] = 1
m
Exercise#
Create a 2D array with 1 on the border and 0 inside
Solution#
Show code cell content
m = np.zeros((4, 5))
m[0, :] = 1
m[:, 0] = 1
m[-1, :] = 1
m[:, -1] = 1
m
Exercise#
Construct a random vector of 6 elements (maybe np.random
would be a good starting point?)
Find the norm of this vector
Find the minimum element of this vector
Solution#
Show code cell content
v = np.random.random_sample(6)
print('norm:', np.sqrt(np.sum(v**2)))
print('min:', np.min(v))
Exercise#
Construct a random 10x2 matrix representing 10 cartesian coordinate vectors.
Convert them to polar coordinates.
remember:
Find the vector with the smallest radial coordinate
Solution#
Show code cell content
v = np.random.random_sample((10, 2))
v_polar = np.zeros_like(v)
v_polar[:, 0] = np.sqrt(v[:, 0]**2 + v[:, 1]**2)
v_polar[:, 1] = np.arctan2(v[:, 1], v[:, 0])
print(v_polar)
min_r = np.argmin(v_polar[:, 0], axis=0)
v[min_r]
Exercise: numerical integration#
(This exercise is a bit tougher.)
The trapezium rule is a scheme for approximating an integral:
Use numpy to estimate the integral of \(\frac{1}{\sqrt{π}}\exp(-x^2)\) between -10 and 10 using 50 points with the trapezium rule.
hint: use element-wise functions and slicing
Now try with 10 million points (this should run in less than 2 seconds; if it doesn’t there’s something wrong!
Solution#
Show code cell content
x, dx = np.linspace(-10, 10, int(1E7), retstep=True)
f = np.exp(-x**2) / np.sqrt(np.pi)
(dx / 2) * (f[0] + f[-1] + 2 * np.sum(f[1:-1]))
Extra exercises#
The list of 100 numpy exercises goes from basic level to expert.
matplotlib
: plotting primitives#
Matplotlib is the most widely-used plotting library for Python. To use
it, we should first import
it, like we did for Numpy:
import matplotlib.pyplot as plt
%matplotlib widget
Note that this time we actually directly import a sub-package of Matplotlib called pyplot
, which gives us an easier plotting interface similar to Matlab.
The line %matplotlib widget
makes the plots appear in the browser (otherwise an extra window would open).
Simple line plots#
We can generate some data and plot it like so:
x = np.linspace(0, 2 * np.pi, 101)
y = np.sin(x)
fig, axes = plt.subplots()
axes.plot(x, y)
We can also plot several things in the same figure by calling plt.plot
multiple times
fig, axes = plt.subplots()
axes.plot(x, np.sin(x))
axes.plot(x, np.cos(x))
By default Matplotlib will cycle through line colors in a pre-defined order, however you can also set the style of each line explicitly:
fig, axes = plt.subplots()
axes.plot(x, np.sin(x), color='red', linewidth=2, linestyle='--')
axes.plot(x, np.cos(x), color='black', linewidth=2)
plt.plot
has many other parmeters that can be used to carefully tune the properties of the lines. The documentation can be found here, but be warned, it is very dense!
Adding labels and legends#
pyplot
also contains other functions that allow you to customize your figures. Let’s make the same plot as above, but add some axes labels and a legend:
x = np.linspace(0, 2 * np.pi, 1000)
fig, axes = plt.subplots()
axes.plot(x, np.sin(x), color='r', linewidth=2, linestyle='--', label='sine')
axes.plot(x, np.cos(x), color='k', linewidth=2, label='cosine')
# we can use Latex math between $ signs!
axes.set_xlabel(r'Time ($\tau$)', size=15)
axes.set_ylabel(r'Amplitude', size=15)
axes.set_title('Some trig functions', size=20)
# make the axis tick labels larger
axes.tick_params(labelsize=15)
# set lower/upper limits on x axis
axes.set_xlim(0, 2 * np.pi)
# loc=3 means put the legend in lower left corner:
# 3 is a magic number given in the documentation
axes.legend(loc=3)
ProTip
As you may have already figured out, while Matplotlib in principle provides a very complete interface where every parameter can be tweaked, the user interface is verbose, and often a lot of tweaking is necessary to get your plots looking how you like.
To avoid having to set the same options repeatedly you can tell matplotlib to globally set some options, as described in the docs: http://matplotlib.org/users/customizing.html.
Alternatively you could use a higher-level frontend for Matplotlib such as seaborn.
Other kinds of plots#
Scatter plot with error bars#
Many of you will want to plot experimental data, where you want to see the individual data points. The below example shows how to achieve this:
n_samples = 50
x = np.linspace(0, 1, n_samples)
sigma = 0.08
# signal + disorder
y = x + np.random.normal(loc=0, scale=sigma, size=n_samples)
fig, axes = plt.subplots()
axes.scatter(x, y)
# put error bars on the points, but put no lines between the errorbars
axes.errorbar(x, y, yerr=sigma, ecolor='b', elinewidth=2, linestyle='')
axes.set_xlim(0, 1)
axes.set_xlabel('Voltage [$V$]', size=15)
axes.set_ylabel('Current [$A$]', size=15)
axes.set_title('Totally real I-V curve', size=20)
axes.tick_params(labelsize=15)
Histograms#
Histograms can be created using the plt.hist
function:
n_samples = int(1E4)
sigma = 10
mu = 10
samples = np.random.normal(loc=mu, scale=sigma, size=n_samples)
fig, axes = plt.subplots()
axes.hist(samples, bins=50)
axes.set_xlabel('Measured value', size=15)
axes.set_ylabel('Frequency', size=15)
Colorplots#
You can make colorplots in matplotlib by creating an matrix of values and passing it to the plt.imshow
function:
def sinc(x, y):
r = np.sqrt(x**2 + y**2)
return np.sin(r) / r
x = np.linspace(-20, 20, 100)
y = np.linspace(-20, 20, 100)
# compute sinc on a grid of points using numpy broadcasting
z = sinc(x.reshape(100, 1), y.reshape(1, 100))
print('array to plot has shape:', z.shape)
fig, axes = plt.subplots()
# need to pass the "extent" parameter to have the
# correct x and y axes
im = axes.imshow(z, extent=(x[0], x[-1], y[0], y[-1]),
cmap="viridis")
fig.colorbar(im) # show a color scale next to the figure
Saving plots#
Once we have created a plot we can also save them to disk using plt.savefig
:
x = np.linspace(0, 2 * np.pi, 101)
y = np.sin(x)
fig, axes = plt.subplots()
axes.plot(x, y)
# matplotlib supports saving to several image formats, including png, svg and pdf
# just put a different extension on to the filename
fig.savefig('simple_plot.png')
ProTip
If you want static figures inside the notebook, rather than interactive widgets, use
%matplotlib inline
instead of
%matplotlib widget
Advanced topics#
Interactive plots with Holoviews#
While Matplotlib provides reasonable plotting, it is not particularly well-adapted to interactive exploration of data sets.
The Holoviews library provides more features for interactively exploring your data.
First we import holoviews and tell it that we want it to display its output in the browser:
import holoviews as hv
hv.notebook_extension('matplotlib')
Now we can plot our sinc
function from before, and also slice the data along an axis,
with the position of the axis controlled by a slider widget:
x = np.linspace(-20, 20, 100)
y = np.linspace(-20, 20, 100)
z = sinc(x.reshape(100, 1), y.reshape(1, 100))
sinc_image = hv.Image(z, bounds=(x[0], y[0], x[-1], y[-1]))
hv.HoloMap({y: sinc_image * hv.HLine(y=y) + sinc_image.sample(y=y)
for y in np.arange(-19, 20)}, kdims=['Y']).collate().cols(2)
scipy
: tools for scientific computing#
So now we know how to contain our data (Numpy arrays) and plot it (Matplotlib), however we still need some tools that will enable us to do some more advanced data analysis and/or computations to generate data.
Scipy is the core library that provides these more advanced capabilites. In addition there is an ecosystem of so-called “SciKits” that build on Scipy to provide functionality adapted to more specific problem domains.
Here, we will look at some of the key functionality that core Scipy provides.
You should not attempt to do all these exercises
Pick a couple of modules that are of interest to you and try the exercises for those. There is clearly no point in learning how to use scipy
’s differential equation solver if you’re never going to solve a differential equation in your research!
Dense linear algebra#
We already saw that we can do basic linear algebra operations with Numpy (adding vectors and matrices, dot products,
matrix-vector product…), however Scipy provides functionality for more involved algorithms in scipy.linalg
(reference documentation)
Exercises#
create a 100x100 random matrix
find the eigenvalues and eigenvectors numerically
verify that the answer is correct (remember \(M x = \lambda x\) for a matrix \(M\) with eigenvalue \(\lambda\) and eigenvalue \(x\))
create a 100x100 random matrix \(M\) and a random vector \(b\) with 100 entries
solve the linear system \(M x = b\) for the vector \(x\)
verify that the answer is correct
Solution#
Show code cell content
import scipy.linalg as la
M = np.random.rand(100, 100) # random 3x3 matrix
evals, evecs = la.eig(M) # see the docs for the format of the eigenvalues / vectors
assert np.allclose(M.dot(evecs), evals * evecs)
b = np.random.rand(100,)
x = la.solve(M, b)
assert np.allclose(M.dot(x), b)
Sparse linear algebra#
Sometimes you may be dealing with very sparse matrices/vectors, where many of the entries are zero. In such cases
we can use special data structures to avoid storing all of the (redundant) zeros. In addition there are different
algorithms that can be used to do things like compute eigenvalues of sparse matrices, or solve linear systems involving sparse matrices. The corresponding functions are found in scipy.sparse
and scipy.sparse.linalg
(reference documentation)
Exercises#
create a 1000 x 1000 tridiagonal matrix with +2 on the diagonal and -1 on the upper/lower diagonals.
Find the 3 eigenvectors with the lowest eigenvalues using sparse linear algebra functions (hint: take advantage of the structure of the matrix to select the best routine for the job)
create a 1000 element vector \(v\) with 0 in all entries and 1 in the first entry
using the same 1000x1000 matrix as before, solve the linear system \(Mx = b\) using sparse linear algebra functions
Solution#
Show code cell content
import scipy.sparse as spa # for sparse matrices
import scipy.sparse.linalg as sla # for sparse linear algebra routines
diagonals = -np.ones((3, 1000))
diagonals[0, :] = 2
offsets = (0, -1, 1)
A = spa.dia_matrix((diagonals, offsets), shape=(1000, 1000)).tocsc()
sla.eigsh(A, k=3, which='SM')
# solving linear systems M.x = b
b = np.zeros(1000)
b[0] = 1
x = sla.spsolve(A, b)
Numerical quadrature#
Scipy also wraps the venerable quadpack
library to provide numerical evaluation of definite integrals in scipy.integrate
(reference documentation)
Exercises#
estimate the integral of \(\exp(-x^2)\) between 0 and 1
estimate the integral of \(\exp(-x^2)\) between \(-\infty\) and \(\infty\)
Solution#
Show code cell content
import scipy.integrate as spi
def normal_dist(x):
return np.exp(-x**2) / np.sqrt(np.pi)
# definite integral between 0 and 1
result, error = spi.quad(normal_dist, 0, 1)
print(result)
# can even handle improper integrals
result, error = spi.quad(normal_dist, -np.inf, np.inf)
print(result)
Ordinary differential equations#
The scipy.integrate
package also includes facilities for solving ordinary differential equations.
By default a hybrid algorithm that can handle both stiff and non-stiff problems is used, but you can select which
method to use as well as tuning the method parameters.
Exercises#
Simple pendulum#
a simple pendulum is defined by the system of equations
where \(\alpha\) is a model parameter
create a function
pendulum
that takes a 2-vector \((\theta, \omega)\) and a time, and returns \((\dot{\theta}, \dot{\omega})\)start the pendulum in a state \((\theta_0, 0)\) with \(\alpha = 1\) and evolve it forward in time until \(t=20\)
plot \(\theta\) as a function of time
graphically compare the result with the small-angle result \(\theta(t) = \theta_0 \cos(\sqrt{\alpha}t)\)
Solution#
Show code cell content
alpha = 1
# function describing a simple pendulum system.
# takes the current state of the system (and the current time) and returns
# the time derivative.
def simple_pendulum(state, time):
theta = state[0]
omega = state[1]
# dθ/dt = ω, dω/dt = -sin(θ)
dtheta_dt = omega
domega_dt = -alpha * np.sin(theta)
return np.array([dtheta_dt, domega_dt])
# start the pendulum with some angle
theta_0 = np.pi / 100
omega_0 = 0
state_0 = np.array([theta_0, omega_0])
# what times do we want the solution?
times = np.arange(0, 20, 0.1)
# start in state_0 and find the solution for all times in `times`
solutions = spi.odeint(simple_pendulum, state_0, times)
small_angle_approx = theta_0 * np.cos(np.sqrt(alpha) * times)
# compare the numerical results and the small-angle approximation
fig, axes = plt.subplots()
axes.plot(times, solutions[:, 0], label='numerical soltn.')
axes.plot(times, small_angle_approx, linestyle='--', label='small θ approx.')
axes.set_xlabel('time')
axes.set_ylabel('θ')
axes.legend(loc=4)
Function optimization#
Scipy includes a plethora of routines implementing algorithms for: curve fitting, minimum finding, root finding etc. in scipy.optimize
(reference documentation)
Exercises#
Curve fitting#
load the data in “
curve_fitting_data.npy
” into a Numpy arraythe array has shape
(2, 50)
where the first row is the independent variable, and the second row is the dependent variable
plot the data using a scatter plot
the data has been produced according to a model of the form \(f(x) = \alpha x \exp(-\beta x)\) (plus some noise) where \(\alpha\) and \(\beta\) are the model parameters and \(x\) is the independent variable. Estimate the model parameters
plot the estimated model on top of the data
Solution#
Show code cell content
import scipy.optimize as spo
### curve fitting
x, data = np.load('curve_fitting_data.npy')
# x is the independent variable, and the rest of the arguments are the model parameters
def model(x, alpha, beta):
return alpha * x * np.exp(-beta * x)
# returns optimal values for parameters, and estimated covariance
params, params_cov = spo.curve_fit(model, x, data)
estimated_alpha, estimated_beta = params
fig, axes = plt.subplots()
axes.scatter(x, data)
axes.plot(x, model(x, estimated_alpha, estimated_beta), label='estimated model')
axes.legend()
### 1D minimization
def minimize_this(x):
return -x * np.exp(-x)
result = spo.minimize_scalar(minimize_this)
print('result:', result.x, 'expected:', 1)
### root finding
def find_my_roots(x):
return (x - 3)**2
# use newton's method
print(spo.newton(find_my_roots, x0=0))
Fourier Transforms#
Scipy also wraps the venerable FFTPACK
library for computing 1D and 2D Fourier transforms using the fast fourier transform (FFT) algorithm in scipy.fftpack
(reference documentation)
Exercises#
1D transforms#
create a function that evaluates \(g(x) = cos(2\pi f_0 x) + cos(2\pi f_1 x)\)
create an array that samples the function using 200 points on the interval [0, 1]
find the Fourier transform of this sample
find out what format the transformed array is in
plot the power spectrum of the sample as a function of frequency
Solution#
Show code cell content
import scipy.fftpack as fftpack
n_samples = 200
x, x_step = np.linspace(0, 1, n_samples, retstep=True)
# sum of a couple of cosines with different frequencies
y = np.cos(50 * 2 * np.pi * x) + np.cos(80 * 2 * np.pi * x)
# the first half of the array contains the positive frequency components
# the second half is the negative frequencies
y_fft = fftpack.fft(y)
# instead of trying to map things correctly ourselves,
# there is a handy function to get the frequencies too
freqs = fftpack.fftfreq(n_samples, x_step)
power_spectrum = np.abs(y_fft)**2 / n_samples
fig, axes = plt.subplots()
axes.plot(freqs, power_spectrum)
Show code cell content
### 2D fourier transforms
N = 30
y_fft = np.zeros((N, N))
y_fft[5, 10] = 1
y_fft[-5, -10] = 1
y = fftpack.ifftn(y_fft) # inverse Fourier transform in N dimensions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 10))
ax1.imshow(y_fft, cmap="gray")
ax2.imshow(y.real, cmap="gray")
Image manipulation#
As we saw in the Matplotlib section, images are just 2D arrays. scipy.ndimage
(reference documentation) contains algorithms for manipulating images, applying filters etc.
Loading images from files requires the imageio
module (which is not a part of scipy).
Exercises#
Simple filters#
create a 2D numpy array representing a black square (pixel value 1) on a white background (pixel value 0)
show it with
pyplot.imshow
rotate the square by 45 degrees
apply a sobel (gradient) filter to show the edges in the image
Feature identification#
load the image “blobs.png” into a Numpy array
identify (label) the distinct blobs in the image
find the “centre of mass” of each of the blobs
plot the blobs and mark their centres of mass
estimate the area (in pixels) of each of the blobs
(note that each of these tasks is not as difficult as it first appears)
Solution#
Show code cell content
import scipy.ndimage as ndimage
import imageio
### applying filters
N = 100
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 20))
# make a simple square
square = np.zeros((N, N))
square[N // 4: -N // 4, N // 4: -N // 4] = 1
ax1.imshow(square, cmap="gray")
# rotate the square by 45 degrees
square = ndimage.rotate(square, 45, reshape=False)
ax2.imshow(square, cmap="gray")
# apply a sobel filter (gradient)
sx = ndimage.sobel(square, axis=0) # gradient in x direction
sy = ndimage.sobel(square, axis=1) # gradient in y direction
sob = np.sqrt(sx**2 + sy**2) # square magnitude of gradient
ax3.imshow(sob, cmap="gray")
Show code cell content
### labelling features in an image
blobs = imageio.imread('blobs.png', pilmode='L')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 15))
ax1.imshow(blobs, cmap='gray')
ax1.set_title('blobs', size=20)
labeled_blobs, n_features = ndimage.label(blobs)
ax2.imshow(labeled_blobs, cmap='viridis')
ax2.set_title('labelled blobs', size=20)
# calculate the center of mass of each labelled feature
centers = ndimage.center_of_mass(blobs, labeled_blobs, np.arange(n_features) + 1)
# mark the centers of mass on the image
x, y = zip(*centers) # unzip the pairs (x, y) into two lists
ax2.scatter(y, x) # invert the coordinates, as we are talking about row/column indices now
Problem solving with Python#
In the context of your daily research you will often come up against problems that can’t be solved directly using the above-described tools.
In this situation it is good to know how you can approach problems. In practice you will often find that the following crib sheet will be satisfactory in a surprising number of cases:
Algorithm for problem solving#
identify the key words of your problem, append “Python” and stick it into google
see if there are code snippets that do approximately what you want:
StackOverflow is a good source of these
random technical blogs are also not bad if they are highly ranked by google
if the snippet uses a package that you do not have installed:
copy/paste a code snippet that looks promising
get a minimum working example which you can then modify and adapt (starting from scratch is hard)
iteratively improve on the minimum working example until your program does what you want
Points to remember#
It’s OK not to understand everything straight away
Skim read resources (documentation, installation instructions…) while stuff works. When stuff doesn’t work anymore, be ready to re-read these resources in more detail
Google is your friend. So is StackOverflow
Caveats#
Not all examples/snippets are up to date, reliable, or written by someone who has the slightest clue what they are doing
Often the accepted answer on StackOverflow is good enough for a first iteration, although pay attention to the responder’s “reputation” to gauge how reliable the answer may be.
sometimes you may come across funny answers too…
Scenario#
Your predecessor has left you some experimental data, in an excel spreadsheet. You want to plot and analyze the data in ways that are not provided by Excel; what do you do?
Solution#
Copy and paste the data from the excel worksheet to a text file.
Import the data using
numpy.loadtxt
Analyze and plot
This may seem contrived, but often the simplest solution is the best.
Scenario#
You have data in an Excel spreadsheet “current_voltage_data.xlsx
”, but it is routinely updated by colleagues. In addition, your supervisor
wants to have not only the plots you produce, but also the data after you have run your analysis on it,
and she wants it in an Excel spreadsheet.
Solution#
Google the string: “
data analysis excel python
”First hit is a video tutorial that mentions “pandas”
Google “pandas python excel”
First hit is documentation from the Pandas website about a function “
read_excel
”; seems promising!Install “pandas” with “
pip install pandas
”Try to use
"read_excel"
(it will fail complaining aboutopenpyxl
)Install
openpyxl
with pipImport data
Google “scipy linear regression”
copy snippet from website
google for “pandas data to numpy array”
calculate linear regression
plot
google for how to output data to excel
try to use “to_excel”
save to Excel format
import pandas
import scipy.stats as stats
# read in and clean up data
data_frame = pandas.read_excel('current_voltage_data.xlsx')
cleaned_data = data_frame.dropna() # remove undefined data entries
cleaned_data_matrix = cleaned_data.values # get data as numpy array
# perform linear regression
linregress = stats.linregress(cleaned_data_matrix)
# plot the data
x, y = cleaned_data_matrix.transpose()
fig, axes = plt.subplots()
axes.scatter(x, y)
axes.plot(x, linregress.slope * x + linregress.intercept, linewidth=2, color='r')
cleaned_data.to_excel('output.xlsx', index=False)
Extended Exercises#
These exercises require you to use functionality from several of the libraries introduced above
Explore!#
At the start of the notebook there were just a few examples of scientific Python packages for treating different problem domains.
Find a Python package that you think could be useful in your research. Try installing it and testing out some of the usage examples
Animal Statistics#
The file in populations.txt
describes the populations of hares and lynxes (and carrots) in nothern Canada over the course of 20 years.
Tasks#
load and plot the data in
populations.txt
. Don’t forget to label your axes and produce a legend!Print the mean and standard deviation of the population for each species
Print which species has the largest population for each year (hint:
np.argmax
)Print for which years the populations of any of the species above 50000
For each species, print the 2 years when their population was the lowest
Plot the change in the hare population and the number of lynxes over time
Calculate the correlation coefficient between the change in the hare population and the number of lynxes.
Solution#
Show code cell content
data = np.loadtxt('populations.txt')
years, hares, lynxes, carrots = data.T # trick: columns to variables
populations = data.T[1:]
species = ('Hare', 'Lynx', 'Carrot')
fig, axes = plt.subplots()
axes.plot(years, hares, years, lynxes, years, carrots)
axes.set_xlabel('year')
axes.set_ylabel('population')
axes.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
means = np.mean(populations, axis=1)
std_devs = np.std(populations, axis=1)
for organism, mean, std in zip(species, means, std_devs):
print(organism, '-- population mean:', mean, ' -- population std dev:', std)
max_idxs = np.argmax(populations, axis=0)
for year, i in zip(years, max_idxs):
print('year', int(year), 'had mostly: ', species[i])
which_species, which_years = np.argwhere(populations > 50000).T
print('the following years had > 50000 of an organism:', *years[which_years])
Show code cell content
fig, axes = plt.subplots()
axes.plot(years[1:], np.diff(hares), years, lynxes)
axes.set_xlabel('year')
axes.set_ylabel('population')
axes.legend(('Change in Hares', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
least_populous_idxs = np.argsort(populations, axis=1)[:, :2]
for organism, least_populous_years in zip(species, years[least_populous_idxs]):
print(organism, 'was least populous in years: ', *least_populous_years)
print('The correlation coefficient between the lynx population and the change in the Hare population is',
np.corrcoef(np.diff(hares), lynxes[1:])[1, 0])
Mandelbrot set#
The Mandelbrot set is the set of complex numbers \(c\) for which the function \(f_c ( z ) = z^2 + c\) does not diverge when iterated from \(z = 0\).
In this exercise we will sample and vizualize the Mandelbrot set using Numpy and Matplotlib.
Tasks#
write a function
mandelbrot
that takes a complex value \(c\), evaluates \(f_c\) 100 times starting from \(0\), and returns the absolute value of the resultevaluate your function on a grid of complex numbers (hint: use numpy operations and writing
c = x + 1j * y
wherex
andy
are real)plot the result using
pyplot.imshow
improve your
mandelbrot
function so as to remove theRuntimeWarning
that you get when evaluating it at points that are not in the Mandelbrot set (what is happening here?)
Harder tasks#
make a visualisation that allows interactive zooming
make the visualisation re-evaluate the points in the mandelbrot set when the user zooms (so that more structure is revealed by zooming)
do you think you can speed up your solution?
use Cython to rewrite your
mandelbrot
function. Take advantage of multiple CPU cores when evaluating the function by using prange. You can use the Cython cell magic to auto-compile your code from within the notebook.
Solution#
Show code cell content
import numpy as np
def mandelbrot(c):
z = 0
for i in range(100):
z = z**2 + c
return np.abs(z)
x = np.linspace(-2, 1, 500)
y = np.linspace(-1, 1, 500)
M = x[None, :] + 1j * y[:, None]
R = mandelbrot(M)
fig, axes = plt.subplots()
axes.imshow(R)
Show code cell content
%load_ext Cython
Show code cell content
%%cython --compile-args=-fopenmp --link-args=-fopenmp --annotate
import numpy as np
from cython.parallel cimport prange
from cython cimport boundscheck
cdef double mandelbrot_inner(complex c) nogil:
cdef complex z = 0
for i in range(100):
z = z * z + c
return z.real * z.real + z.imag * z.imag
@boundscheck(False)
def mandelbrot_cython(points):
cdef complex[:] linear_points = points.reshape(-1)
cdef long i, size = linear_points.shape[0]
cdef double[:] result = np.empty((linear_points.shape[0],), dtype=float)
for i in prange(size, nogil=True):
result[i] = mandelbrot_inner(linear_points[i])
return np.asarray(result).reshape(points.shape)
Show code cell content
x = np.linspace(-2, 1, 1000)
y = np.linspace(-1, 1, 1000)
M = x[None, :] + 1j * y[:, None]
Show code cell content
%timeit mandelbrot(M)
Show code cell content
%timeit mandelbrot_cython(M)
Lorenz attractor#
The Lorenz system is a system of ordinary differential equations that displays chaotic solutions for certain parameter values.
The system is defined by
where \((x, y, z)\) are the coordinates, and \(\beta\), \(\sigma\), and \(\rho\) are parameters.
In this exercise we will numerically solve the Lorenz system, visualize the solution with Matplotlib, and explore the parameter space
Tasks#
write a function,
lorenz
, that takes a 3-vector of coordinates \((x, y, z)\) and the current time, and returns a 3-vector \((\dot{x}, \dot{y}, \dot{z})\) (use the values \(\beta=8/3\), \(\sigma=10\), and \(\rho=28\) for now)use
scipy.integrate.odeint
with yourlorenz
function to solve the lorenz systemplot the trajectory projected onto the \(x-y\), \(y-z\) and \(x-z\) planes, respectively
this system shows sensitivity to initial conditions; estimate the Lyapunov exponent of the system.
Harder Tasks#
create a 3D visualization instead of the 3 projected visualizations
estimate the positions of the “attractor” points
Solution#
Show code cell content
import scipy.integrate as spi
beta = 8/3
sigma = 10
rho = 10
def lorenz(state, t):
x, y, z = state
deriv = np.empty((3,))
deriv[0] = sigma * (y - x)
deriv[1] = x * (rho - z) - y
deriv[2] = x * y - beta * z
return deriv
times = np.linspace(0, 50, 10000)
start = np.ones((3,))
result = spi.odeint(lorenz, start, times)
x, y, z = result.transpose()
fig, axes = plt.subplots()
axes.plot(x, z)
Further reading#
http://www.scipy-lectures.org/index.html – a more in-depth and systematic version of this class