Exercises

Setup

To kit up, launch IPython:

ipython

and import Numpy:

In [1]: import numpy as np

Tune how it prints arrays (easier for the eyes):

In [2]: np.set_printoptions(precision=3)

Warming up

Exercise 1: Warming up

  1. Create a 5x6 Numpy array containing random numbers in range [0, 1].

    • Compute the mean of all the numbers in it

      (To find the function to do this: np.lookfor("mean of array"))

    • Compute the minimum value in each row, and maximum in each column

    • Multiply each element by 10 and convert to an integer with the .astype() method.

      What is the difference between a.astype(int) and np.around(a)?

  2. Compare:

    np.array([1, 2, 3, 4]) / 2
    np.array([1.0, 2, 3, 4]) / 2
    np.array([1, 2, 3, 4]) // 2
    np.array([1.0, 2, 3, 4]) // 2
    

    Why does it work like it does? How about with:

    a = np.array([1, 2, 3, 4], dtype=float)
    b = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.int8)
    
  3. Which of the following operations create a view, and which a copy:

    a = np.array([[1, 2, 3], [4, 5, 6]])
    
    a[:,[0,1]]
    a[:,0:2]
    a[0]
    a.T
    a[[True, False]]
    
    a.reshape(2*3)            # bonus sector
    a.T.reshape(2*3)          # bonus sector
    

    (Think first, then check.)

  4. Use the function:

    def change_it(x):
        x[:] = np.array([7, 8, 9])
    

    to change array:

    a = np.array([1, 2, 3, 4, 5, 6])
    

    to:

    array([1, 7, 8, 9, 5, 6])
    

Exercise 2: Strides

  1. Consider:

    a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
    

    What do the following operations do, and what are the resulting strides:

    a
    a.T
    a[::-1]
    
  2. Study the .strides, .flags, and str(a.data) attributes of the arrays:

    a = np.array([[1, 2], [3, 4]], dtype=np.byte)
    b = a.T
    

    Which of the above are C-contiguous (and what does that mean)?

Broadcasting

Exercise 3: Operating along an axis

Divide each column of the array

>>> a = np.arange(25).reshape(5, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

elementwise with the array b = np.array([1.5, 5, 10, 15, 20]).

I.e., the result should be:

array([[ 0/1.5,  1/1.5, ...],
       [ 5/5,    6/5,   ...],
       [10/10,  11/10,  ...],
       [15/15,  16/15,  ...],
       [20/20,  21/20,  ...]])

Tips

  • np.newaxis

Exercise 4: Integral approximation

Write a function f(a, b, c) that returns a^b - c. Generate a shape (24, 12, 6) array containing the values f(a_i, b_j, c_k) at points a_i, b_j and c_k forming a grid in the unit cube [0, 1] x [0, 1] x [0, 1].

Approximate the 3-d integral

\int_0^1\int_0^1\int_0^1(a^b-c)da\,db\,dc

over this volume with the mean of the values. The exact result is: \log(2) - \frac{1}{2} — how close do you get?

Try also using np.mgrid instead of broadcasting. Is there a speed difference? How about ogrid with broadcast_arrays?

Tips

  • You can make np.ogrid give a number of points in given ranges with the syntax np.ogrid[a:b:20j, c:d:10j].
  • You can use %timeit in IPython to check timings

Fancy indexing

Exercise 5: Picking up

  1. Extract the 1st superdiagonal 1, 7, 14 from the array:

    0   1  2  3
    5   6  7  8
    11 12 13 14
    15 16 17 18
    19 20 21 22

    Then extract the 1st and the 3rd columns.

  2. Generate a 10 x 3 array of random numbers (in range [0,1]). From each row, pick the number closest to 0.75.

Tips

  • Make use of np.abs and np.argmax to find the column j closest for each row.
  • Use fancy integer indexing to extract the numbers. Remember that in a[i,j] the index array i must correspond to j.

Structured data types

Exercise 6: Basic handling

Design a structured data type suitable for the data (in words.txt):

% rank          lemma (10 letters max)      frequency       dispersion
21              they                        1865844         0.96
42              her                         969591          0.91
49              as                          829018          0.95
7               to                          6332195         0.98
63              take                        670745          0.97
14              you                         3085642         0.92
35              go                          1151045         0.93
56              think                       772787          0.91
28              not                         1638883         0.98

Load the data from the text file. Examine the data you got, for example: extract words only, extract the 3rd row, print all words with rank < 30.

Sort the data according to frequency. Save the result to a Numpy data file sorted.npz with np.savez and load back with np.load. Do you get back what you put in?

Save the result to a text file sorted.txt using np.savetxt. Here, you need to provide a fmt argument to savetxt.

Tips

  • See the documentation of the .sort() method: help(np.ndarray.sort)

  • For structured arrays, savetxt needs a fmt argument that tells it what to do.

    fmt is a string. For example "%s %d %g" tells that the first field is to be formatted as a string, the second as an integer, and the third as a float.

Exercise 7: Reading binary files

The .wav audio files are binary files: they contain a fixed-size header followed by raw sound data.

Construct a Numpy structured data type describing the .wav file header, and use it to read the header. Print for example the sample rate and number of channels. (A test.wav is provided so you can try things out on that.)

Tips

  • You can read a binary structure described by some_dtype to a Numpy array with:

    with open('test.wav', 'rb') as f:
        data = np.fromfile(f, dtype=some_dtype, count=1)
    

.wav file structure

Byte # Field  
0 chunk_id 4-byte string ("RIFF")
4 chunk_size 4-byte uint (little-endian)
8 format 4-byte string ("WAVE")
12 fmt_id 4-byte string ("fmt ")
16 fmt_size 4-byte uint (little-endian)
20 audio_fmt 2-byte uint (little-endian)
22 num_channels 2-byte uint (little-endian)
24 sample_rate 4-byte uint (little-endian)
28 byte_rate 4-byte uint (little-endian)
32 block_align 2-byte uint (little-endian)
34 bits_per_sample 2-byte uint (little-endian)
36 data_id 4-byte string ("data")
40 data_size 4-byte uint (little-endian)
  • data_size bytes of actual sound data follow

Advanced

Exercise A: Indexing

Reimplement array indexing (for 2-D, without using Numpy)! Write a function data_at_index(indices, data, strides, dtype) that returns the data corresponding to a specified array element, as a string of bytes. I.e.:

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
b = a.T

data_at_index([0, 1], str(a.data), a.strides, a.dtype) == '\x02\x00' == str(a[0,1].data)
data_at_index([0, 1], str(b.data), b.strides, b.dtype) == '\x04\x00' == str(b[0,1].data)

Check first that you understand the meaning of:

  • the strides and data attributes of Numpy arrays
  • the type and itemsize attributes of the data type objects

Exercise B: Sliding window

  1. Build a sliding 3-item window for the array:

    x = np.arange(10, dtype=np.int32)
    

The aim is to get an array:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]], dtype=int32)

without making copies (so that it is fast). The trick is a stride trick:

from numpy.lib.stride_tricks import as_strided
strides = ...
y = as_strided(x, shape=(8, 3), strides=strides)
  1. Use the same trick to compute the 5 x 5 median filter of an image. For each pixel, compute the median of the 5 x 5 block of pixels surrounding it.

    The median filter provides a degree of denoising similarly to a gaussian blur, but it preserves sharp edges better.

    >>> import scipy
    >>> import matplotlib.pyplot as plt
    

    Noisy image

    >>> img = scipy.lena()  # A standard test image for image processing
    >>> img += 0.8 * img.std() * np.random.rand(*img.shape)
    >>> plt.imshow(img)
    

    (Source code)

    _images/exercises-11.png

    Prepare the sliding window

    >>> assert img.flags.c_contiguous   # Important!
    
    >>> window_size = 5
    >>> shape = ...   # Careful, no out-of-bounds access...
    >>> strides = ...
    
    >>> img_window = as_strided(...)
    

    Denoise!

    >>> img_median = np.median(img_window.reshape(..., window_size*window_size), axis=...)
    >>> plt.imshow(img_median)
    >>> plt.gray()
    >>> plt.imsave('sharpened.png', img_median)
    

    Note

    • Above, the .reshape() makes a copy (why?).
    • Scipy has an implementation for the median filter in scipy.ndimage, with more features.

Extra: Visit the factory

We don’t yet have a rolling_window function in Numpy that would make the above easier. We, however, do have a contributed implementation that is discussed here:

Can you extend the version posted by Warren to make N-dimensional windows, or think of any other features such a function would need to have? (If yes, just ask me how to contribute your stuff.)

Exercise C: Menger sponge

_images/wp-menger-sponge.jpg _images/sponge_top.png

Generate an approximation to the Menger sponge by creating a 3-D Numpy array filled with 1, and drilling holes to it with slicing.

Tips:

  • Use dtype np.int8 so you don’t eat all memory
  • Power-of-3 size cube works best, e.g., 81 x 81 x 81
  • You need a function to recurse to drill many levels
  • s = np.s_[i:j] creates a “free” slice object: a[s] == a[i:j].

Take a 2-D slice of the sponge diagonally through the center of the cube, with normal vector (1, 1, 1). What sort of a patterns you get in the intersection?

_images/sponge_cutter.png

Tips (for one approach):

  • Fancy indexing with three 2-D integer arrays can give the slice
  • Boolean mask helps to exclude out-of-bounds indices
  • Vectors u = np.array([0, 1, -1])/1.414 and v = np.array([1, -0.5, -0.5])/1.225 (orthogonal to [1, 1, 1]) can be used as -the basis for the 2-D index arrays.
  • x.astype(int) converts float arrays to integer arrays