Exercises for Memory-Efficient Computing

Optimizing arithmetic expressions

Exercise 1

Use the script ``poly1.py`` to check how much time it takes to evaluate the following polynomial:

    y = .25*x**3 + .75*x**2 - 1.5*x - 2

with `x` in the range [-1, 1] and with 10 million points; a sketch of this kind of timing comparison follows the questions below.

  • Set the `what` parameter to “numexpr” and take note of the speed-up versus the “numpy” case.
  • Why do you think the speed-up is so large?
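
For reference, here is a minimal sketch of the kind of timing comparison that ``poly1.py`` presumably performs; the actual script, its `what` handling and its output format may well differ:

    import numpy as np
    import numexpr as ne
    from time import time

    N = 10 * 1000 * 1000                    # 10 million points
    x = np.linspace(-1, 1, N)

    t0 = time()
    y = .25*x**3 + .75*x**2 - 1.5*x - 2     # "numpy" case: one temporary per operation
    print("numpy:   %.3f s" % (time() - t0))

    t0 = time()
    y2 = ne.evaluate(".25*x**3 + .75*x**2 - 1.5*x - 2")  # "numexpr" case
    print("numexpr: %.3f s" % (time() - t0))

    assert np.allclose(y, y2)
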
Exercise 2

The expression below:

    y = ((.25*x + .75)*x - 1.5)*x - 2

represents the same polynomial as the original one, but with some interesting consequences for efficiency. Repeat the computation for numpy and numexpr and draw your own conclusions (a sketch follows the questions below).

  • Why do you think numpy performs much more efficiently with this new expression?
  • Why is the speed-up for numexpr not as high in comparison?
  • Why does numexpr continue to be faster than numpy?
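
A hedged sketch of the same comparison for the factored form (again, the real ``poly1.py`` may be organized differently):

    import numpy as np
    import numexpr as ne
    from time import time

    x = np.linspace(-1, 1, 10 * 1000 * 1000)

    t0 = time()
    y = ((.25*x + .75)*x - 1.5)*x - 2       # no x**2/x**3: no pow() calls at all
    print("numpy (factored):   %.3f s" % (time() - t0))

    t0 = time()
    y2 = ne.evaluate("((.25*x + .75)*x - 1.5)*x - 2")
    print("numexpr (factored): %.3f s" % (time() - t0))
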
Exercise 3

The C program ``poly.c`` does the same computation as above, but in pure C. Compile it like this:

  gcc -O3 -o poly poly.c -lm

and execute it.

  • Why do you think it is more efficient than the above approaches?

Parallelism with threads

Exercise 4

Make sure that you are on a multi-processor machine and repeat the last computation in ``poly1.py``, but increase the number of threads one by one (change the number in the ``for nt in range(1):`` loop); a sketch of such a sweep follows the questions below.

  • How does the efficiency scale?
  • Why do you think it scales that way?
  • How does the performance compare with the pure C computation?
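
A minimal sketch of such a sweep, assuming the script relies on numexpr's ``set_num_threads()``:

    import numpy as np
    import numexpr as ne
    from time import time

    x = np.linspace(-1, 1, 10 * 1000 * 1000)

    for nt in range(1, 9):                  # 1 through 8 threads
        ne.set_num_threads(nt)
        t0 = time()
        y = ne.evaluate("((.25*x + .75)*x - 1.5)*x - 2")
        print("%d thread(s): %.3f s" % (nt, time() - t0))
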
Exercise 5

On the same multi-processor machine, recompile the above ``poly.c``, but with OpenMP support:

  gcc -O3 -o poly poly.c -lm -fopenmp    # notice the new -fopenmp flag!

and execute it for several numbers of threads (a shell loop for this is sketched after the questions below):

  OMP_NUM_THREADS=desired_number_of_threads ./poly

Compare its performance with the parallel numexpr.

  • How does the efficiency scale?
  • What is the asymptotic limit?
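
For instance, a quick way to run that sweep from a bash-like shell (the thread counts below are just a suggestion):

  for nt in 1 2 4 8; do OMP_NUM_THREADS=$nt ./poly; done
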
Exercise 6

Using the previous examples, compute the expression:

    y = x

That is, do a simple copy of the `x` vector. What performance are you seeing? How does it evolve when using different numbers of threads?
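
A hedged sketch of how such a copy could be timed with numexpr; since a plain copy performs no arithmetic at all, the figures mostly reflect memory traffic:

    import numpy as np
    import numexpr as ne
    from time import time

    x = np.linspace(-1, 1, 10 * 1000 * 1000)

    for nt in (1, 2, 4, 8):
        ne.set_num_threads(nt)
        t0 = time()
        y = ne.evaluate("x")                # just copies x into a new array
        elapsed = time() - t0
        # approximate read + write traffic, in GB/s
        print("%d thread(s): %.2f GB/s" % (nt, 2 * x.nbytes / elapsed / 2**30))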

Evaluating with carray

Exercise 7

Look into the sources of ``carray-eval.py`` and run it; a rough sketch of a similar evaluation follows the questions below. For the first expression evaluation, i.e.:

    ((.25*x + .75)*x - 1.5)*x - 2

  • Why do you think carray evaluates faster than NumPy, even when using the Python VM (virtual machine)?
  • How much does the compression slow down the evaluation? What compression ratio is achieved? Is that a lot?
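
A minimal sketch in the spirit of ``carray-eval.py``, assuming the carray 0.x API (``ca.carray``, ``ca.eval`` and its `vm` parameter, plus the `nbytes`/`cbytes` attributes); the real script may differ:

    import numpy as np
    import carray as ca
    from time import time

    x = np.linspace(-1, 1, 10 * 1000 * 1000)
    cx = ca.carray(x)                       # compressed, chunked container
    print("compression ratio: %.2fx" % (cx.nbytes / float(cx.cbytes)))

    t0 = time()
    y = ca.eval("((.25*cx + .75)*cx - 1.5)*cx - 2", vm="python")
    print("carray (python vm): %.3f s" % (time() - t0))
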
Exercise 8

Repeat your reasoning with the second expression (a code sketch follows the question below):

    ((.25*x + .75)*x - 1.5)*x - 2 < 0

  • Why do you think the results vary so dramatically?
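
Continuing the sketch above, the boolean result of the second expression can be timed and inspected the same way (again assuming the carray 0.x API):

    # assumes `ca`, `cx` and the imports from the previous sketch
    t0 = time()
    b = ca.eval("((.25*cx + .75)*cx - 1.5)*cx - 2 < 0", vm="numexpr")
    print("boolean eval: %.3f s" % (time() - t0))
    print("result compression ratio: %.2fx" % (b.nbytes / float(b.cbytes)))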

Querying Big Data

Exercise 9

Look into the sources of the ``carray-ctable.py`` script and run it; a rough sketch of a similar benchmark follows the questions below.

  • How does a carray query compare with a numpy one?
  • What is the compression ratio achieved in the ctable `t`?
  • How do the 'simple' and 'complex' queries execute in comparison with the NumPy ones?
  • If you are on the big Intel Lab machine, increase NROWS by one order of magnitude and re-run the benchmark. What do you see?
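
A hedged sketch of the kind of setup and queries that ``carray-ctable.py`` presumably performs; the column names, NROWS value and query expressions here are made up for illustration:

    import numpy as np
    import carray as ca
    from time import time

    NROWS = 1000 * 1000
    t = ca.ctable((np.arange(NROWS), np.random.rand(NROWS)), names=["f0", "f1"])
    print("ctable compression: %.2fx" % (t.nbytes / float(t.cbytes)))

    t0 = time()
    hits = [r for r in t.where("f0 < 1000")]                # 'simple' style query
    print("simple query:  %.3f s (%d hits)" % (time() - t0, len(hits)))

    t0 = time()
    hits = [r for r in t.where("(f0 < 1000) & (f1 < .5)")]  # 'complex' style query
    print("complex query: %.3f s (%d hits)" % (time() - t0, len(hits)))
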
Exercise 10

Enter the IPython console and generate the big `t` ctable (just copy and paste the appropriate statements from the previous ``carray-ctable.py``); a sketch of the thread sweep follows the questions below.

  • Try to find the sweet spot for the 'simple' query by selecting different numbers of threads, running:
      ca.set_nthreads(your_number_of_threads)
  • Repeat for the 'complex' query.
  • Why do you think there is such a large difference in the sweet spot?
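
A minimal sketch of such a sweep, assuming the ctable `t` from the previous sketch is already built:

    import carray as ca
    from time import time

    for nt in range(1, 9):
        ca.set_nthreads(nt)
        t0 = time()
        hits = [r for r in t.where("f0 < 1000")]    # the 'simple' style query
        print("%d thread(s): %.3f s" % (nt, time() - t0))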