padraic.xyz

Performance

The goal of these notes is to examine the performance of your code. To begin with, we introduce the Taxonomy of Parallelization:

  1. SISD: Single Instruction Single Data. One instruction stream operating on one data stream, as in ordinary serial code.
  2. SIMD: Single Instruction Multiple Data. The same instruction is performed in parallel over multiple data inputs. This is the model behind GPU programming, and it also often arises in OpenMP/MPI code, where each thread or process applies the same operation to its own slice of the data.
  3. MIMD: Multiple Instruction Multiple Data. Different threads or different nodes do different things over different datasets. This would usually happen in e.g. graphical programmes with backing databases, and can be implemented with something like C++11 threads.
  4. MISD: Multiple Instruction Single Data. Different threads process the same data in different ways.
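
As a rough illustration of the MIMD case, here is a minimal sketch using C++11 threads. The names MimdResult and sum_and_max are made up for this example. One thread sums one dataset while a second thread finds the maximum of a different dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical result type for this illustration.
struct MimdResult {
    long long sum;  // written only by the first thread
    int maximum;    // written only by the second thread
};

// Runs two different tasks over two different datasets at the same time.
MimdResult sum_and_max(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!b.empty());  // the maximum of an empty range is undefined
    MimdResult result{0, 0};

    // Thread 1: one instruction stream (summation) over dataset a.
    std::thread summer([&] {
        result.sum = std::accumulate(a.begin(), a.end(), 0LL);
    });
    // Thread 2: a different instruction stream (maximum) over dataset b.
    std::thread maxer([&] {
        result.maximum = *std::max_element(b.begin(), b.end());
    });

    // Each thread writes to its own field, so no lock is needed.
    // join() waits for both threads before the results are read.
    summer.join();
    maxer.join();
    return result;
}
```

Calling sum_and_max({1, 2, 3, 4}, {7, -2, 9, 3}) gives a sum of 10 and a maximum of 9. With g++ on some Linux systems, the -pthread flag is needed when building code that uses std::thread.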

Formatting, Linting and Presentation

Code is read more often than it is written, but you wouldn't know that from how it's often implemented. Tools exist to fix that. For example, in C++, there is clang-format, which reformats source files automatically to a consistent style. Tools like this are 'zero-cost' and improve your quality of life enormously.

Linters are tools that help you to maintain good style and good practice as you write. In compiled languages, some of this can be done by the compiler itself using a warning flag like -Wall. Dedicated programmes like clang-tidy, cppcheck or pylint can also be used to identify style and coding errors. They can be easily added to almost any editor.

Refactoring is the process of cleaning up code to improve simplicity and legibility, consolidate duplicated logic, or even boost performance. To refactor safely, though, we need regression tests that confirm the behaviour has not changed.
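
A regression test can be as small as the sketch below, which pins the output of a function on known input before and after a refactor. The functions mean_original and mean_refactored are hypothetical examples, and both assume a non-empty input.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Original implementation (hypothetical): a hand-written loop.
// Assumes xs is non-empty.
double mean_original(const std::vector<double>& xs) {
    double total = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        total = total + xs[i];
    }
    return total / xs.size();
}

// Refactored implementation: same behaviour, using the standard library.
// Assumes xs is non-empty.
double mean_refactored(const std::vector<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Regression test: the expected value is fixed by hand (12 / 4 = 3),
// so any change in behaviour introduced by the refactor is caught.
bool regression_test_passes() {
    const std::vector<double> sample{1.0, 2.0, 3.0, 6.0};
    return mean_original(sample) == 3.0 && mean_refactored(sample) == 3.0;
}
```

In a real project the check would normally live in a test framework rather than a free function, but the idea is the same: record the known-good answer first, then refactor.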

Memory Leaks

Checking how code executes is important. One example is examining memory allocation and access, where perhaps some of the most common errors in C++ coding occur. Compilers like g++ and clang accept flags like -fsanitize=address (AddressSanitizer), which instrument the programme to check memory allocation and access at run time and give more detailed error messages.
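
To make this concrete, here is a small sketch of the kind of error these tools look for. The function names are invented for this example. The first version leaks its buffer, and the second fixes the leak with std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Leaky version: the buffer allocated with new[] is never released.
// A build with -fsanitize=address (on platforms where its leak detection
// is enabled) or a run under Valgrind should report the lost allocation.
int sum_squares_leaky(std::size_t n) {
    int* squares = new int[n];
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        squares[i] = static_cast<int>(i * i);
        total += squares[i];
    }
    return total;  // missing: delete[] squares;
}

// Fixed version: std::unique_ptr frees the buffer when it goes out of scope.
int sum_squares_fixed(std::size_t n) {
    std::unique_ptr<int[]> squares(new int[n]);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        squares[i] = static_cast<int>(i * i);
        total += squares[i];
    }
    return total;
}
```

Both functions return the same answer (14 for n = 4), which is exactly why leaks are easy to miss without a tool: the output looks right while memory is quietly lost.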

Alternatively, we can use a tool like Valgrind. To test that, we'll run it in Docker, a lightweight virtualisation tool. The notes for setting it up are over at the RITS page. To run Docker, we first have to set up a Linux virtual machine using docker-machine. This virtual machine is powered by VirtualBox. We then spin up a Docker container on top of the virtual machine, which is a small self-contained operating system we can configure as needed. We then run commands in the container like so:

docker run --rm -v $(pwd):/tmp/work -w /tmp/work <image> mkdir build
docker run --rm -v $(pwd):/tmp/work -w /tmp/work/build <image> cmake ../
docker run --rm -v $(pwd):/tmp/work -w /tmp/work/build <image> make

Here <image> stands for the name of the Docker image you set up. The -v flag mounts the former path (on the virtual machine) as the latter path in the container. The -w flag sets our working directory inside the container.

The advantage of this is that we can then deploy a given image to any other service with minimal configuration! The sales pitch for docker is that it ends the days of "works on my machine" errors by making entire environments easy to copy and deploy.

Profiling

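Amdahl's Law is a rule for the theoretical speedup of parallelizing a given task. In particular, given a programme that takes time T and contains a routine A taking time P, parallelising A over n threads gives a run time of at best T - P + P/n, so the overall speedup is at most T / (T - P + P/n). However many threads are used, the speedup can never exceed T / (T - P). Thus you need to measure performance before you try to optimize: parallelising a routine that accounts for little of the run time gains little.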
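
As a worked sketch of Amdahl's Law, the function below computes the best-case speedup for a programme with total run time T that contains a routine taking time P, when that routine is spread perfectly over n threads. The function and parameter names are chosen for this illustration.

```cpp
#include <cassert>

// Best-case speedup from Amdahl's Law: T / (T - P + P / n).
// total_time   = T, run time of the whole programme
// routine_time = P, time spent in the routine being parallelised
// threads      = n, number of threads the routine is split across
double amdahl_speedup(double total_time, double routine_time, int threads) {
    assert(total_time > 0.0);
    assert(routine_time >= 0.0 && routine_time <= total_time);
    assert(threads >= 1);
    const double serial_part = total_time - routine_time;    // unaffected by threads
    const double parallel_part = routine_time / threads;     // ideal scaling
    return total_time / (serial_part + parallel_part);
}
```

For example, with T = 100, P = 80 and 4 threads the speedup is 100 / (20 + 20) = 2.5, and no number of threads can push it past 100 / 20 = 5. Real code scales less well than this ideal, so treat the result as an upper bound.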

We can do this profiling using tools like callgrind, gprof, gperftools and Xcode Instruments.

To profile code, we need to compile in Release mode, with debug info (with CMake, this is the RelWithDebInfo build type). This makes sure we're not trying to redevelop compiler optimisations...

Micro-benchmarking

Having profiled your code, you need ways to benchmark snippets of it to verify your optimisation. The simplest option is the timers built into test tools, but their results are not generally comparable with those of dedicated tools. Google Benchmark, hayai and others can all be used for this task. Alternatively, we can do it manually using the timing functions in our programming language, like std::chrono.
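
A manual timer with std::chrono might look like the sketch below. The helper mean_microseconds and the workload sum_to are hypothetical names for this example. The sketch uses std::chrono::steady_clock because it never jumps backwards, and it averages over several repeats to smooth out noise.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Example workload to be timed (hypothetical): sums the integers 1..n.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Calls f() `repeats` times and returns the mean wall-clock time
// per call, in microseconds.
template <typename F>
double mean_microseconds(F&& f, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        f();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

When using it, store the result of the timed call somewhere the compiler cannot ignore, for instance a volatile variable written inside the lambda passed to mean_microseconds. Otherwise an optimised build may remove the work entirely and report a meaningless time. Dedicated libraries handle this, along with warm-up and statistics, more carefully.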