Building a Raspberry Pi Cluster

Part III —OpenMPI, Python, and Parallel Jobs

This is Part III in my series on building a small-scale HPC cluster. Be sure to check out Part I and Part II.

In the first two parts, we set up our Pi cluster with the SLURM scheduler and ran some test jobs using R. We also looked at how to schedule many small jobs using SLURM. We also installed software the easy way by running the package manager install command on all of the nodes simultaneously.

In this part, we’re going to set up OpenMPI, install Python the “better” way, and take a look at running some jobs in parallel to make use of the multiple cluster nodes.

Part 1: Installing OpenMPI

OpenMPI is an open-source implementation of the Message Passing Interface concept. An MPI is a software that connects processes running across multiple computers and allows them to communicate as they run. This is what allows a single script to run a job spread across multiple cluster nodes.

We’re going to install OpenMPI the easy way, as we did with R. While it is possible to install it using the “better” way (spoiler alert: compile from source), it’s more difficult to get it to play nicely with SLURM.

We want it to play nicely because SLURM will auto-configure the environment when a job is running so that OpenMPI has access to all the resources SLURM has allocated the job. This saves us a lot of headache and setup for each job.

1.1 — Install OpenMPI

To install OpenMPI, SSH into the head node of the cluster, and use srun to install OpenMPI on each of the nodes:

(Obviously, replace --nodes=3 with however many nodes are in your cluster.)

1.2 — Test it out!

Believe it or not, that’s all it took to get OpenMPI up and running on our cluster. Now, we’re going to create a very basic hello-world program to test it out.

1.2.1 — Create a program.
We’re going to create a C program that creates an MPI cluster with the resources SLURM allocates to our job. Then, it’s going to call a simple print command on each process.

Create the file /clusterfs/hello_mpi.c with the following contents:

Here, we include the mpi.h library provided by OpenMPI. Then, in the main function, we initialize the MPI cluster, get the number of the node that the current process will be running on, print a message, and close the MPI cluster.

1.2.2 — Compile the program.
We need to compile our C program to run it on the cluster. However, unlike with a normal C program, we won’t just use gcc like you might expect. Instead, OpenMPI provides a compiler that will automatically link the MPI libraries.

Because we need to use the compiler provided by OpenMPI, we’re going to grab a shell instance from one of the nodes:

The a.out file is the compiled program that will be run by the cluster.

1.2.3 — Create a submission script.
Now, we will create the submission script that runs our program on the cluster. Create the file /clusterfs/sub_mpi.sh:

1.2.4 — Run the job.
Run the job by submitting it to SLURM and requesting a couple of nodes and processes:

This tells SLURM to get 3 nodes and 2 cores on each of those nodes. If we have everything working properly, this should create an MPI cluster with 6 nodes. Assuming this works, we should see some output in our slurm-XXX.out file:

Part 2: Installing Python (the “better” way)

Okay, so for a while now, I’ve been alluding to a “better” way to install cluster software. Let’s talk about that. Up until now, when we’ve installed software on the cluster, we’ve essentially did it individually on each node. While this works, it quickly becomes inefficient. Instead of duplicating effort trying to make sure the same software versions and environment is available on every single node, wouldn’t it be great if we could install software centrally for all nodes?

Well, luckily a new feature in the modern Linux operating system allows us to do just that: compile from source! (/s) Rather than install software through the individual package managers of each node, we can compile it from source and configure it to be installed to a directory in the shared storage. Because the architecture of our nodes is identical, they can all run the software from shared storage.

This is useful because it means that we only have to maintain a single installation of a piece of software and its configuration. On the downside, compiling from source is a lot slower than installing pre-built packages. It’s also more difficult to update. Trade-offs.

In this section, we’re going to install Python3 from source and use it across our different nodes.

2.0 — Prerequisites

In order for the Python build to complete successfully, we need to make sure that we have the libraries it requires installed on one of the nodes. We’ll only install these on one node and we’ll make sure to only build Python on that node:

Hooo boy. That’s a fair number of dependencies. While you can technically build Python itself without running this step, we want to be able to access Pip and a number of other extra tools provided with Python. These tools will only compile if their dependencies are available.

Note that these dependencies don’t need to be present to use our new Python install, just to compile it.

2.1 — Download Python

Let’s grab a copy of the Python source files so we can build them. We’re going to create a build directory in shared storage and extract the files there. You can find links to the latest version of Python here, but I’ll be installing 3.7. Note that we want the “Gzipped source tarball” file:

At this point, we should have the Python source extracted to the directory /clusterfs/build/Python-3.7.3.

2.2 — Configure Python

For those of you who have installed software from source before, what follows is pretty much a standard configure;make;make install, but we’re going to change the prefix directory.

The first step in building Python is configuring the build to our environment. This is done with the ./configure command. Running this by itself will configure Python to install to the default directory. However, we don’t want this, so we’re going to pass it a custom flag. This will tell Python to install to a folder on the shared storage. Buckle up, because this may take a while:

2.3 — Build Python

Now that we’ve configured Python to our environment, we need to actually compile the binaries and get them ready to run. We will do this with the make command. However, because Python is a fairly large program, and the RPi isn’t exactly the biggest workhorse in the world, it will take a little while to compile.

So, rather than leave a terminal open the whole time Python compiles, we’re going to use our shiny new scheduler! We can submit a job that will compile it and we can just wait for the job to finish. To do this, create a submission script in the Python source folder:

This script will request 4cores on node1 and will run the make command on those cores. Make is the software tool that will compile Python for us. Now, just submit the job from the login node:

Now, we just wait for the job to finish running. It took about an hour for me on an RPi 3B+. You can view its progress using the squeue command, and by looking in the SLURM output file:

2.4 — Install Python

Lastly, we will install Python to the /clusterfs/usr directory we created. This will also take a while, though not as long as compiling. We can use the scheduler for this task. Create a submission script in the source directory:

However, we don’t want just any old program to be able to modify or delete the Python install files. So, just like with any normal program, we’re going to install Python as root so it cannot be modified by normal users. To do this, we’ll submit the install job as a root user:

Again, you can monitor the status of the job. When it completes, we should have a functional Python install!

2.5 — Test it out.

We should now be able to use our Python install from any of the nodes. As a basic first test, we can run a command on all of the nodes:

We should also have access to pip:

The exact same Python installation should now be accessible from all the nodes. This is useful because, if you want to use some library for a job, you can install it once on this install, and all the nodes can make use of it. It’s cleaner to maintain.

Part 3: A Python MPI Hello-World

Finally, to test out our new OpenMPI and Python installations, we’re going to throw together a quick Python job that uses OpenMPI. To interface with OpenMPI in Python, we’re going to be using a fantastic library called mpi4py.

For our demo, we’re going to use one of the demo programs in the mpi4py repo. We’re going to calculate the value of pi (the number) in parallel.

3.0 — Prerequisites

Before we can write our script, we need to install a few libraries. Namely, we will install the mpi4py library, and numpy. NumPy is a package that contains many useful structures and operations used for scientific computing in Python. We can install these libraries through pip, using a batch job. Create the file /clusterfs/calc-pi/sub_install_pip.sh:

Then, submit the job. We have to do this as root because it will be modifying our Python install:

Now, we just wait for the job to complete. When it does, we should be able to use the mpi4py and numpy libraries:

3.1 — Create the Python program.

As mentioned above, we’re going to use one of the demo programs provided in the mpi4py repo. However, because we’ll be running it through the scheduler, we need to modify it to not require any user input. Create the file /clusterfs/calc-pi/calculate.py:

This program will split the work of computing our approximation of pi out to however many processes we provide it. Then, it will print the computed value of pi, as well as the error from the stored value of pi.

3.2 — Create and submit the job.

We can run our job using the scheduler. We will request some number of cores from the cluster, and SLURM will pre-configure the MPI environment with those cores. Then, we just run our Python program using OpenMPI. Let’s create the submission file /clusterfs/calc-pi/sub_calc_pi.sh:

Here, we use the --ntasks flag. Where the --ntasks-per-node flag requests some number of cores for each node, the --ntasks flag requests a specific number of cores total. Because we are using MPI, we can have cores across machines. Therefore, we can just request the number of cores that we want. In this case, we ask for 6 cores.

To run the actual program, we use mpiexec and tell it we have 6 cores. We tell OpenMPI to execute our Python program using the version of Python we installed.

Note that you can adjust the number of cores to be higher/lower as you want. Just make sure you change the mpiexec -n ## flag to match.

Finally, we can run the job:

3.3 — Success!

The calculation should only take a couple seconds on the cluster. When the job completes (remember — you can monitor it with squeue), we should see some output in the slurm-####.out file:

You can tweak the program to calculate a more accurate value of pi by increasing the number of intervals on which the calculation is run. Do this by modifying the calculate.py file:

For example, here’s the calculation run on 500 intervals:

Conclusion

We now have a basically complete cluster. We can run jobs using the SLURM scheduler; we discussed how to install software the lazy way and the better way; we installed OpenMPI; and we ran some example programs that use it.

Hopefully, your cluster is functional enough that you can add software and components to it to suit your projects. In the fourth and final installment of this series, we’ll discuss a few maintenance niceties that are more related to managing installed software and users than the actual functionality of the cluster.

Happy Computing!

Garrett Mills

Hi, there. I’m a software developer and speaker who likes to make things: https://garrettmills.dev/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store