Building a Raspberry Pi Cluster

Part I — The Basics

Step 0: Get The Hardware

Parts list

  • 3x Raspberry Pi 3 Model B — for the compute nodes
  • 1x Raspberry Pi 3 Model B — for the master/login node
  • 4x MicroSD Cards
  • 4x micro-USB power cables
  • 1x 8-port 10/100/1000 network switch
  • 1x 6-port USB power-supply
  • 1x 64GB USB Drive (or NAS, see below)

Step 1: Flash the Raspberry Pis

The first step is to get our Pis up and running. Start by downloading the latest version of Raspbian, the Debian distribution that runs on the Pis. Download the command-line-only “lite” version to save space.
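
The flashing tool itself isn't shown in this excerpt; as a rough sketch, from a Linux machine you can write the image with dd. The file names below are placeholders, and /dev/sdX stands in for your SD card (double-check the device name with lsblk first, because dd will overwrite whatever you point it at):

unzip raspbian-lite.zip                # the downloaded image archive (placeholder name)
sudo dd if=raspbian-lite.img of=/dev/sdX bs=4M status=progress conv=fsync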

To enable SSH on first boot, create an empty file named ssh in the boot partition of the freshly flashed card. Raspbian starts the SSH server when it sees this file:

glmdev@polaris ~> touch ssh

Step 2: Network Setup

We want to make sure that our nodes have IP addresses that never change. This way, we can be sure that they can always talk to each other, which is critical for the cluster jobs.

Find the IP address your router assigned to the new Pi (it appears under the default hostname RASPBERRYPI, for example in your router's list of DHCP clients), and give it a fixed address there if you can (for example with a DHCP reservation). Then connect to it over SSH:

ssh pi@ip.addr.goes.here
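
If your router can't hand out reserved addresses, a rough alternative sketch is to pin the address on each Pi itself in /etc/dhcpcd.conf, Raspbian's network configuration. The interface name and addresses below are only examples:

# /etc/dhcpcd.conf (append at the end, then reboot)
interface eth0
static ip_address=192.168.1.11/24    # pick a unique address per node
static routers=192.168.1.1
static domain_name_servers=192.168.1.1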

Step 3: Set Up the Raspberry Pis

3.1: raspi-config

Now we can start setting up the Pi. First, get some basic config out of the way:

pi@raspberrypi ~$ sudo raspi-config

3.2: setting the hostname

sudo hostname node01       # whatever name you chose
sudo nano /etc/hostname # change the hostname here too
sudo nano /etc/hosts # change "raspberrypi" to "node01"

3.3: make sure the system time is right

The SLURM scheduler and the Munge authentication that it uses require accurate system time. We’ll install the ntpdate package to periodically sync the system time in the background.

sudo apt install ntpdate -y
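
The text says “periodically”: one simple way to do that (a sketch, not part of the original steps) is a root cron entry that re-syncs against a public NTP pool every so often:

# edit root's crontab with: sudo crontab -e
*/15 * * * * /usr/sbin/ntpdate -s pool.ntp.org    # -s sends output to syslog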

3.4: reboot

sudo reboot

Step 4: Shared Storage

4.0: Login to the Master Node

We will discuss the master node more later, but one of our nodes will be the controller. Just pick one. :) In my cluster, the master is node01.

ssh pi@<ip addr of node01>

4.1: Connect & Mount Flash Drive

4.1.1: Find the drive identifier.
Plug the flash drive into one of the USB ports on the master node. Then, figure out its device path by examining the output of lsblk:

glmdev@node01 ~> lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0     179:0    0  7.4G  0 disk
├─mmcblk0p1 179:1    0 43.8M  0 part /boot
└─mmcblk0p2 179:2    0  7.4G  0 part /
sda           8:16   0 59.2G  0 disk
└─sda1        8:17   0 59.2G  0 part

In this case, the flash drive is the sda device and its partition is /dev/sda1.

4.1.2: Create a filesystem on the drive.

sudo mkfs.ext4 /dev/sda1

4.1.3: Create the mount directory and open up its permissions so any user can write to it.

sudo mkdir /clusterfs
sudo chown nobody.nogroup -R /clusterfs
sudo chmod 777 -R /clusterfs

4.1.4: Set up automatic mounting. Find the partition's UUID (for example with blkid); in my case it was:

UUID="65077e7a-4bd6-47ea-8014-01e06655cc31"

Then edit /etc/fstab and add an entry so the drive is mounted at /clusterfs on boot:

sudo nano /etc/fstab
UUID=65077e7a-4bd6-47ea-8014-01e06655cc31 /clusterfs ext4 defaults 0 2

4.1.5: Finally, set loose permissions on the mounted filesystem:

sudo chown nobody.nogroup -R /clusterfs
sudo chmod -R 766 /clusterfs
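
To check the new fstab entry without rebooting, you can mount everything listed in it and confirm the drive shows up (an optional sanity check, not in the original steps):

sudo mount -a           # mount everything listed in /etc/fstab
df -h /clusterfs        # should show /dev/sda1 mounted at /clusterfs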

4.2: Export the NFS Share

Now, we need to export the mounted drive as a network file system share so the other nodes can access it. Do this process on the master node.

sudo apt install nfs-kernel-server -y

Then edit /etc/exports and add a line for each client that should have access, or one line for the whole subnet:

/clusterfs    <ip addr>(rw,sync,no_root_squash,no_subtree_check)
# e.g. to allow every host on the 192.168.1.x subnet:
/clusterfs    192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)

Here rw grants read/write access, sync flushes changes to disk on each request, no_root_squash lets root on the clients write files as root, and no_subtree_check avoids errors when a file is renamed while a client has it open. Finally, re-export the filesystems:

sudo exportfs -a
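
If you want to confirm the export took effect (an optional check), restart the NFS server and list the active exports and their options:

sudo systemctl restart nfs-kernel-server
sudo exportfs -v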

4.3: Mount the NFS Share on the Clients

Now that we’ve got the NFS share exported from the master node, we want to mount it on all of the other nodes so they can access it. Repeat this process for all of the other nodes.

sudo apt install nfs-common -y
sudo mkdir /clusterfs
sudo chown nobody.nogroup /clusterfs
sudo chmod -R 777 /clusterfs

Then edit /etc/fstab and add the following line so the share is mounted on boot:

<master node ip>:/clusterfs    /clusterfs    nfs    defaults   0 0
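
You can mount the share right away and verify that the nodes really see the same storage (a quick test, assuming the export from step 4.2 is in place):

sudo mount -a                         # mount the NFS share now
touch /clusterfs/hello-$(hostname)    # write a test file named after this node
ls /clusterfs                         # the file should be visible from every node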

Step 5: Configure the Master Node

5.0: Login to the Master Node

SSH into the node you chose as the dedicated master back in Step 4. In my cluster, this is node01.

ssh pi@<ip addr of node01>

5.1: /etc/hosts

To make resolution easier, we’re going to add hostnames of the nodes and their IP addresses to the /etc/hosts file. Edit /etc/hosts and add the following lines:

<ip addr of node02>    node02
<ip addr of node03>    node03
<ip addr of node04>    node04
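
A quick optional check (not in the original steps) that the new names resolve and the nodes are reachable:

for node in node02 node03 node04; do ping -c 1 $node; done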

5.2: Install the SLURM Controller Packages

sudo apt install slurm-wlm -y

5.3: SLURM Configuration

We’ll use the default SLURM configuration file as a base. Copy it over:

cd /etc/slurm-llnl
sudo cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
sudo gzip -d slurm.conf.simple.gz
sudo mv slurm.conf.simple slurm.conf

Then edit /etc/slurm-llnl/slurm.conf and make the following changes.

Set the control machine to the master node’s hostname and address:

SlurmctldHost=node01(<ip addr of node01>)
# e.g.: node01(192.168.1.14)

Have the scheduler allocate individual cores as consumable resources, rather than whole nodes:

SelectType=select/cons_res
SelectTypeParameters=CR_Core

Give the cluster a name:

ClusterName=glmdev

Add an entry for each node with its address and CPU count:

NodeName=node01 NodeAddr=<ip addr node01> CPUs=4 State=UNKNOWN
NodeName=node02 NodeAddr=<ip addr node02> CPUs=4 State=UNKNOWN
NodeName=node03 NodeAddr=<ip addr node03> CPUs=4 State=UNKNOWN
NodeName=node04 NodeAddr=<ip addr node04> CPUs=4 State=UNKNOWN

And create a partition that groups the compute nodes:

PartitionName=mycluster Nodes=node[02-04] Default=YES MaxTime=INFINITE State=UP

Next, configure cgroup support. Create /etc/slurm-llnl/cgroup.conf with the following contents:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

Then whitelist the devices that jobs are allowed to access by creating /etc/slurm-llnl/cgroup_allowed_devices_file.conf:

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/clusterfs*
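
If you aren’t sure what values to put in the NodeName lines, slurmd can report what it detects on a node; running this on each Pi prints a NodeName=... line (with CPU count and memory) that you can adapt for slurm.conf:

sudo slurmd -C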

5.4: Copy the Configuration Files to Shared Storage

In order for the other nodes to be controlled by SLURM, they need to have the same configuration file, as well as the Munge key file. Copy those to shared storage to make them easier to access, like so:

sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /clusterfs
sudo cp /etc/munge/munge.key /clusterfs

5.5: Enable and Start SLURM Control Services

Munge:

sudo systemctl enable munge
sudo systemctl start munge

Then the SLURM daemon:

sudo systemctl enable slurmd
sudo systemctl start slurmd

And finally the SLURM control daemon:

sudo systemctl enable slurmctld
sudo systemctl start slurmctld
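
If anything misbehaves later, the systemd status output on the master is the first place to look (an optional check):

sudo systemctl status munge slurmctld    # both should show "active (running)"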

5.6: Reboot. (optional)

This step is optional, but if you are having problems with Munge authentication, or your nodes can’t communicate with the SLURM controller, try rebooting the master node.

Step 6: Configure the Compute Nodes

6.1: Install the SLURM Client

sudo apt install slurmd slurm-client -y

6.2: /etc/hosts

Update the /etc/hosts file like we did on the master node: add every node’s hostname and IP address to /etc/hosts on each compute node, excluding the node you are editing. For example, on node02:

<ip addr of node01>    node01
<ip addr of node03>    node03
<ip addr of node04>    node04

6.3: Copy the Configuration Files

We need to make sure that the configuration on the compute nodes matches the configuration on the master node exactly. So, copy it over from shared storage:

sudo cp /clusterfs/munge.key /etc/munge/munge.key
sudo cp /clusterfs/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp /clusterfs/cgroup* /etc/slurm-llnl
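
Rather than repeating this on every compute node by hand, you can run a small loop from the master (a sketch, assuming the packages from 6.1 are already installed on each node and the pi user’s default passwordless sudo):

for node in node02 node03 node04; do
  ssh pi@$node "sudo cp /clusterfs/munge.key /etc/munge/munge.key && \
    sudo cp /clusterfs/slurm.conf /etc/slurm-llnl/slurm.conf && \
    sudo cp /clusterfs/cgroup* /etc/slurm-llnl"
done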

6.4: Munge!

We will now test that the Munge key copied correctly and that the compute nodes can authenticate against the master. First, enable and start Munge on the compute node:

sudo systemctl enable munge
sudo systemctl start munge

Then, from the compute node, generate a credential on the master and decode it locally:

ssh pi@node01 munge -n | unmunge

If the keys match, you should see output like this:

pi@node02 ~> ssh pi@node01 munge -n | unmunge
pi@node01's password:
STATUS: Success (0)
ENCODE_HOST: node01
ENCODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
DECODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: pi
GID: pi
LENGTH: 0

6.5: Start the SLURM Daemon

sudo systemctl enable slurmd
sudo systemctl start slurmd

Step 7: Test SLURM

Now that we’ve configured the SLURM controller and each of the nodes, we can check to make sure that SLURM can see all of the nodes by running sinfo on the master node (a.k.a. “the login node”):

glmdev@node01 ~> sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE  NODELIST
mycluster*     up   infinite      3   idle  node[02-04]

We can also tell SLURM to run a command on all three compute nodes at once:

srun --nodes=3 hostname
node02
node03
node04

Going Forward

We now have a functional compute cluster using Raspberry Pis! You can now start submitting jobs to SLURM to be run on however many nodes you want. I’ll have a crash-course on SLURM in the future, but for now the University of Kansas Center for Research Methods and Data Analysis has good information on getting started here.
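
As a tiny first batch job (a sketch, not from the guide; the file name and paths are just examples), save something like this to shared storage and submit it with sbatch. squeue shows it waiting or running, and the output file appears in /clusterfs:

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=/clusterfs/hello_%j.out   # %j expands to the job ID
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1

srun hostname    # one task per node, each prints its hostname

Submit it with sbatch /clusterfs/hello.sh.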

A Word on Troubleshooting

These guides are designed to be followed in a top-down sequential order. If you’re having problems with a command, feel free to leave a comment below with the exact number of the step you are stuck on, and I’ll try to answer if I can.
