Recently the NAS at my parents’ home broke, and I needed a quick alternative for taking off-site backups of a server. Since that NAS was only there for my use from when I was younger, and I couldn’t visit to replace it, I opted to upgrade the Pi 4 that was still there doing nothing into a ZFS replication device. I prepared an SD card with Ubuntu Server for the Raspberry Pi, asked my parents to swap the SD card in the Pi, and then logged in and set everything up remotely.

Step 1: Setting up ZFS

First, we’ll need to get the ZFS tools installed:

sudo apt install zfsutils-linux

Then, we create a pool called rpool consisting of a single-disk vdev, and add a dataset named rpool/backup that uses lz4 compression. The command below is for a single disk; if you have multiple disks, you may want to make a mirror vdev (like I will probably do after Covid-19). Look up what configuration works best for you. There is a more elaborate tutorial on how to set up a ZFS pool on the Ubuntu website.

# "$DISK_ID" should hold the stable device path, e.g. /dev/disk/by-id/...
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O xattr=sa rpool "$DISK_ID"
sudo zfs create rpool/backup
sudo zfs set compression=lz4 rpool/backup
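
To verify the pool and the compression setting, you can run:

zpool status rpool                  # pool layout and health
zfs get compression rpool/backup    # should report lz4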

Step 2: Put the Pi’s SSH key on the remote server

We will pull in snapshots over SSH. To this end, we add the SSH public key of the root user on the Pi to the root user’s authorized_keys file on the remote server.
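
A minimal sketch of that key exchange (paths are the OpenSSH defaults; adjust to your setup):

# On the Pi: generate a key pair for root, if there is none yet
sudo ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ''
sudo cat /root/.ssh/id_ed25519.pub

# Then append the printed public key to /root/.ssh/authorized_keys
# on the remote server.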

Note that we do not do it the other way around: we want our backup server to reach out to the remote server and pull in snapshots. We do not want the remote server to push snapshots to the Pi, because if that server gets compromised it should not have access to the backup server (to wipe it).

It is advisable to always pull from the backup server, and to harden the backup server’s security: make it accessible only over a VPN, run the bare minimum of services, and so on.
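
As an extra hardening step on the remote side, you can also pin the Pi’s key to its VPN address in authorized_keys (the address here is a made-up example):

from="10.8.0.2" ssh-ed25519 AAAA... root@pi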

Step 3: Install Syncoid

To pull in snapshots from the remote machine, we will use Syncoid. This is a tool bundled with the policy-driven ZFS snapshot management package Sanoid. To install it, use:

sudo apt install sanoid

Syncoid facilitates incremental replication of ZFS datasets.

Example with one dataset

#             ___________ source ____________  ______ target _______
sudo syncoid root@remote:rpool/data/mydataset rpool/backup/mydataset
Running this command:
  1. Creates a snapshot named “syncoid_ubuntu_YYYY-MM-DD-HH:MM:SS” of rpool/data/mydataset on the remote server.
  2. Rolls back rpool/backup/mydataset to the latest snapshot that the target and the source have in common.
  3. Incrementally replicates everything up to that new “syncoid_ubuntu_YYYY-MM-DD-HH:MM:SS” snapshot into our target dataset rpool/backup/mydataset (on the Pi 4).
  4. Prunes older “syncoid_ubuntu_YYYY-MM-DD-HH:MM:SS”-style snapshots on both the source and the target (keeping the latest one).
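
Afterwards, you can verify that the snapshots arrived on the Pi with:

zfs list -t snapshot -r rpool/backup/mydataset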

Working with multiple datasets recursively

We want Syncoid to incrementally receive the snapshots of all datasets under rpool/data on the remote machine into rpool/backup on the Pi. To do this we use:

sudo syncoid -R --skip-parent --no-rollback root@remote:rpool/data rpool/backup

Let’s go over the flags:

  • -R indicates that we want to recursively visit all datasets under root@remote:rpool/data.
  • --skip-parent ensures that we do not receive root@remote:rpool/data itself (I don’t have anything in there).
  • --no-rollback prevents Syncoid from rolling back snapshots on the target machine (the Pi).

The --no-rollback flag ensures that Syncoid does not delete snapshots but only adds them. The flip side: if you mess with the snapshots on your remote machine (e.g. by doing a rollback there), zfs receive will not be able to receive the latest snapshot. Syncoid will then continue with the other datasets and exit with exit code 2. If you still want to receive the changes, you can drop this flag; Syncoid will then look for the latest common snapshot, roll the target back to it, and perform the receive.

Because this is a backup system, we do not want to roll back automatically: if an adversary rolls back the ZFS filesystem on our server, the backups should not follow suit.

Step 4: Automate and monitor

So that we do not forget to run Syncoid, we will use a systemd timer. As a general rule, you should actively monitor your automated backups. If you don’t and something silently goes wrong, you will not notice it until it is too late. Have your monitoring service constantly check the freshness of your backups (active monitoring); do not rely solely on the backup server to send you mail when something goes wrong (passive monitoring).
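
As a sketch of such an active freshness check (the 25-hour threshold and the plain-Bash form are assumptions; wire it into whatever your monitoring service expects):

#!/usr/bin/env bash
# Alert helper: exits non-zero when the newest snapshot under
# rpool/backup is older than 25 hours.
newest=$(zfs list -Hpo creation -t snapshot -r rpool/backup | sort -n | tail -n 1)
(( $(date +%s) - newest < 25 * 60 * 60 ))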

Create a file /opt/syncoid-pull/syncoid-pull, owned by root and writable and executable only by root, that calls Syncoid and sends off metrics for monitoring:

#!/usr/bin/env bash
set -xeuo pipefail # fail fast and be verbose

syncoid -R --skip-parent --no-rollback --debug root@remote:rpool/data rpool/backup

# List all snapshots under rpool/backup (oldest first), split the
# dataset@snapshot column in two, build a JSON table from the rows,
# and ship it to the remote machine for monitoring.
zfs list -Hpo creation,name,used -t snapshot -r rpool/backup -s creation |\
    sed 's:\t\([^@]*\)@:\t\1\t:' |\
    column -J --table-columns creation,dataset,name,size -s $'\t' --table-name 'snapshots' |\
    ssh root@remote 'cat > /var/www/status/backups.json'

The last command in this file writes a JSON summary of the snapshots to a file on the remote machine, where it can be monitored. Replace that line with something that interacts with your monitoring system.
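
The resulting JSON looks roughly like this (values illustrative):

{
   "snapshots": [
      {
         "creation": "1585363061",
         "dataset": "rpool/backup/mydataset",
         "name": "syncoid_ubuntu_2020-03-28-04:37:41",
         "size": "0"
      }
   ]
}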

Automate a call to the script above during off-peak hours with these systemd unit and timer files:

/etc/systemd/system/syncoid-pull.service:

[Unit]
Description=syncoid pull
Requires=local-fs.target
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/opt/syncoid-pull/syncoid-pull
WorkingDirectory=/opt/syncoid-pull/

/etc/systemd/system/syncoid-pull.timer:

[Unit]
Description=syncoid pull every night

[Timer]
OnCalendar=04:37:41
Persistent=true

[Install]
WantedBy=timers.target
Then enable and start the timer:

sudo systemctl enable syncoid-pull.timer && sudo systemctl start syncoid-pull.timer
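
To check that the timer is scheduled and to inspect the last run, you can use:

systemctl list-timers syncoid-pull.timer
journalctl -u syncoid-pull.service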

Extra: Delete old snapshots

To preserve some space on the Pi, I want to automatically remove snapshots older than two months. To do this, I added the equivalent of the following snippet to my syncoid-pull script.

now=$(date +%s)
# Find the zfs-auto-snap_* snapshots under rpool/backup and destroy
# every one that is older than two months.
zfs list -Hpo creation,name -t snapshot -r rpool/backup \
    | grep $'\t''rpool/backup/[^@]*@zfs-auto-snap_' \
    | tac \
    | while read -r creation snapshot; do
        if (( ( $now - $creation ) > 60 * 60 * 24 * 30 * 2 )); then
            zfs destroy "$snapshot";
        fi;
      done

It finds all zfs-auto-snap_ snapshots in rpool/backup and deletes each one once it reaches the age of two months. The tac reverses the list, which makes the destroys faster. This pruning should be performed before sending the list of snapshots to monitoring.
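
If you want to preview what the script would delete, zfs destroy has a dry-run mode; temporarily swap the destroy line for:

zfs destroy -nv "$snapshot"   # -n: dry run, -v: print what would be destroyed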

Other tips and tricks

  • Attach an M.2 SATA SSD in a USB 3.0 enclosure to your Pi.
  • Do regular ZFS scrubs (zpool scrub rpool) to check the health of your pool, and send the health status to monitoring (see the snippet after this list).
  • Use logcheck to get mail when a service fails.
  • Don’t boot your Pi from the SD card; SD cards wear out quickly, and you already have an SSD attached.
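
A minimal version of the scrub-and-report tip above (how you forward the result is up to your monitoring setup):

sudo zpool scrub rpool     # starts a scrub in the background
sudo zpool status rpool    # shows scrub progress, results, and pool health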