Kubernetes powered backup solution over VPN

After ironing out the last bugs in my home-grown containerized, distributed, remote backup solution, I can’t say I would recommend it for the average user. But if you are comfortable with some hacking, need a backup solution and have a bunch of computers idling at your disposal, this might be for you…

Let’s start with the actual problem, like reasonable people would do. For obvious reasons I want to back up files that can’t be easily recreated. For equally obvious reasons I want them stored offsite; the backups should not be burned or stolen together with the original data.

I have done a few iterations of such backup solutions, all utilizing bash and tar in slightly different ways. When hard drives and internet connections got reasonably affordable and quick, I added offsite backup over something SSH-tunneled (rsync/scp). The location I currently back up to only exposes its FTPS server via VPN, so that is one of the constraints for the solution below (otherwise plain ftps/sftp/scp would have been sufficient for my use case).

The solution I have used lately – a simple bash backup script (full monthly backup plus daily increments, tarred and then encrypted with 7zip) running in a network namespace with an OpenVPN tunnel to the offsite location, transferring over FTPS inside the tunnel – has been working more or less fine. It has one drawback though: lack of parallelization. And running it on multiple bare metal servers is tedious to set up and maintain.

The 7zip encryption is quite demanding and it would be great to scale out in order to take advantage of the available computing capacity in the LAN. Kubernetes to the rescue…

I have a Kubernetes cluster in my LAN running on two fairly powerful (as of 2021…) Ryzen servers (12 cores/32 GB + 8 cores/32 GB, one of them running the master node) plus six Raspberry Pi 4B 4 GB. (The Ryzen servers run other “nice” processes and should give up capacity when needed, but at the moment I have configured the backup jobs to only run on the RPIs, tagged with the “rpi” label, so as not to bother the Ryzen servers.)
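The “rpi” label is nothing magic; it is a regular node label that the cronjobs later select on via a nodeSelector (shown further down). As a sketch, with a made-up node name:

kubectl label node rpi-node-1 rpi=true     # tag a Raspberry Pi node so the nodeSelector matches it
kubectl get nodes -L rpi                   # list nodes with the label value in its own column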

Since version 1.21, Kubernetes includes the workload resource CronJob as stable, and it basically does what you would imagine (if you have some basic *nix experience). That is quite handy for a backup task, since we want the container to run on a schedule and clean up after itself when finished.

Since I want to transfer my encrypted archive to the offsite location over OpenVPN (without the host’s networking being affected by this VPN connection), I have one container establishing the VPN connection and one container doing the actual backup task. Because the CronJob’s pod lets its containers share networking, the backup container is able to transfer to the offsite location through the tunnel.

What about the initial problem, the lack of parallelization? I did not implement a sophisticated queue solution where some workers create the archives and put them on a queue, while other workers listen to the queue and encrypt, and yet other workers do the actual transfer. The problem itself is quite simple, and I want the solution to be simple enough to actually be maintained and kept running every day for years to come.

My simple solution: one cron job for each bigger chunk (the source control repo, photos, family members’ non-cloud documents, mysql databases, etc.), scheduled and run independently on the cluster, in parallel. I start them during the night (when the internet connections on both ends are not used much anyway), and since the archives differ in size the compression and encryption tasks don’t finish at the same time, which spreads out the openvpn/network usage. The transfer tasks share the same limited capacity (about 30 Mbit/s to the offsite location through the OpenVPN tunnel), but the openvpn server is configured to allow multiple concurrent connections from the same user, so that is not an issue.
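The staggering itself is nothing fancier than giving each CronJob its own schedule field; only the first time below matches the manifest shown later, the others are made up for illustration:

batchjob-backup-dokument.yaml:  schedule: "30 0 * * *"
batchjob-backup-pictures.yaml:  schedule: "15 1 * * *"   # hypothetical
batchjob-backup-mysql.yaml:     schedule: "0 2 * * *"    # hypothetical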

After this introduction, let’s go through the actual implementation, including configuration and scripts, so that this blog post can be useful for someone who wants to implement something similar.

To start with, I run Ubuntu 20.04 LTS (EOL April 2030, so still many years left…) on both the Ryzen servers and the RPIs. The RPIs boot and run from reasonably fast USB 3 flash drives and are mounted in one of those RPi cluster cases with fans that you can buy cheaply from Amazon or AliExpress. A 7” monitor, a power supply for all nodes and a gigabit switch are all attached to the case to form one “cluster unit” with only power and one network cable as “physical interface”. (When running Ubuntu 20.04 on the RPi 4, do consider the advice at https://jamesachambers.com/raspberry-pi-4-ubuntu-20-04-usb-mass-storage-boot-guide/.)

I am running Ubuntu’s Kubernetes distribution, microk8s 1.22.4. There are a lot of fancy add-ons, but in my experience it is easy to get the cluster into a state where one has to start over after enabling various add-ons. After a few attempts I now keep it as slimmed down as possible – no dashboard, for example – and only have the “ha-cluster” add-on enabled.
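For reference, checking and enabling add-ons is done with the microk8s command itself:

microk8s status                # shows which add-ons are enabled/disabled
microk8s enable ha-cluster     # the only add-on I keep enabled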

Setting it up is basically as easy as running “microk8s add-node” on the master node and running the corresponding “microk8s join” command on the joining nodes. After that procedure you can admire your long node list with, for example, “kubectl get no -o wide --show-labels”.
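The whole procedure, with a placeholder IP and token (add-node prints the exact join command to run):

microk8s add-node
# on the joining node, paste the command that add-node printed, something like:
microk8s join 192.168.1.10:25000/0123456789abcdef0123456789abcdef
# and afterwards:
kubectl get no -o wide --show-labels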

Now, over to the “meat” of the solution: the YAML files… I would recommend storing your declarations of desired state in your source repo so that you can restore the solution on a new cluster with one simple command if needed.

My structure looks like this (I omit the multiple batchjobs and only show two in the file listing below):

-rw-rw-r-- 1 jonas jonas 2745 nov 25 11:02 backup-config.yaml
-rw-rw-r-- 1 jonas jonas 3719 nov 29 14:33 batchjob-backup-dokument.yaml
-rw-rw-r-- 1 jonas jonas 3719 nov 29 14:33 batchjob-backup-mysql.yaml
-rw-rw-r-- 1 jonas jonas 5408 nov 24 15:06 client.ovpn
-rw-rw-r-- 1 jonas jonas  242 aug 23 01:30 route-config.yaml
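Since everything except the two secrets (described further down) is plain YAML, restoring the whole setup on a fresh cluster boils down to applying the directory; kubectl only picks up the .yaml files, so the client.ovpn sitting next to them is ignored:

kubectl apply -f .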

The “backup-config.yaml” (I have tried to indicate the places to update) contains the script doing the full or incremental backup (depending on the date), including encryption and transfer:

apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-script
data:
  backup.sh: |-
    #!/bin/bash
    DIR_NAME=$(echo $DIRECTORY_TO_BACKUP | tr "/" "-")
    DIR_NAME_FORMATTED=${DIR_NAME::-1}
    BACKUPNAME="rpicluster${DIR_NAME_FORMATTED}"
    BACKUPDIR=/your/path/to/where/you/store/your/archives
    TIMEDIR=/your/path/to/where/you/store/your/time/stamp/files
    TAR="/bin/tar"
    ARCHIVEFILE=""
    echo "DIRECTORY_TO_BACKUP=$DIRECTORY_TO_BACKUP"
    echo "BACKUPNAME=$BACKUPNAME"
    echo "TIMEDIR=$TIMEDIR"
    echo "ARCHIVEFILE=$ARCHIVEFILE"
    export LANG="en_US.UTF-8"
    PATH=/usr/local/bin:/usr/bin:/bin
    DOW=$(date +%a)            # Day of the week e.g. Mon
    DOM=$(date +%d)            # Date of the month e.g. 27
    DM=$(date +%d%b)           # Date and month e.g. 27Sep
    MONTH=$(date '+%m')        # Number of month
    NOW=$(date '+%Y-%m-%d')
    # First day in month (exception for photos in order to reduce the file sizes)
    if [[ $DOM = "01" && $DIR_NAME_FORMATTED != "Pictures" ]]; then
      ARCHIVEFILE="$BACKUPNAME-01.tar"
      echo "Full backup, no exclude list"
      NEWER=""
      echo $NOW > $TIMEDIR/$BACKUPNAME-full-date
      echo "Creating tar archive at $NOW for $DIRECTORY_TO_BACKUP"
      /usr/bin/nice $TAR $NEWER -c --exclude='/.opera' --exclude='/.google' -f $BACKUPDIR/$ARCHIVEFILE $DIRECTORY_TO_BACKUP
    else
      ARCHIVEFILE="$BACKUPNAME-$DOW.tar"
      echo "Make incremental backup - overwrite last week's"
      NEWER="--newer $(date '+%Y-%m-01')"
      if [ ! -f $TIMEDIR/$BACKUPNAME-full-date ]; then
        echo "$(date '+%Y-%m-01')" > $TIMEDIR/$BACKUPNAME-full-date
      else
        NEWER="--newer $(cat $TIMEDIR/$BACKUPNAME-full-date)"
      fi
      echo "Creating tar archive at $NOW for $DIRECTORY_TO_BACKUP later than $NEWER"
      /usr/bin/nice $TAR $NEWER -c --exclude='/.opera' --exclude='/.google' -f $BACKUPDIR/$ARCHIVEFILE $DIRECTORY_TO_BACKUP
    fi
    echo "Encrypt with 7zip…"
    /usr/bin/nice /usr/bin/7z a -t7z -m0=lzma2 -mx=0 -mfb=64 -md=32m -ms=on -mhe=on -mmt -p'put-your-secret-phrase-here' $BACKUPDIR/$ARCHIVEFILE.7z $BACKUPDIR/$ARCHIVEFILE
    echo "Remove the unencrypted tar archive"
    /bin/rm -f $BACKUPDIR/$ARCHIVEFILE
    echo "Transfer with lftp"
    FILESIZE=$(stat -c%s "$BACKUPDIR/$ARCHIVEFILE.7z")
    echo "$(date -u): About to transfer $BACKUPDIR/$ARCHIVEFILE.7z ($FILESIZE bytes)" >> $BACKUPDIR/$ARCHIVEFILE.7z.scriptlog
    lftp -c "open -e \"set ssl:verify-certificate false;set ssl:check-hostname no;set log:file/xfer $BACKUPDIR/$ARCHIVEFILE.7z.log;set net:timeout 60;set net:max-retries 10;\" -u user,password ftp://address-of-your-ftp-server-via-vpn; put -O your-remote-path-here $BACKUPDIR/$ARCHIVEFILE.7z"
    echo "$(date -u): Finished transfer $BACKUPDIR/$ARCHIVEFILE.7z" >> $BACKUPDIR/$ARCHIVEFILE.7z.scriptlog

Alright, with that basic backup script in place, which will be re-used by all cronjobs, let’s take a look at one specific batch job, batchjob-backup-dokument.yaml, which backs up the documents directory (I kept my paths in order to show how the volumes are referenced):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-dokument
spec:
  schedule: "30 0 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      template:
        spec:
          shareProcessNamespace: true
          restartPolicy: OnFailure
          volumes:
            - name: scripts
              configMap:
                name: backup-script
            - name: backuptargetdir
              nfs:
                server: qnap
                path: /USBDisk3
            - name: jonas
              nfs:
                server: qnap
                path: /jonas
            - name: vpn-config
              secret:
                secretName: vpn-config
                items:
                  - key: client.ovpn
                    path: client.ovpn
            - name: vpn-auth
              secret:
                secretName: vpn-auth
                items:
                  - key: auth.txt
                    path: auth.txt
            - name: route-script
              configMap:
                name: route-script
                items:
                  - key: route-override.sh
                    path: route-override.sh
            - name: tmp
              emptyDir: {}
          initContainers:
            - name: vpn-route-init
              image: busybox:1.33
              command: ['/bin/sh', '-c', 'cp /vpn/route-override.sh /tmp/route/route-override.sh; chown root:root /tmp/route/route-override.sh; chmod o+x /tmp/route/route-override.sh;']
              volumeMounts:
                - name: tmp
                  mountPath: /tmp/route
                - name: route-script
                  mountPath: /vpn/route-override.sh
                  subPath: route-override.sh
          containers:
            - name: vpn
              image: dperson/openvpn-client
              command: ["/bin/sh","-c"]
              args: ["openvpn --config 'vpn/client.ovpn' --auth-user-pass 'vpn/auth.txt' --script-security 3 --route-up /tmp/route/route-override.sh;"]
              stdin: true
              tty: true
              securityContext:
                privileged: true
                capabilities:
                  add:
                    - NET_ADMIN
              env:
                - name: TZ
                  value: "Switzerland"
              volumeMounts:
                - name: vpn-config
                  mountPath: /vpn/client.ovpn
                  subPath: client.ovpn
                - name: vpn-auth
                  mountPath: /vpn/auth.txt
                  subPath: auth.txt
                - name: tmp
                  mountPath: /tmp/route
            - name: backup-dokument
              image: debian:stable-slim
              securityContext:
                privileged: true
              env:
                - name: SCRIPT
                  value: backup.sh
                - name: DIRECTORY_TO_BACKUP
                  value: /home/jonas/dokument/
              volumeMounts:
                - mountPath: /opt/scripts/
                  name: scripts
                - mountPath: /home/jonas
                  name: jonas
                - mountPath: /media/backup
                  name: backuptargetdir
              command:
                - /bin/bash
                - -c
                - |
                  apt-get update; apt-get install -y lftp p7zip-full procps
                  bash /opt/scripts/$SCRIPT
                  pkill -f -SIGINT openvpn
                  true
              stdin: true
              tty: true
          dnsConfig:
            nameservers:
              - 8.8.8.8
              - 8.8.4.4
          nodeSelector:
            rpi: "true"

As you might have seen in the cronjob above, the vpn tunnel is created in a sidecar container (“vpn”) which gets killed after the backup script is done. The “pkill” step is essential for Kubernetes to consider the job finished; otherwise the pod would be left running and the next night’s job would not start (and SIGINT instead of KILL matters, since the vpn container would otherwise just be restarted). Let’s now take a look at the last piece, the vpn tunnel. (The lack of proper sidecar lifecycle handling is hopefully something that gets addressed in an upcoming, not too distant, release; at least there have been ongoing discussions on that topic for a few years.)

The vpn container simply refers to the ovpn client config (if it works for you standalone, it will work in this container) and the vpn credentials. Both are stored as Kubernetes secrets, so put your ovpn client config in a file called client.ovpn and create the secret:

kubectl create secret generic vpn-config --from-file=client.ovpn

Same thing with the credentials (I assume here that you use username and password): create auth.txt with the username and password on separate lines and create the secret:

kubectl create secret generic vpn-auth --from-file=auth.txt
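The cronjob also mounts a small ConfigMap from route-config.yaml (the route-script/route-override.sh referenced in the manifest), which OpenVPN runs via --route-up once the tunnel is established. I am not reproducing mine here, but as a purely illustrative sketch it is a ConfigMap along these lines – the CIDR and gateway below are placeholders for whatever traffic you want to keep out of the tunnel (cluster DNS, pod network, LAN):

apiVersion: v1
kind: ConfigMap
metadata:
  name: route-script
data:
  route-override.sh: |-
    #!/bin/sh
    # placeholder values – keep the local/cluster range routed via the original gateway
    ip route add 192.168.0.0/16 via 10.1.0.1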

That should be it. To test the job without waiting for 00:30 in the case above, kick it off as an ad-hoc job:

kubectl create job --from=cronjob/backup-dokument name-of-manual-dokument-job

To see which pod got created:

kubectl get po -o wide|grep name-of-manual-dokument-job

This pod was called name-of-manual-dokument-job--1-zpn56 and the container name was backup-dokument, so the live log could be checked with:

kubectl logs name-of-manual-dokument-job--1-zpn56 backup-dokument --follow
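Once the script reaches the pkill step, the job should show up as completed (COMPLETIONS 1/1); if the vpn sidecar were left running, the job would stay active forever, which is exactly what the pkill avoids. Plain kubectl is enough to verify:

kubectl get jobs
kubectl get cronjobs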

Alright, that wraps it up. Hope it was useful for something. If not for backups, maybe for other use cases where you need to run something in an openvpn tunnel.
