
On the Topic of Backups


Backups are pretty important, but I'm sure that's common sense.

On my previous setup, I used duplicity as my backup solution, leveraging MEGA.co.nz's free 50 GB of storage.

It worked pretty well for a while. And then things broke.

Duplicity supports a large number of storage services. Because each service handles files differently, duplicity puts them all behind a unified interface, with a separate backend for each service. In turn, those backends rely on libraries developed by various people.

MEGA support for duplicity was through mega.py. The last update for the library was three years ago, and MEGA at the time didn't have a stable API.

Things started going poorly. I patched it up a little with what minimal Python knowledge I had at the time, and while I managed to get it working (it's still working for now), the upkeep is a pain.

Considering the fresh start on this setup, I figured it was time to switch to something else.

Here's the list of what I deemed necessary:

  • Incremental backups. Full backups are easy, but they eat up space rather quickly. File mirrors are also pretty easy, but they don't protect against files that get completely wiped.
  • Compression. A lot of plain content can be easily compressed, so why not?
  • Encryption. Can't be too safe storing stuff on the cloud.
  • The ability to handle them regardless of the server I'm using. While this current VPS uses KVM virtualization, the next one might use OVZ, so I can't rely on the availability of userspace filesystems (sshfs, &c.).

After a couple of days trying to figure out how to set up btar for my purposes (the lack of documentation makes it a genuinely huge pain), I found out that GNU tar actually supports simple incremental backups on its own. Combine that with a compression utility, an encryption utility, and practically any file transfer utility, and all my requirements are covered.
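
The mechanism is the --listed-incremental (-g) option: tar keeps file metadata in a snapshot file, and any run that reuses that snapshot only archives what changed since. Here's a rough sketch with hypothetical paths (the script below uses the short -g form):

# First run: the snapshot file doesn't exist yet, so tar writes a full backup
# and records metadata about every file into data.snar.
tar -cf backup-full.tar --listed-incremental=data.snar /path/to/data

# Later runs against the same snapshot file only pick up what changed.
tar -cf backup-incr1.tar --listed-incremental=data.snar /path/to/data

Deleting (or rotating) the snapshot file makes the next run a full backup again, which is exactly what the weekly rotation in the script relies on.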

With that sorted out, I spent the night of 4 July (after hanging out with friends) setting up a script to do automatic backups.

Including setting up the backup lists, the exclusions, and testing, I hammered out this basic script in six hours:

(The script is released under the MIT License.)

#!/bin/bash

# backup.sh
# Backs up a list of files / directories to MEGA, with exclusions applied.

# Prerequisites:
# The `megatools` and `ccrypt` packages.  For `megatools`, this script uses `.megarc` for configuration.
# (If you have your own preferred encryption and storage process, replace things as necessary.)
# A list of files in a file ${list_backup} to include, and a list of files in a file ${list_exclusions} to exclude.
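#
# (Assumed formats, for illustration only: megatools reads credentials from ~/.megarc,
# which looks roughly like
#     [Login]
#     Username = you@example.com
#     Password = hunter2
# and the two list files are plain text with one path or pattern per line, as
# expected by tar's -T and -X options.)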

export TAR_ENCRYPT_KEY='<your encryption keyword here>'

incremental_prefix='server'

config_dir='/root'

list_exclusions="${config_dir}/backup_exclusions.txt"
list_backup="${config_dir}/backup_list.txt"

# We use the week-of-year to determine the incremental to use
# Basically a full backup occurs on the first backup of every week
incremental_weekyear=$(date +%G_%V)
incremental_file="${config_dir}/${incremental_weekyear}.${incremental_prefix}.backup.snar"

incremental_time=$(date -u +%s)

incremental_output="${config_dir}/${incremental_time}.${incremental_prefix}.tar.xz.cpt"

mega_path='/Root/your-server-backup'

## For testing out which files will get included in backups.
# tar cvf - -X "${list_exclusions}" -T "${list_backup}" 2>&1 > /dev/null | less

# Archive (incrementally), compress at low CPU priority, and encrypt.
tar cf - -g "${incremental_file}" -X "${list_exclusions}" -T "${list_backup}" | nice -n 19 xz | ccencrypt -E "TAR_ENCRYPT_KEY" > "${incremental_output}"
# Upload to MEGA, and remove the local copy only if the upload succeeded.
megaput --path="${mega_path}" "${incremental_output}" && rm "${incremental_output}"

# To extract:
# megaget "${mega_path}/$(basename "${incremental_output}")"
# ccdecrypt -E TAR_ENCRYPT_KEY "$(basename "${incremental_output}")"
# then untar as necessary (see below)
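
For completeness, restoring from a chain of these archives means decrypting each one and extracting them oldest-first. tar should also be told the archives are incremental on extraction (--listed-incremental=/dev/null works; the metadata it needs is read from the archives themselves) so it handles renamed and deleted files properly. A rough sketch, with made-up filenames and a made-up target directory, assuming TAR_ENCRYPT_KEY is exported as above:

# Decrypt; ccdecrypt strips the .cpt suffix, leaving the .tar.xz archives.
ccdecrypt -E TAR_ENCRYPT_KEY 1436000000.server.tar.xz.cpt 1436100000.server.tar.xz.cpt

# Extract the weekly full backup first, then each incremental in order.
tar -xf 1436000000.server.tar.xz --listed-incremental=/dev/null -C /restore
tar -xf 1436100000.server.tar.xz --listed-incremental=/dev/null -C /restore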

And that's now my backup solution. Of course, it's probably not industrial-grade. I'm not saying you should stick mission-critical backups on MEGA; heck, I'm not saying you should throw out your current backup solution and replace it with tar's incremental backup functionality. There are lots of alternatives out there. This just happens to be what I decided on, and so far, it's worked pretty well.

With the large restructuring of server internals, future migrations should go much more smoothly. The backups are a lot smaller too, since I'm not backing up every little thing in the system.
