Make Backups!

Albert Y. C. Lai, trebla [at] vex [dot] net
March 31, 2008

You have probably heard a million reminders to make backups; this is yet another. But I speak as someone who has had a disk failure, and I also show you how I make backups now, so hopefully this will get you started.

I was very fortunate that the first disk failure of my life caused no data loss. While the disk was cool, uncorrupted files could still be read; writing to a file would corrupt it, but as long as I refrained from writing anything, reading still worked. So I was able to save the files I cared about and install a new disk. (In the end I was quite happy to upgrade to a larger one.) Since then I have made periodic backups, because restoring from a backup is far more convenient and less stressful than trying to rescue files.

There are many backup programs to choose from, catering to different OSes and different scales. I know only a handful on Linux. I use duplicity, and I recommend it for individual, single-computer use (as long as you don't need hard links preserved). Besides being available from its home page, it is probably packaged by your Linux distribution, though perhaps not installed yet.

How I use duplicity

There are many duplicity features that I use or don't use, and there are fixed directories I back up and fixed places where I store the backups. I can't be relied upon to remember them all every time, so I wrote a shell script for that.

 to continue'

    set -f    # disable globbing
    EXCLUDE_ARGS=()
    for e in "${EXCLUDES[@]}"; do
      EXCLUDE_ARGS+=(--exclude "$e")   # each pattern stays one argument, even with spaces
    done
    echo 'Running duplicity to make backup'
    if $DUP "${EXCLUDE_ARGS[@]}" $BACK_ME $REPO; then
      echo 'Backup done'
      if [[ $REPO =~ ^file: ]]; then
        echo 'Remember to ship it off with ship-usb or ship-nc'
      fi
    else
      echo 'duplicity aborted. Now clean up.'
      $DUP --cleanup $REPO
    fi
    ;;

  ship-usb)
    # ship new backup files (zipped) to USB memory stick

    if [[ ! $REPO =~ ^file://(/.*) ]]; then
      echo 'Repository url not local or not absolute. Not shipping.'
      exit 1
    fi
    REPODIR=${BASH_REMATCH[1]}

    read -p 'Hit Enter when the USB memory stick is plugged in'

    FILENAME=$(date +%m%d).zip

    # the following is a critical region
    cd "$REPODIR" && \
    mkdir /tmp/usbmass && \
    mount -t vfat /dev/sdb1 /tmp/usbmass && \
    find . -type f -mtime -1 -print | zip -0j@ /tmp/usbmass/"$FILENAME" && \
    umount /tmp/usbmass

    if [ $? = 0 ]; then
      find . -name '*.dt.z' -mtime -1 -print | xargs -r rm
      rmdir /tmp/usbmass
      echo 'Done.'
    else
      echo 'Failed. Investigate, cleanup, try again.'
      exit 1
    fi
    ;;

  ship-nc)
    # ship new backup files (zipped) to raw network connection using nc (netcat)

    if [[ ! $REPO =~ ^file://(/.*) ]]; then
      echo 'Repository url not local or not absolute. Not shipping.'
      exit 1
    fi
    REPODIR=${BASH_REMATCH[1]}

    if [ -z "$2" ]; then
      echo 'backup-script ship-nc host'
      exit 1
    fi
    NCHOST=$2

    FILENAME=$(date +%m%d).zip

    echo "prepare host ${NCHOST}: nc -l -p 55504 > $FILENAME"
    read -p "Hit Enter when host ${NCHOST} is ready"

    # with pipefail, a failure of find or zip (not only nc) prevents the deletion below
    set -o pipefail
    cd "$REPODIR" && \
    find . -type f -mtime -1 -print | zip -0j@ - | nc -q 0 "$NCHOST" 55504 && \
    find . -name '*.dt.z' -mtime -1 -print | xargs -r rm
    ;;

  *)
    echo 'backup-script [ backup | ship-usb | ship-nc host | duplicity  ]'
esac

The beginning of the script sets the sources, the targets, and some other parameters. I don't back up the whole system; I just back up the home directories of myself and of guest accounts (thus all of /home, which needs root access), though there I also keep files noting my system tunings. I skip highly transient and pointless files: the Firefox cache and some GNOME session files. The backups are currently stored on another computer in my house via ssh, but sometimes they go to another directory on the originating computer (see below for how I then copy them elsewhere).
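The parameter section at the top of the script is not reproduced above. Purely as an illustration, here is a minimal sketch of what such a section might look like. The variable names BACK_ME, REPO and EXCLUDES come from the script body, but every value (the host name backuphost, the paths, the exclude patterns) is a hypothetical placeholder, and the helper require_root is likewise hypothetical. It shows the exclude list turned into a bash array, so that each pattern reaches duplicity as a single argument without relying on set -f.

```shell
#!/bin/bash
# Hypothetical parameter section for a backup script like the one above.
# All values are placeholders, not the author's actual configuration.

BACK_ME=/home                                   # back up every home directory
REPO='scp://me@backuphost//srv/backups/laptop'  # hypothetical remote target over ssh
# REPO='file:///var/backups/duplicity'          # alternative: a local directory

# Transient files not worth keeping (hypothetical patterns).
EXCLUDES=(
  '/home/*/.mozilla/firefox/*/Cache'
  '/home/*/.gnome2/session*'
)

# Reading all of /home generally needs root; returns non-zero otherwise.
require_root() {
  if [ "$(id -u)" -ne 0 ]; then
    echo 'Run this as root to read all of /home.' >&2
    return 1
  fi
}

# Build "--exclude PATTERN" pairs as separate array elements.
# Quoted array expansion performs no globbing and no word splitting.
EXCLUDE_ARGS=()
for e in "${EXCLUDES[@]}"; do
  EXCLUDE_ARGS+=(--exclude "$e")
done

# Show the resulting arguments, one per line.
printf '%s\n' "${EXCLUDE_ARGS[@]}"
```

A script built this way would call `require_root || exit 1` before invoking duplicity.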

The features of duplicity I use and don't use are as follows. I don't need encryption (it would be GPG encryption) because the backups are kept on my own computers anyway. Filenames generated by duplicity come in two flavours: a long form that spells out dates and times but contains colons, which confuses some Windows programs that expect a colon only in a drive letter such as “C:”; and a short form that uses just letters, digits, and dots (it nonetheless encodes dates and times). I sometimes move the backups through Windows, so I choose the short form. The backups are cut into tar.gz chunks of a small size, 5MB by default. The idea is that if you store your backups at a very remote place over a very slow connection, then when one day you restore just one file, you don't want to download absolutely everything; you just want to download the 5MB chunk that contains that file. Of course I pump it up to 50MB because I am in a much more luxurious situation. Lastly, when I store backups through ssh, I want to enter the password manually.
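The choices above map onto duplicity command-line options. The sketch below only collects them into an array and prints the resulting command; it runs nothing. The option names (--no-encryption, --short-filenames, --volsize, --ssh-askpass) are as documented for older, 0.6-era duplicity releases; some of them, notably --short-filenames, may not exist in newer releases, so check `man duplicity` for your version. The host, path and the 1234 MB backup size are made up. The arithmetic at the end hints at why the volume size matters: restoring one small file typically means fetching roughly one volume (plus metadata) rather than all of them.

```shell
#!/bin/bash
# Sketch: duplicity options matching the choices described above.
# Option names follow older duplicity documentation; verify them locally.
VOLSIZE_MB=50

DUP_OPTS=(
  --no-encryption          # backups stay on my own machines, so no GPG
  --short-filenames        # no colons in file names; friendlier to Windows tools
  --volsize "$VOLSIZE_MB"  # cut the backup into 50 MB volumes
  --ssh-askpass            # prompt for the ssh password interactively
)

# Print (do not run) the command; %q quotes each argument for the shell.
printf '%q ' duplicity "${DUP_OPTS[@]}" /home 'scp://me@backuphost//srv/backups'
echo

# Back-of-the-envelope count for a hypothetical 1234 MB backup.
TOTAL_MB=1234
VOLUMES=$(( (TOTAL_MB + VOLSIZE_MB - 1) / VOLSIZE_MB ))   # ceiling division
echo "volumes: $VOLUMES"
echo "a single-file restore typically fetches about one ${VOLSIZE_MB} MB volume, not all ${VOLUMES}"
```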

As mentioned, I sometimes store the backups on the same computer. Then clearly I should copy the backup files elsewhere afterwards. The procedures for doing that are also coded into this script. They pack the recently created backup files (I use “less than one day old” as a heuristic) into a zip file, then send it to a USB flash drive or to another computer via netcat. I do this because the other computer used to have only Windows, and a zip file, a USB flash drive, and netcat were the simplest solutions. (It now has Linux too.) After that, I delete the backup volume files on the originating computer, keeping only the signatures and the manifests; they are the only files duplicity needs the next time it performs an incremental backup.
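Before shipping and deleting anything, it can help to preview what the heuristic selects. Below is a dry-run sketch, using a hypothetical helper named preview_ship, that lists the files a ship-usb or ship-nc run would pack (modified within the last day) and the `*.dt.z` volume files it would delete afterwards; it modifies nothing. It also demonstrates why `set -o pipefail` matters for a pipeline like the `find | zip | nc` in ship-nc: by default bash reports only the exit status of the last command, so an earlier failure goes unnoticed and the following `&&` deletion would still run.

```shell
#!/bin/bash
# Dry run: list what a ship-usb / ship-nc run would pack and then delete.
# preview_ship is a hypothetical helper, not part of the original script.

preview_ship() (
  # The body runs in a subshell, so this cd does not affect the caller.
  cd "$1" || exit 1
  echo 'Would ship (modified within the last day):'
  find . -type f -mtime -1 -print | sort
  echo 'Would delete afterwards (volume files only):'
  find . -type f -name '*.dt.z' -mtime -1 -print | sort
)

# Why pipefail matters: the first command fails, the last one succeeds.
set +o pipefail
false | cat && echo 'without pipefail: the failure is hidden'
set -o pipefail
false | cat || echo 'with pipefail: the failure is reported'
```

For example, `preview_ship /var/backups/duplicity` (a hypothetical path) prints both lists without touching any file.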

Once the backup files reach the other computer, I write them to DVD-R. This part is too easy to need scripting.

