What's needed for a complete backup system?

Brandon Wamboldt asked:

For my new server, I want to set up a proper backup solution. I’ve found a great setup that will do twice-daily incremental backups via Dropbox. I plan on backing up my various databases, the webroot directory, the /etc directory/repository, and /var/log.

What else do I need to know to do a proper backup, and what is the standard setup here to ensure you can quickly restore from a backup in the case of a system failure?

I’m thinking of using Puppet, as it describes how the system should be. My restore procedure would look like this:

  1. Install Puppet
  2. Run my puppet config
  3. Restore my backups from Dropbox (Should I create a script to do this? Probably)

This should also let me create a clone of my production server for use in dev environments, correct? Am I missing anything of importance?

My answer:

We build backup systems for one purpose: To enable restores. Nobody cares about backups; they care about restores.

There are three reasons one might need to restore file(s): Accidental file deletion, hardware failure, or archival/legal reasons. A “complete” backup system would enable you to restore files in all of these scenarios.

For accidental file deletion, things like Dropbox and RAID fail because they simply reflect all changes made to the filesystem, and a deleted file is gone in these scenarios. Your backup system should be able to restore a file to a recent point in time fairly quickly; preferably the restore would complete within seconds to minutes.

For hardware failure, you should use solutions such as RAID and other high-availability approaches when possible to ensure that your service remains up and running, as a full restore of a system can take hours or possibly days due to the necessity of reading and writing to (relatively) slow media.

Finally, archives (full backups, or the equivalent, of the systems at a specific point in time) can serve restores in both legal and disaster recovery scenarios. These would typically be stored off-site, in case a stray meteor turns your data center into a smoking crater…

Your complete backup system should be able to support restores for any of these three types, each with its own service level agreement (SLA). For instance, you may decide that a deleted file can be restored with one-business-day granularity for the last six months and one-month granularity for the last three years, and that a disk failure should be recoverable within four hours with no more than two business days of data loss. The backup system must be able to implement the SLA in a backup schedule.
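
To make the example SLA concrete, here is a minimal Python sketch of a retention rule that decides which existing backups to keep. It is only an illustration, not a feature of any particular backup product: the function name backups_to_keep and its parameters are hypothetical, "six months" is approximated as 182 days, "three years" as 3 × 365 days, and business days are not distinguished from weekends.

```python
from datetime import date


def backups_to_keep(backup_dates, today, daily_days=182, monthly_days=3 * 365):
    """Return the set of backup dates to retain under the example SLA.

    Assumptions (illustrative only): every backup from the last
    `daily_days` days is kept, and for older backups up to
    `monthly_days` days old, only the earliest one in each calendar
    month is kept. Anything older is eligible for deletion.
    """
    keep = set()
    seen_months = set()
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0 or age > monthly_days:
            continue  # future-dated, or older than the longest window
        if age <= daily_days:
            keep.add(d)  # recent: keep every backup
        elif (d.year, d.month) not in seen_months:
            keep.add(d)  # older: keep the first backup of each month
        seen_months.add((d.year, d.month))
    return keep
```

Calling backups_to_keep with the list of dates for which backups exist and today's date returns the dates to retain; everything else can be expired. A real schedule would also need to cover the disk-failure half of the SLA (restore time and maximum data loss), which this sketch does not attempt.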

Your backup system must be fully automated. This cannot be stressed enough. If the backups aren’t fully automated, they simply won’t happen. Your backup system must be capable of fully automated backups, out of the box, with little or no special configuration or scripting required.

You must periodically test restores. Any backup system is utterly useless if restoring from backup fails to work. I think most of us have horror stories along these lines. Your backup system must be able to restore single files or whole systems within the SLA you’re implementing.
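
One simple way to test a restore is to restore into a scratch directory and compare checksums against the live data. The following Python sketch shows the idea using only the standard library; the function names tree_checksums and verify_restore are hypothetical, and it assumes the source files have not changed since the backup was taken (otherwise legitimate changes will be reported as differences).

```python
import hashlib
from pathlib import Path


def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but large files would need to be hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[path.relative_to(root).as_posix()] = digest
    return sums


def verify_restore(original_root, restored_root):
    """Return a sorted list of files that are missing or differ after a restore."""
    original = tree_checksums(original_root)
    restored = tree_checksums(restored_root)
    return sorted(p for p in original if restored.get(p) != original[p])
```

An empty list from verify_restore means every file under the original directory came back byte-for-byte identical. Running a check like this on a schedule, and alerting on a non-empty result, is one way to make restore testing as automatic as the backups themselves.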

You must purchase backup media on an ongoing basis. Whether you’re just doing on-site tape backup or going whole hog with off-site cloud backup, make sure you have it in the budget to pay for the gigabytes (or terabytes!) of space you will need.

This has been a very brief summary of a portion of Chapter 26 of The Practice of System and Network Administration, Second Edition, which anyone who is or aspires to be a system administrator should own, read, and memorize.

I’ve glossed over a lot of things that don’t necessarily apply to your particular situation or that don’t make sense in a small environment such as the one you’ve described. Nevertheless, it should be a reasonable description of the features that your “complete” backup system should have, as well as why they’re necessary.

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.