Using rdiff-backup for incremental backups of large compressed archive files?

kilolima asked:

I am researching a remote, off-site backup strategy. The major limitation is that the upstream pipe is only ~50 kb/s, so rdiff-backup, which makes incremental backups by transferring only the differences between file versions, seemed like the proper tool.

In a test case:

  1. First, rdiff-backup was run from a local directory to a different local directory. The source directory contained a 10GB pax archive of a Maildir directory.

  2. The 10GB pax archive in the source directory was replaced with a 12GB pax archive (representing a month’s worth of additional email), and rdiff-backup was run again. I expected it to be faster this time because the file had grown by only 2GB. However, not only did it take longer, but the target directory now contained two files: the original 10GB pax archive and a 12GB temp file.

Can rdiff-backup incrementally back up compressed archive files? It doesn’t seem to be able to.

Currently the mail server writes pax Maildir backups to an external drive. Instead of using these as the rdiff-backup source, would it be better to have rdiff-backup back up /home/%user%/Maildir directly (many, many small files)?

I suppose that if the external drive were to fail, it would be better not to cripple the 2nd backup system as well!


My answer:

You’re better off not compressing the data.

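rdiff-backup does analyze files for differences, but compression gets in the way: a change early in the input typically alters much of the compressed output that follows it. So if the files are compressed archives, rdiff-backup may not find any matching data and will be forced to store the entire new file again.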
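A minimal Python sketch of the effect, under stated assumptions: `reusable_fraction` is a hypothetical, simplified rsync-style block matcher (rdiff-backup actually delegates this work to librsync, which uses a rolling checksum instead of hashing every offset), `make_text` generates made-up stand-in data rather than real mail, `BLOCK = 256` is an arbitrary block size, and Python's `lzma` module (xz) stands in for whatever compressor produced the archive. It inserts a short "new message" near the start of the data and measures how much of the new version could be rebuilt from blocks of the old one, with and without compression.

```python
import hashlib
import lzma
import random

BLOCK = 256  # hypothetical block size chosen for this demo


def make_text(n_bytes: int, seed: int = 1) -> bytes:
    """Deterministic, compressible lowercase text standing in for mail (made-up data)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(200)]
    words, size = [], 0
    while size < n_bytes:
        word = rng.choice(vocab)
        words.append(word)
        size += len(word) + 1
    return " ".join(words).encode("ascii")[:n_bytes]


def reusable_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `new` covered by whole blocks that already exist in `old`.

    Simplified rsync-style matching: index the old file's fixed-offset blocks
    by a strong hash, then scan the new file at every byte offset.
    """
    index = {
        hashlib.sha1(old[i:i + BLOCK]).digest()
        for i in range(0, len(old) - BLOCK + 1, BLOCK)
    }
    matched, i = 0, 0
    while i + BLOCK <= len(new):
        if hashlib.sha1(new[i:i + BLOCK]).digest() in index:
            matched += BLOCK
            i += BLOCK  # reuse this block and jump past it
        else:
            i += 1  # no match here: slide forward one byte
    return matched / len(new)


old = make_text(15_000)
# Simulate a new message appearing near the start of the archive.
new = old[:500] + b"NEW MESSAGE 0001\n" + old[500:]

plain = reusable_fraction(old, new)
packed = reusable_fraction(lzma.compress(old), lzma.compress(new))
print(f"uncompressed: {plain:.0%} reusable; compressed: {packed:.0%} reusable")
```

In this sketch the uncompressed copy should be almost entirely reusable, since only the block around the insertion fails to match, while the compressed copy should be almost entirely new: the compressor's internal state after the insertion differs, so the bytes that follow no longer line up with the old ones. Exact percentages depend on the data and the compressor.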

Also, you can use ssh -C to compress the ssh connection and save some bandwidth.
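OpenSSH’s `-C` option compresses the connection with zlib, so Python’s `zlib` module gives a rough idea of what it can save. The sketch below uses invented, mail-like sample text (the `vocab` list and message layout are hypothetical) and compares it with data that has already been compressed once. It suggests that `-C` helps most when the files being sent, such as raw Maildir messages, are not already compressed.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib (as ssh -C does)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical stand-in for plain-text mail: headers plus a short body.
rng = random.Random(0)
vocab = ["meeting", "invoice", "attached", "please", "review",
         "thanks", "report", "schedule", "update", "budget"]
lines = []
for _ in range(500):
    lines.append(f"From: user{rng.randint(1, 50)}@example.com")
    lines.append(f"Subject: {rng.choice(vocab)} {rng.randint(1, 999)}")
    lines.append(" ".join(rng.choice(vocab) for _ in range(30)))
    lines.append("")
mail = "\n".join(lines).encode()

# Compressing a second time, as ssh -C would do to a compressed archive.
already_compressed = zlib.compress(mail)

print(f"plain mail: {ratio(mail):.2f}, already compressed: {ratio(already_compressed):.2f}")
```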

Finally, if possible, you should get more bandwidth; that’s barely better than dial-up (or maybe it is dial-up?). At that rate, backing up all 12GB of data would take weeks, and even the 2GB difference could take days.
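The time estimates are simple arithmetic. This sketch assumes that "50 kb/s" means 50 kilobits per second, that GB means 10^9 bytes, and that the link is fully saturated with no protocol overhead; `transfer_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealized transfer time in days: saturated link, no protocol overhead."""
    return size_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


LINK = 50_000  # assumption: "50 kb/s" read as 50 kilobits per second
full = transfer_days(12e9, LINK)
delta = transfer_days(2e9, LINK)
print(f"full 12 GB: {full:.1f} days, 2 GB delta: {delta:.1f} days")
# prints: full 12 GB: 22.2 days, 2 GB delta: 3.7 days
```

If the link were 50 kilobytes per second instead, both figures would shrink by a factor of 8.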

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.