Sail through moments of anguish and despair brought about by failed disks by backing up your data in multiple locations.
The Linux ecosystem has lots of command line utilities for backing up and restoring data. Rsync is one of the most popular ones that’s commonly used for copying and synchronising files and directories. You can use it to easily ferry files locally between drives or remotely between two computers over the network. In fact, you can use rsync to back up web servers and mirror websites with a single command.
What makes rsync so useful is the rsync algorithm, which compares the local and remote files one small block at a time using checksums, and only transfers the blocks that are different. If you’re copying over the network, rsync compresses these tiny blocks on the fly before sending them over the wires, which further helps cut down the file transfer time. For such network transfers, rsync is usually clubbed with SSH to encrypt the data transfer for added security.
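To get a feel for the block-by-block idea, here is a much simplified sketch that uses only standard tools. It is not how rsync is implemented: it compares blocks at fixed offsets with cksum, whereas rsync uses rolling and strong checksums so that it can also spot data that has shifted position. The 1KB block size and the file names are assumptions chosen purely for illustration.

```shell
# Toy illustration of "only send the blocks that differ".
work=$(mktemp -d)

# Helper: write one 1 KiB block filled with a single character.
make_block() { dd if=/dev/zero bs=1024 count=1 2>/dev/null | tr '\0' "$1"; }

# The "old" and "new" files differ only in their third block.
{ make_block a; make_block b; make_block c; make_block d; } > "$work/old"
{ make_block a; make_block b; make_block X; make_block d; } > "$work/new"

# Cut both files into 1 KiB pieces: old.aa, old.ab, ... and new.aa, new.ab, ...
split -b 1024 "$work/old" "$work/old."
split -b 1024 "$work/new" "$work/new."

changed=0
total=0
for blk in "$work"/new.*; do
    suffix=${blk##*.}                      # aa, ab, ac, ...
    total=$((total + 1))
    old_sum=$(cksum < "$work/old.$suffix")
    new_sum=$(cksum < "$blk")
    if [ "$old_sum" != "$new_sum" ]; then
        changed=$((changed + 1))
        echo "block $suffix differs and would be sent"
    fi
done
echo "changed blocks: $changed of $total"
```

Only one of the four blocks needs to travel, which is the saving that the real algorithm delivers on a much larger scale.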
Rsync is the secret sauce behind several graphical tools such as LuckyBackup, which is covered over the page.
Rsync is available in the official repos of almost every distro. Users of Deb-based distros such as Debian and Ubuntu can install it with sudo apt-get install rsync. Similarly, users of RPM-based distributions such as Fedora can fetch it with sudo yum install rsync.
Let’s use rsync to back up a home directory on to another mounted disk.
rsync -avhW --no-compress /home/mayank /media/backup/
This command copies the entire content of the /home/mayank directory including files, subdirectories, links, and other file types. Once the files have been copied, type
ls -l /home/mayank /media/backup/mayank
and you’ll notice that the date and timestamps on both the original and the backed-up files are the same.
Notice that there’s no trailing slash after /home/mayank. Without that trailing slash, rsync will copy files from that directory to a target directory named mayank (/media/backup/mayank). Had we put a trailing slash, rsync would have copied all files from /home/mayank directly to the backup directory (/media/backup/). Keep this in mind and pay close attention to the trailing slashes when copying to a location with existing data.
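If you would rather see the trailing-slash behaviour for yourself before pointing rsync at real data, the following sketch tries both forms inside a scratch directory. The directory and file names here are made up for the demonstration.

```shell
# Compare a source with and without a trailing slash in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/with_dir" "$demo/contents_only"
echo "hello" > "$demo/src/file.txt"

# No trailing slash: the src directory itself is recreated inside the target.
rsync -a "$demo/src" "$demo/with_dir/"

# Trailing slash: only the contents of src land in the target.
rsync -a "$demo/src/" "$demo/contents_only/"

# List what ended up where.
find "$demo/with_dir" "$demo/contents_only" -type f
```

The first copy ends up as with_dir/src/file.txt, while the second one is placed directly at contents_only/file.txt.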
Now let’s examine the options. The -a (archive) option copies directories recursively and preserves ownership, permissions, symbolic links, and modification times on the copied files. The -h option presents the -v (verbose) output (transfer rate and file sizes) in terms that are easier to comprehend.
The -W option asks rsync to copy whole files and not bring the delta transfer algorithm into play. This helps reduce the load on the machine when making an initial transfer. The --no-compress option also helps ease the load off the processor by asking rsync not to compress the data before sending it out, since we’re copying the files between local drives.
After a few days, you might like to repeat the command without the -W option, such as:
rsync -avh --no-compress /home/mayank /media/backup/
This time around rsync copies only the new files under the /home/mayank directory to the backup directory along with any changes to the original backed up files. You can schedule and run this command at regular intervals to maintain a backup of the home directory.
Note that while rsync will add any new files in the backup, it will not delete any files from the backup target that you have zapped from the original location unless you specifically ask it to. Many users use rsync to maintain an exact replica of a directory. You can use the --delete option to ask rsync to delete files in the backup target that were removed from the original location.
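Because --delete removes files, it is worth previewing what it would do first. The sketch below, which again uses invented names inside a scratch directory, pairs it with rsync’s --dry-run option so that the first run only reports the pending deletion and the second run carries it out.

```shell
# Preview a --delete run before letting it loose.
del_demo=$(mktemp -d)
mkdir -p "$del_demo/src" "$del_demo/dst"
echo "keep me" > "$del_demo/src/keep.txt"
echo "remove me later" > "$del_demo/src/stale.txt"

# Initial backup: both files are copied.
rsync -a "$del_demo/src/" "$del_demo/dst/"

# The original of stale.txt is then deleted.
rm "$del_demo/src/stale.txt"

# Dry run: lists what would be deleted, but changes nothing on disk.
rsync -avh --delete --dry-run "$del_demo/src/" "$del_demo/dst/"

# Real run: the destination now mirrors the source exactly.
rsync -a --delete "$del_demo/src/" "$del_demo/dst/"
ls "$del_demo/dst"
```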
You can find loads of interesting rsync-based scripts on the web that you can adapt to your needs.
In the real world you would want to store backups on a remote machine, and rsync is adept at ferrying files across the network. For network backups, rsync is usually clubbed with SSH, which ensures that the data is transferred over an encrypted medium.
It goes without saying that you’ll have to install and enable SSH on the remote machine. If you can connect to it with the ssh command you’re good to go. You’ll also have to install rsync on the remote machine.
rsync -azvh -e ssh /home/mayank email@example.com:/media/backup
This command does the same backup as before, but this time the files are copied over to a mounted location on a remote machine. The remote machine is specified before the remote directory name separated by a colon. The command also introduces two new options. In addition to the -a (archive) and -v (verbose) options, the -z option asks rsync to compress the data before sending it over the wires. The -e option is used to specify the remote shell, which in this case asks rsync to use the SSH remote shell to transfer data.
Just like before, you can repeat this command, as is, to back up the files to the remote location. Rsync will copy only the differences, compressed and sent over a secure channel. In a production environment, you’d want to run the command as a cron job to back up files at regular intervals after setting up SSH to allow password-less logins for the user who is going to perform the backup.
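As a rough idea of what such a cron job could look like, the snippet below builds a crontab line that runs the remote backup every night at 2.30am and appends rsync’s output to a log file. The schedule, the log file name and the cron_line variable are all assumptions to be adapted, and the job will only run unattended once password-less SSH logins are in place.

```shell
# Print a crontab line for a nightly remote backup (hypothetical schedule and log file).
# The five leading fields are: minute, hour, day of month, month, day of week.
cron_line='30 2 * * * rsync -azh -e ssh /home/mayank email@example.com:/media/backup >> "$HOME/rsync-backup.log" 2>&1'
printf '%s\n' "$cron_line"

# To install it, run `crontab -e` as the user who owns the backup and paste
# the printed line into the editor.
# Note: a percent sign has a special meaning inside a crontab, so a command
# containing something like `date +%A` needs each % written as \% there.
```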
You can also add the --delete option to make sure the destination is an exact replica of the original. Since this option will remove any deleted files, it’s best used with the --backup option, which makes copies of files in the backed up location that have been deleted or updated in the original location. The --backup option is used together with the --backup-dir option, which specifies the directory where these older copies are kept, along with a suitable suffix to identify them.
rsync -avzh --delete --backup --backup-dir=backup_`date +%A` /home/mayank /media/backup
Like before, this command will make an exact replica of the /home/mayank directory under the /media/backup directory. But when you run this after the contents of the original /home/mayank directory have changed, the extra options in this command (--backup and --backup-dir) will move the files that have been changed or deleted in the original location under a time-stamped directory on the destination before removing them.
By preserving the original files inside a time-stamped directory, the previous command helps you create a weekly incremental backup. All files modified every day are copied to a directory named after the day of the week, such as /media/backup/backup_Monday. Over a week, seven directories will be created that reflect changes over each of the past seven days.
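The directory name comes from the date command embedded in the rsync call. This small sketch shows the same substitution on its own, using the $(...) form, which is equivalent to backticks. The LC_ALL=C setting is an assumption added here to force English day names regardless of the system locale.

```shell
# Build the day-of-week directory name that --backup-dir would receive.
backup_dir="backup_$(LC_ALL=C date +%A)"
echo "Changed and deleted files would be kept in /media/backup/$backup_dir"
```

Since there are only seven possible names, the same directory comes around again a week later. Files left over from the previous week stay in it unless you clear the directory out before the backup runs.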
Other useful options
The rsync command has dozens of options. We’ve already used the most common ones to sync and back up files and folders, in the examples above. Here are some more options that’ll help you use rsync more precisely.
First up are the --include and --exclude options. As you can guess, these can be used to control which files are backed up and which aren’t. For example, the following command will only back up files and directories that start with ‘spec’ and ignore the rest:
rsync -avzh -e ssh --include 'spec*' --exclude '*' /home/mayank/ firstname.lastname@example.org:/media/backup
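Filter rules are easy to get subtly wrong, so it pays to try them on scratch data first. The sketch below applies the same pair of rules locally to three made-up files. The source is given with a trailing slash so that the catch-all exclude rule is applied to the contents of the directory and not to the directory’s own name.

```shell
# Try the include/exclude pair on throwaway files.
filt=$(mktemp -d)
mkdir -p "$filt/src" "$filt/dst"
touch "$filt/src/spec-sheet.txt" "$filt/src/specimen.log" "$filt/src/notes.txt"

# Rules are checked in order: names starting with 'spec' are included,
# everything else is caught by the exclude rule.
rsync -a --include 'spec*' --exclude '*' "$filt/src/" "$filt/dst/"

ls "$filt/dst"
```

Only the two files whose names begin with ‘spec’ reach the destination. Bear in mind that the exclude rule is also applied inside any directory that gets included, so files within a ‘spec’ directory are copied only if their own names match too.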
Similarly you can also specify a ceiling size for files to be copied with the --max-size option. Any files beyond this specified size are ignored and aren’t copied. In the following example, rsync will only copy files that are less than 100MB in size:
rsync -avzh --max-size=100m ~/Downloads /media/backup
In the same vein, you can use the --min-size option to ignore files that are smaller than the specified file size. However, please note that both these options are transfer rules only. This means that they only help the receiver limit the files to be transferred, and will have no effect whatsoever on the deletions.
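Here is a quick way to watch the size ceiling at work. The sketch creates one tiny file and one 4KB file in a scratch directory and then copies with a deliberately low limit of 1KB. The file names and the limit are assumptions picked to keep the test small.

```shell
# See --max-size skip a file that is over the limit.
size_demo=$(mktemp -d)
mkdir -p "$size_demo/src" "$size_demo/dst"
echo "tiny" > "$size_demo/src/small.txt"
dd if=/dev/zero of="$size_demo/src/big.bin" bs=1024 count=4 2>/dev/null

# Files larger than 1 KiB are left out of the transfer.
rsync -a --max-size=1k "$size_demo/src/" "$size_demo/dst/"

ls -l "$size_demo/dst"
```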
If you are using rsync to ferry a lot of data, the command might hog resources and make the system unresponsive. To avoid such a situation you can throttle the network I/O bandwidth with the --bwlimit option. For example, the following command limits the maximum transfer rate to 100 KB/s:
rsync -avzh --delete --bwlimit=100 ~/Downloads /media/backup
There’s a lot more you can do with rsync. In this Masterclass we’ve introduced some of the most common use cases and the options that are used to execute them. However, rsync supports a lot more options that are detailed in its man page.
You could roll yourself a pretty good backup script with rsync, ssh, cron and a few other Linux tools. But if that sounds too complicated or time-consuming, you could head to your distro’s package manager and grab LuckyBackup. With LuckyBackup you get all the advantages of rsync with the added convenience of a graphical interface.
When you launch LuckyBackup for the first time, create a new profile. You can then store different backup sets within each profile.
Begin by clicking the Add button, which will open up the Task Properties window. In this window you’ll need to fill out a few details about the backup. In the Name field, enter some text to identify this task from the others, such as “Backup Documents to USB”. Next, point to the directory you wish to back up (such as ~/Documents) and the destination where you want it saved (such as /media/USB).
Remember that you can only add one directory per task. If you need to back up multiple directories, you’ll need to create a different task for each source. It might seem a bit inconvenient at first, but the advantage of creating separate tasks is that you can back up different directories in different ways, to different locations and even schedule them to run at different times and intervals.
When adding a task, pay close attention to the Backup Type field. The default backup option performs a full backup and copies the contents of the source directory under the destination directory. Then there’s the Synchronise option, which ensures that the contents of the source and the destination directories are the same.
At the bottom of the interface, there’s a checkbox labelled ‘Do NOT create extra directory’. By default it’s unchecked, which asks LuckyBackup to back up files after creating a new directory inside the destination directory with the same name as the source directory. If, however, you just wish to back up the contents of the directory and not the directory itself, then make sure you toggle the checkbox. Next to it is a spin-box with which you can define the maximum number of backup snapshots you want LuckyBackup to preserve. By default the tool will only preserve a single snapshot but you can ask it to store up to 500 snapshots.
When you have created all your backup and sync tasks, you can use LuckyBackup to schedule them. In the Task List window, select the task you wish to schedule and head to Profile > Schedule. In the Schedule window click the Add button to open the scheduler. Here you can set the interval for the execution of the task. Back in the Schedule window, select the just added schedule and click the cronIT! button, which will then create a cron job for the backup task.
LuckyBackup is very flexible and lets you create as many tasks as you want, which you can group inside multiple profiles.
One of the greatest strengths of rsync is its ability to perform remote backups and synchronisation. This functionality flows down to LuckyBackup as well. While adding a task, click on the Advanced button to reveal more options. Using this Advanced section you can set up exclusions, configure remote options, customise command options, and a lot more.
If you’re backing up something like your home directory, you might want to exclude locations that house things like temporary files and cache. The Exclude tab has pre-defined options that let you select commonly ignored locations and also lets you define your own. Similarly, switch to the Include tab to specify folders that shouldn’t be excluded from the backup. If you select the Only Include option under this tab, LuckyBackup will only back up the mentioned folders and ignore the rest.
To do a remote backup, you’ll have to use the superuser version of LuckyBackup and then head to the Remote tab. After enabling the checkbox to use a remote host, you’ll first have to specify whether the remote host will act as a destination for the data or the source. The latter option is used when defining restoration tasks. Also make sure that the destination path specified exists in the remote computer. Next, enter the IP address or the hostname of the remote machine and the username you wish to login as.
You will also need to select the SSH checkbox. Then hit the Browse button corresponding to the ‘private key file’ field and point it to your private key file (such as id_rsa) under the hidden .ssh directory. Unless key-based login has been set up, you’ll be prompted for the password for the remote user when you run the backup. Once you’ve entered everything, use the Validate button to ensure your backup settings are good to go.
When the inevitable happens and you need to restore data from your backup, first make sure that you install LuckyBackup inside the new Linux installation on the restored computer. Next, make sure that the previous destination for the backups is available and accessible.
The first task when you launch LuckyBackup is to import the original backup profiles. These are automatically backed up along with the data. To reinstate them, head to Profile > Import and navigate to the destination directory. The profile is housed in a hidden directory named .luckybackup-snapshots.
Once the profile has been imported, you’ll be able to see all the backup tasks. However, instead of backing up data, you now want to restore it. To do this, head to Task > Manage Backup, which displays a browsable list of all the backup snapshots. Select the snapshot you wish to restore and click on the Restore button. The app will show you a dialog box confirming the location of the backed up data and its original location. By default, LuckyBackup will restore the data to its original location, but also gives you the option to restore the data elsewhere.
That’s all there is to it. The tool does justice to its rsync underpinnings and is loaded with features that are cleverly tucked away so as to not intimidate new users. Play around with the tool and fine-tune it as per your requirements, but make sure you use the Dry Run option while you’re learning to avoid accidentally zapping files.
When viewing backup snapshots, LuckyBackup will also let you view the differences between the source and the selected snapshot.
If you are backing up data to a remote machine, by default, LuckyBackup will prompt you for the password of the remote host before establishing the SSH connection. This works for manual backups, but isn’t really feasible for unattended scheduled backups. If you want to schedule a remote backup you will have to set a secure shell up to do password-less authentication. Be warned though that a password-less SSH login isn’t considered a best practice from a security point of view.
To set it up, first head to the local machine from where the connection to the remote SSH server will be established and the data will be backed up. On this machine, type ssh-keygen -t rsa. This command will generate a pair of public and private keys. Later on you’ll copy over the public key to the remote machine. For now, make sure you don’t enter a passphrase when generating the key and just hit Enter when prompted.
Once the keys have been generated, copy the public key to the server with the ssh-copy-id -i .ssh/id_rsa.pub username@remotehost command. Make sure you replace username with the user you will log in as on the remote SSH server and replace remotehost with the IP address or hostname of the remote machine.
To test the password-less login, try establishing an SSH connection to the remote SSH server from the local machine. If all goes well, you should be allowed inside without being prompted for a password. You can now use LuckyBackup to schedule and run unattended remote backups.
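The key generation step can also be run without any prompts, which is handy if you are scripting the setup. In the sketch below the key pair is written to a scratch directory so that it cannot overwrite an existing key. That location is an assumption for the demonstration, as the walkthrough above uses the default files under ~/.ssh. The two steps that need a real server are left as comments.

```shell
# Generate a passphrase-less RSA key pair without any prompts.
key_dir=$(mktemp -d)

# -N '' sets an empty passphrase, -f names the key file, -q keeps it quiet.
ssh-keygen -t rsa -N '' -f "$key_dir/id_rsa" -q

# The private key (id_rsa) and the public key (id_rsa.pub) sit side by side.
ls "$key_dir"

# The remaining steps need a reachable SSH server:
#   ssh-copy-id -i "$key_dir/id_rsa.pub" username@remotehost
#   ssh -i "$key_dir/id_rsa" username@remotehost
```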
Mayank Sharma has been finding productive new ways to mess about with free software for years now.