LJ Archive CD

Unison, Having It Both Ways

Adrian Klaver

Issue #208, August 2011

Unison is a program for bidirectional synchronization of files using the rsync algorithm.

Unison is a file synchronization tool that supports bidirectional updates of files and directories. It uses the rsync algorithm to limit the size of updates to only what has changed. The program runs on UNIX-based systems as well as Windows machines and can sync between them. It uses SSH as the default transport method.

Unison originated as a research project at the University of Pennsylvania, and although that project has moved on (Harmony/Boomerang), the program continues on. This has led to some misconceptions as to the status of the program that are answered at the URL provided in the Resources section of this article. The short version is that the program is very much alive and in active use by many. In this article, I demonstrate the setup and basic usage of Unison.

The first step is installing Unison on your machine. It is available in package repositories, although probably as an older version. For instance, on my version of Kubuntu, 10.04, the package version is 2.27.57. The latest stable version is 2.40.61, at the time of this writing. A lot of usability, performance and cross-platform improvements have been made between those versions, so for this article, I use 2.40.61. For Windows and Mac machines, current binaries are available from the Web site's download page. For other platforms, it is necessary to build from source.

Complete instructions are available in the manual's Install section (see Resources). Unison can be built as either a text or GUI (GTK) version, assuming the appropriate libraries are available. The GUI version also can be run in text mode via a command-line switch, so it is the most flexible. The text-only version is handy for servers where graphical libraries are not installed. Note: when I built from source, the Quit button did not show up in the GUI toolbar. As you will see later, that is more annoying than fatal.

Unison can be used and customized in a variety of ways, but trying to cover all the bases would deplete the magazine's ink allowance for this issue. Instead, I demonstrate the way I use Unison on a daily basis as an illustration of what is possible. My basic setup consists of Unison on three machines: my home desktop, my laptop and an Amazon EC2 instance.

Before going into my actual setup, bear with me as I explain the basic operating principles behind Unison. The starting point for Unison is a pair of roots. These are just directory paths that will be synced. Both paths can be local, or one can be local and the other remote. The preferred method of connecting local to remote is SSH, and sections in the manual and wiki (see Resources) describe how to set up Windows to use SSH.

Unison also has a socket method where an instance of the program is set up as a server listening on a socket. The catch is that the data is transferred unencrypted, so use at your own risk. To keep track of state and configuration information, Unison creates a private directory. Unless the UNISON environment variable says otherwise, that directory is $HOME/.unison on UNIX-like systems. On Windows systems, it's $USERPROFILE\.unison, $HOME\.unison or c:\.unison, depending on whether USERPROFILE and HOME are set.
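The lookup rule for the private directory can be sketched in a few lines of Python. This is only a model of the rule as described above, not Unison's own code; the function name unison_dir and the windows flag are made up for illustration, and the Windows fallbacks are tried in the order listed.

```python
import ntpath
import posixpath


def unison_dir(env, windows=False):
    """Return the private directory implied by the rule described above.

    `env` is a mapping of environment variables (os.environ in real use).
    """
    # An explicit UNISON variable overrides the defaults.
    if env.get("UNISON"):
        return env["UNISON"]
    if not windows:
        return posixpath.join(env["HOME"], ".unison")
    # Assumed order on Windows: USERPROFILE, then HOME, then c:\.unison.
    for var in ("USERPROFILE", "HOME"):
        if env.get(var):
            return ntpath.join(env[var], ".unison")
    return "c:\\.unison"


print(unison_dir({"HOME": "/home/aklaver"}))
```

Passing the environment in as a plain dictionary keeps the sketch easy to try with different settings without touching the real environment.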

Now, let's move on to the actual setup: desktop <--> EC2 server <--> laptop. The important part of the above is that each <--> represents a different Unison root pair. The desktop Unison program has no knowledge of what is on the laptop, only what it can see on the EC2 server and vice versa. The role of the EC2 Unison instance in this scenario is to keep a particular set of files in sync with either the desktop or laptop, depending on who is asking. In simplest terms, the EC2 Unison is a server to the desktop/laptop Unison clients, although this is not strictly true due to the bidirectional nature of Unison. Either Unison instance in a pairing can serve or receive files. Connection between the desktop/laptop and the EC2 server is done over SSH using public/private key authentication. For the purposes of this article, I am using a subset of the paths I normally keep in sync.

Although it is possible to run Unison by supplying arguments to the program on the command line, there is a better way: create a profile file for each root pair that is synced. These are *.prf files stored in the ~/.unison directory. When you start Unison in text mode, you can supply it a some_name argument that maps to a some_name.prf file. In GUI mode, Unison searches the ~/.unison directory for *.prf files and presents them as choices (Figure 1). For this article, I am using lj_article.prf (Listing 1).
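The listing itself is not reproduced in this copy of the article, so what follows is a reconstruction pieced together from the walk-through in the rest of the article, wrapped in a few lines of Python that read the "name = value" format. Treat it as a sketch: the second path entry and the copythreshold value are placeholders of my own, the backupdir is written out as an absolute path on the assumption that ~ means /home/aklaver, and parse_profile is a hypothetical helper, not part of Unison.

```python
# Reconstructed profile; lines marked "placeholder" are NOT from the original listing.
PROFILE = """\
root = /home/aklaver
root = ssh://alabama/lj_article

path = .unison
# placeholder: the article syncs other paths that are not named in the text
path = Documents/lj_article

ignore = Name *~
ignore = BelowPath .unison/*
ignorenot = Name *.prf

backup = Name *
backuploc = central
backupdir = /home/aklaver/unison_backup
maxbackups = 3

# placeholder value, in kilobytes
copythreshold = 1000
"""


def parse_profile(text):
    """Collect 'name = value' lines into a dict of lists (a name may repeat)."""
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        prefs.setdefault(name.strip(), []).append(value.strip())
    return prefs


prefs = parse_profile(PROFILE)
for name, values in prefs.items():
    print(name, values)
```

Preferences such as root, path and ignore may appear more than once, which is why each name maps to a list.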

Figure 1. Profile Choices

I use this same profile on both the desktop and laptop machines. Let's go through the lines and detail what they mean. As an aside, the settings I talk about here are called preferences in the Unison manual. The first two lines, labeled root, are the root pair I'll be syncing. The paths can be either absolute or relative to the directory in which Unison is started. The first root is an absolute one defining my home directory. The second takes a little more explaining. The ssh:// indicates that SSH is used to make the connection to the remote machine. In this case, I have used an SSH alias “alabama” to refer to the remote EC2 host. This expands as aklaver@host_name and assumes the default SSH port of 22. Should the SSH server on your machine be listening on a different port, the form is ssh://user@host_name:port_number. The /lj_article portion is the directory path relative to my home directory on alabama. If I wanted to specify an absolute directory path, it would look something like ssh://alabama//home/aklaver/lj_article (note the two forward slashes).
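To make the single-slash versus double-slash distinction concrete, here is a small Python sketch that splits a root into its parts using the standard urllib.parse module. It mirrors the forms described above and is not how Unison itself parses roots; parse_root is a hypothetical helper, and the host example.com and port 2222 in the last call are invented for the example.

```python
from urllib.parse import urlsplit


def parse_root(root):
    """Split a root into user, host, port and path (illustrative only)."""
    if "://" not in root:
        return {"kind": "local", "path": root}
    parts = urlsplit(root)
    # Drop the slash that separates host from path. What remains is either
    # relative to the remote home directory or, if it still starts with a
    # slash, an absolute path on the remote machine.
    path = parts.path[1:]
    return {
        "kind": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": path,
        "absolute": path.startswith("/"),
    }


print(parse_root("ssh://alabama/lj_article"))
print(parse_root("ssh://alabama//home/aklaver/lj_article"))
print(parse_root("ssh://aklaver@example.com:2222/lj_article"))
```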

With just the two roots specified, running Unison would sync everything in my local home directory with the lj_article directory on the remote machine. This is not what I want, so I have included some path preferences. A path preference is exactly that—a path to some part of the root that you want to sync. Presence of a path preference restricts the sync to the paths specified. A path is relative to the root of the replica, and the separator character is the forward slash (it will be converted as needed). The important part is that paths are relative to the root. This means that the data I sync on either my desktop or laptop is relative to /home/aklaver and becomes relative to /home/aklaver/lj_article on the EC2 server. A path preference can point to a single file, a directory or a symbolic link (on Unixen). The path preference is inclusive, in that if you specify a directory, it syncs everything in that directory and below.
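Because a path is always taken relative to each root, the same path preference names two different locations. A minimal Python sketch, assuming the two roots used in this article (with the remote root written out as it resolves on alabama) and a made-up helper named replica_locations:

```python
import posixpath

LOCAL_ROOT = "/home/aklaver"
# ssh://alabama/lj_article, resolved relative to the home directory on alabama
REMOTE_ROOT = "/home/aklaver/lj_article"


def replica_locations(path):
    """Return where a path preference lands under the local and remote roots."""
    return posixpath.join(LOCAL_ROOT, path), posixpath.join(REMOTE_ROOT, path)


print(replica_locations(".unison"))
```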

To filter what is synced in a particular path, or in general, ignore preferences are used. The first form used here, Name 'name', is one of four types, the others being Path 'path', Regex 'regex' and BelowPath 'path' (this is new to 2.40.16). There is a lot of power here, explained fully in the Path Specification and Ignoring Paths sections of the manual. The ignore = Name *~ preference uses globbing patterns to ignore any tilde files in the paths. The second ignore preference, ignore = BelowPath .unison/*, is more specific. It causes anything in .unison and paths below it to be ignored. This is in contrast to the regular Path 'path' form that matches only the path specified. I am doing this to avoid syncing the archive files and the backup directory (more on that later).

Now I have a quandary. I have said that I want to sync the .unison path with path=.unison and ignore it with ignore = BelowPath .unison/*. At that point, nothing would be synced, and the preferences would be useless. That is where the ignorenot preference comes in. By specifying Name *.prf, I am saying “disregard what I said about ignoring what is in .unison for the specific case of *.prf files”. The end result is that I sync only the profile files in ~/.unison.
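The interplay of ignore and ignorenot can be modeled in a short Python sketch. It is a simplified model, not Unison's matcher: it uses Python's fnmatch per path component, whose wildcard rules differ in detail from Unison's, it handles only the Name, Path and BelowPath forms, and it assumes a path is skipped as soon as it or any of its parent directories is ignored. The function names are mine.

```python
from fnmatch import fnmatchcase

IGNORE = [("Name", "*~"), ("BelowPath", ".unison/*")]
IGNORENOT = [("Name", "*.prf")]


def glob_match(pattern, path):
    """Match component by component, so '*' never crosses a '/'."""
    p_parts, s_parts = pattern.split("/"), path.split("/")
    return len(p_parts) == len(s_parts) and all(
        fnmatchcase(s, p) for p, s in zip(p_parts, s_parts)
    )


def matches(kind, pattern, path):
    parts = path.split("/")
    if kind == "Name":       # last component only
        return fnmatchcase(parts[-1], pattern)
    if kind == "Path":       # exactly this path
        return glob_match(pattern, path)
    if kind == "BelowPath":  # this path or anything beneath it
        return any(
            glob_match(pattern, "/".join(parts[:i]))
            for i in range(1, len(parts) + 1)
        )
    raise ValueError("unsupported pattern kind: " + kind)


def is_skipped(path, ignore=IGNORE, ignorenot=IGNORENOT):
    """True if the path, or any parent directory on the way to it, is ignored."""
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        ignored = any(matches(k, p, prefix) for k, p in ignore)
        rescued = any(matches(k, p, prefix) for k, p in ignorenot)
        if ignored and not rescued:
            return True
    return False


for candidate in (".unison/lj_article.prf", ".unison/backup/notes.txt", "docs/draft.txt~"):
    print(candidate, is_skipped(candidate))
```

Under this model the profile file survives, while everything else in .unison and any tilde file is skipped.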

The next four preferences configure the backup option. The presence of a backup= preference signals that backups are to be done. A backup is done when a file is changed or deleted by Unison. The backed-up file is kept on the machine where the change is applied by Unison. So if you make a change to a file on your local root, when it is synced to the remote root and the change is applied to the remote file, the previous version of the remote file will be saved on the remote machine. In this case, using Name * means back up everything. It is possible to use the path specifications mentioned above to restrict that. There also is a backupnot that operates like ignorenot.

The backuploc preference specifies where the backups are to be stored. The options are local and central, where local has the backup files being stored alongside the original files, and central moves the backup files to a location as defined in backupdir. With backuploc=central and no backupdir preference, the backups will be found in the directory backup/ in ~/.unison. This is why I have the ignore=BelowPath .unison/* preference in the profile. Although in this case, I am using ~/unison_backup to store backups, I have other profiles using ~/.unison/backup.
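The choice between local and central storage boils down to a small decision, sketched here in Python. This only restates the rule described above and returns the top-level directory involved; backup_directory is a hypothetical helper, and the file names Unison actually gives its backups are not modeled.

```python
import posixpath


def backup_directory(original, backuploc, backupdir=None,
                     unison_dir="/home/aklaver/.unison"):
    """Return the top-level directory where a backup of `original` would go."""
    if backuploc == "local":
        # Backups sit alongside the original file.
        return posixpath.dirname(original)
    if backuploc == "central":
        # Use backupdir if given, else the backup/ directory in ~/.unison.
        return backupdir or posixpath.join(unison_dir, "backup")
    raise ValueError("backuploc must be 'local' or 'central'")


print(backup_directory("/home/aklaver/docs/notes.txt", "central",
                       backupdir="/home/aklaver/unison_backup"))
print(backup_directory("/home/aklaver/docs/notes.txt", "central"))
```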

The maxbackups preference is self-explanatory; it restricts the number of backed-up files to the three most-recent versions. In its absence, the default is two versions. I use the central method of backup, because I don't want to sync the backups. Per my article, “Using rdiff-backup and rdiffWeb to Back Up and Restore” (LJ, December 2010), I use rdiff-backup to keep versioned backups already. I do like to keep Unison backups close at hand though, as insurance if I make an inappropriate change to a profile and cause a file or files to disappear, or in case I make the wrong decision on which direction to propagate a change. Besides, I am a firm believer that you cannot have too many backups.

The copythreshold preference is one of the performance enhancements in recent versions. Since 2.30.4, it has been possible to tune Unison when doing whole file transfers or when redoing an interrupted transfer. The rsync code, as used by Unison, is designed more for doing changes to files than for moving over entire files. Setting copythreshold to a non-negative integer causes Unison to farm out the job of copying complete files. If set to 0, all whole file transfers will be farmed out. Otherwise, a number greater than 0 refers to a file size in kilobytes above which Unison will use another program. By default, that program is rsync, although that can be changed using copyprog. Unison also uses rsync to resume interrupted large file transfers.
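Put as code, the copythreshold rule looks roughly like the following Python sketch. Two points are assumptions on my part rather than statements from this article: that a negative value means the external program is never used, and that a kilobyte is counted as 1,024 bytes. The function name is made up.

```python
def uses_external_copy(size_bytes, copythreshold):
    """Decide whether a whole-file copy would be handed to the external program."""
    if copythreshold < 0:
        return False  # assumption: a negative threshold disables the feature
    if copythreshold == 0:
        return True   # every whole-file transfer is farmed out
    # assumption: the threshold is in kilobytes of 1,024 bytes
    return size_bytes > copythreshold * 1024


print(uses_external_copy(5 * 1024 * 1024, 1000))
print(uses_external_copy(10 * 1024, 1000))
```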

In the same vein, Unison 2.30.4+ will, without any setting needed, keep track of an interrupted transfer of a new directory. Whatever has been transferred will be stored in a temporary directory away from the intended destination, on the receiving side, and on the next transfer, it resumes with the untransferred files.

At this point, running this profile is anticlimactic. The first run pops up a warning about there not being archive files present for the root pair (Figure 2). The archive files are where Unison stores its information about the roots and files synced. Pressing Enter starts the process of the initial scan and the population of the archive files. After the scan is done, the GUI presents the choices available for file transfers (Figure 3). In this case, because the remote path is empty, all arrows point from local to alabama. There are two options here: arrow down through each choice and press Enter, or type the letter g as a shortcut for the Go button and start the transfer of all the files at once. On subsequent transfers, the choices for each directory/file may well be different, depending on the changes made on each root (Figure 4). The choice presented is not set in stone, and you can change it with the arrow and Skip buttons on the toolbar, dealing with each file as necessary.

Figure 2. First Sync of Root Pair, Archive Warning

Figure 3. Syncing First Time

Figure 4. A Sync with Changes to Both Roots

For batch actions on the files, use the Actions menu (Figure 5). Also worth mentioning is the Ignore menu item. It is very handy in that it writes an ignore preference to your profile matching the item you select, using the ignore type you select. With version 2.40.1+, a profile editor is built in to the GUI. So, if you decide you want to undo the ignore preference, simply go to the Synchronization menu and then Change Profile.

As mentioned previously, when I built the GUI, it did not have the Quit button. Typing q as a shortcut still works, or you can go through the Synchronization menu to get to Quit.

Figure 5. Action Menu Showing Bulk Action Choices

So, what happens if a file has been changed on both roots since the last sync? The default action is to skip syncing that particular file, and this is represented by a ? in the file listing (Figure 6). It is left up to the user to decide what to do in that situation. That said, preferences can be set to force Unison's hand (search for “force” in the manual). Should you decide to sync, some tools are available to help you make a decision on what to do with a file. For nonbinary files, you can use the Diff button to look at the diff between a pair of files to help figure things out. There also is the Merge button. This requires some extensive preparation, and I have not used it since I first tried Unison many years ago (at the time there were known issues with the merge code). Since then, considerable work has gone into making merge functional, and it is on my to-do list to give the new code a spin. The manual provides detailed instructions for setting up merge, if you are so inclined.
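The decision Unison presents for each file can be thought of as a three-way comparison between the state recorded at the last sync and the current state on each root. The Python sketch below is my own simplified model of that idea, not Unison's reconciliation code: each argument is some fingerprint of the file (a content hash, say), with None meaning the file does not exist, and the return values borrow the arrow and ? notation from the GUI.

```python
def reconcile(archived, local, remote):
    """Suggest a direction for one file.

    archived: fingerprint recorded at the last successful sync
    local, remote: current fingerprints on each root (None = absent)
    """
    if local == remote:
        return "="   # nothing to do, both roots already agree
    if local == archived:
        return "<-"  # only the remote root changed
    if remote == archived:
        return "->"  # only the local root changed
    return "?"       # both changed in different ways; leave it to the user


print(reconcile(None, "v1", None))    # first sync, remote empty
print(reconcile("v1", "v2", "v1"))    # edited locally
print(reconcile("v1", "v2", "v3"))    # edited on both roots
```

The first call corresponds to the initial run described earlier, where the remote path is empty and every arrow points from local to alabama.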

Figure 6. Sync with Changes to Both Roots, Showing Diff Also

With all these files flying around, how safe is your data? For a more complete discussion, see the Invariants section of the manual. The essence of that section is that Unison guarantees certain behavior for the paths synced as well as its own information about the process, at any moment in time. For paths, that is that they are either at their original state or their completed changed state. The same principle applies to Unison's own private data—it is either in its original form or reflects the current successfully completed syncs. My own experience is that Unison is very forgiving of operator error. A sync interrupted either intentionally or by mistake is recoverable.
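A common way to get this "old state or new state, never half of each" behavior is to write the new contents to a temporary file and then rename it over the original. The Python sketch below shows that general technique; it illustrates the guarantee and makes no claim about how Unison implements it. The helper name atomic_write is mine.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new file only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # the switch-over happens in a single step
    except BaseException:
        # On any failure the original file is untouched; clean up the temp file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process is interrupted before the rename, the original file is still intact, which is the property that makes an interrupted sync recoverable.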

I use Unison on a daily basis to help me keep track of file changes on my laptop and desktop. Using an EC2 server as an intermediary host, I am able to bounce the changes from either of my personal machines off the cloud and back to the other machine. For many years now, Unison has been a must-have program for me. I hope you find it useful too.

Adrian Klaver lost a term paper once, due to no backups. Ever since, he has had a bit of an obsession with backup utilities. When not worrying about the state of his data, he can be found trying to outsmart his digital camera.
