LJ Archive

Subversion: Not Just for Code Anymore

William Nagel

Issue #143, March 2006

Here is a subversive way to handle multiple versions of your personal information instead of just versions of code.

Have you ever needed some information from a file, only to remember that you modified the file a week ago and removed the very information you're interested in? Or, have you ever spent hours sifting through dozens of inconsistently named copies of the same file trying to find one particular version? If you're like me, the answer is probably a resounding yes to both questions. Of course, if you're a programmer, you've probably already solved that problem in your development activities by using a version control system like CVS or Subversion. What about everything else though? Mom's cherry pie recipe may not change as frequently as rpc_init.c, but if you do decide to create a low-cal version, you're not going to want to lose the original. As it turns out, version control isn't only for source files anymore. Many of the features of Subversion make it ideal for versioning all kinds of files.

With Subversion, you can keep a history of changes made to your files. That way, you easily can go back and see exactly what a given file contained at a particular point in time. You also save space, because it stores deltas from one version to the next. That way, when you make a change to a versioned file, it needs only enough extra space to store the changes rather than a complete second copy of the file. Also, unlike with CVS, delta storage on Subversion also applies to binary files as well as text files.

Subversion makes it easy to access your files from multiple computers too. Instead of worrying whether the copy of the budget report on your laptop reflects the changes you made last night on your desktop system at home, you can simply run an update on your laptop and Subversion automatically updates your file to the latest version in the repository. Also, because all of the versions are stored in a single repository, there is a single location that you need to back up in order to keep all of your data safe.

What to Version

So your interest is piqued. You're sold on the advantages of versioning your files, and you'd like to give it a try. The first question to answer is what files you're going to put under version control. One obvious possibility would be to version your entire hard drive. In practice though, that's not a very practical approach. When you store a portion of a repository's contents locally (in what's called a working copy), Subversion stores a second copy of each file to allow it to compare locally changes you have made with the last version from the repository. Therefore, if you version the entire hard drive, you'll need twice as much hard drive.

There's also little reason to keep full revision history on the largely static parts of your filesystem, such as /usr or /opt. On the other hand, directories that contain a lot of custom files/modifications, such as /etc or /home, are prime candidates for versioning, because the advantage of tracking those changes is more likely to outweigh the disadvantages of extra storage requirements. Furthermore, with Subversion, you can opt to create a working copy from a subtree in the repository hierarchy. That way, you don't need to store any copies of infrequently accessed data locally, which often results in a net reduction in hard drive requirements, even though the files you are storing locally take up twice as much space.

Getting Subversion Up and Going

Now, let's dive in and get Subversion running on your machine. Installing is generally pretty easy. You can, of course, download the Subversion source and compile that, but in most cases, it's going to be much easier to install the precompiled binary package for your Linux distribution of choice. Fortunately, Subversion has matured to the point where such a package is available for almost every major distribution. In fact, I don't know of any off the top of my head that it isn't available for.

Once you have Subversion installed, it's time to create a repository. Let's say you have a documents directory in your home that you'd like to version. First, you need to create a new empty repository using the svnadmin create command. For instance, the following creates a new repository in your home directory:

$ svnadmin create $HOME/.documents_repository

Next, you need to import your existing documents into the newly created repository. To do that, use the svn import command with the directory to import and a URL that points to the repository. In this example, the URL refers directly to the repository using a file://-type URL. If your repository will be used only locally, the file:// URL is the easiest way to access a repository (there are other, better ways to access repositories that I'll discuss in a bit though):

$ svn import $HOME/documents file://$HOME/.documents_repository

When you run the import command, Subversion opens an editor and asks you for a log message. Whatever message you enter will be associated with the newly created repository revision and can be seen by examining the repository history logs. Enter something brief, such as “imported documents directory”. As soon as you save the log message and leave the editor, Subversion performs the import and outputs something like the following:

Adding         documents/file1.txt
Adding         documents/file2.txt
Adding         documents/file3.jpg

Committed revision 1.

You can now safely remove the original $HOME/documents and then re-create it as a working copy of the repository, using the svn checkout command:

$ rm -rf $HOME/documents
$ svn checkout file://$HOME/.documents_repository $HOME/documents

So far, so good. However, if you want to take advantage of Subversion from multiple machines, you're going to need to set up a server. Several options are available to you, but the best choice is generally to use Apache with mod_dav, which serves a Subversion repository using the WebDAV protocol.

From a basic Apache installation, getting WebDAV to work is fairly simple. First, you need to make sure that mod_dav and mod_dav_svn are being loaded:

LoadModule      dav_module        modules/mod_dav.so
LoadModule      dav_svn_module    modules/mod_dav_svn.so

Next, you need to set up a <Location> directive to point to your repository. For example, if you want your repository to be referenced with the URL http://example.net/bill/documents, and the repository is located in /srv/repositories/bill_documents, you could use the following Location directive:


<Location /bill/documents>
   DAV svn
   SVNPath /srv/repositories/bill_documents
   AuthType None
</Location>

Or, if you want more security, you could allow for valid users only:


<Location /bill/documents>
   DAV svn
   SVNPath /srv/repositories/bill_documents
   AuthType Basic
   AuthName "Bill's Documents"
   AuthUserFile /srv/repositories/bill_documents/passwd
   Require valid-user
</Location>

Basics of Using Subversion

Now that you have Subversion set up, let's take a look at some of the basic commands you'll need to know. The basic command-line Subversion client program is called svn, and all of the client commands that you'll use are accessed through that program. To get a complete list of the commands that are available, you can run svn help. You can also run svn help [command] to get help on a particular command.

The first basic command that you need to know is svn add. When you create a new file in a working copy, Subversion doesn't add it to the repository automatically. That way, you can control what gets versioned. For example, it usually would be a waste of space to add an emacs scratch file to your repository. Using svn add is easy. You simply need to give it the names of any files or directories to be added, and they will be scheduled for addition to the repository, assuming they reside inside of a valid working copy. Note though, that I said “scheduled”. When you issue the svn add command, Subversion doesn't actually add the files to the repository yet. Instead, it schedules them to be added at the next commit. That way, you add multiple files with several svn add commands and still batch them together so that they are committed as a single revision, along with any already-versioned files that have been locally modified.

So what's a commit? Well, when you modify files inside a working copy, the data isn't sent to the repository automatically. You need to commit your changes to the repository with svn commit. The commit command performs the work of actually sending your local changes to the repository to create a new revision. Normally, if you're in the working copy, you can simply issue svn commit with no options, and it will recursively commit all changed files under your current directory. However, if you don't want to commit all of the local changes, you can specify only certain files by listing them on the command line.

Once a file has been added to the repository, it can be freely modified locally and Subversion automatically determines what changes have been made in order to send them to the repository when you perform a commit. There is a restriction though. You can't just copy, move or delete files with the standard cp, mv or rm commands. If you do, Subversion won't know about the change and will lose track of the file. Instead, you need to use the Subversion equivalents svn cp, svn mv and svn rm. Syntactically, they work about the same as the local versions that you're used to, but they also schedule their respective actions to be applied to the repository on the next commit.

To find out the current status of a file in your working copy, you can use svn status. The status command shows you information such as which files are not under version control, which files have been modified and which files are scheduled for addition. For instance, the following output shows two modified files and one that hasn't been added to the repository:

$ svn status
?         .GroceryList.txt.swp
M         Frogs.png
M         GroceryList.txt

You also can use the svn update command to update your working copy to the latest revision. If you're accessing only the Subversion repository from a single computer, updating isn't necessary. However, if files are being modified from multiple computers, you need to run svn update in your working copy to get any changes that have been committed from a different computer.

Autoversioning

Now that I've explained the hard way of using Subversion to keep track of your files, let's take a look at autoversioning. When you use Apache as your Subversion repository server, it uses an extension of the WebDAV protocol for transferring files to and from the repository. An interesting side effect of this is that most operating systems can mount shared WebDAV directories as a network filesystem, much like Samba or NFS. That means you can mount a Subversion repository and directly access the files without needing to store them locally in a working copy. This can have several advantages that make it really nice for dealing with your personal files. For one, a new version is created every time a file is saved. That way, you have a complete save history of your files without worrying about whether you've done a commit recently. You also add files to the repository merely by creating them, and can do copies, moves or deletes with the standard filesystem commands. Furthermore, if you access your repository from multiple computers, you always know that you're accessing the most recent version without remembering to run svn update.

Of course, autoversioning does have its downsides. For one, it requires a reasonably fast network connection to the computer that's serving the repository, so it may not be practical for a laptop that's frequently used away from home; although if you have access to a network connection back to the server, it's always possible to copy files to your local hard drive, edit them and then copy them back to the repository. Another downside to autoversioning is that you can access only the most recent repository revision. If you want to access older revisions of files, you have to download them locally, which can be done either by checking out a directory at a specific revision:

$ svn checkout -r 1563 http://$MY_SERVER/docs/pics/

or by using svn cat to download a single file:

$ svn cat -r 1563 http://$MY_SERVER/docs/pics/beach.jpg

If you're going to use autoversioning to mount your repository under Linux, you need to install davfs. Once it is installed, mounting the repository is easy. All you need to do is run mount.davfs, like in the following example, which mounts the repository $MY_SERVER/docs at /mnt/documents:

$ mount.davfs http://$MY_SERVER/docs/ /mnt/documents

Before you can use autoversioning though, you also have to turn it on in Apache. To do that, you need to add the SVNAutoversioning on option to your <Location> directive for the Subversion repository.

Conclusion

Subversion is a system with a large feature list, many of which I haven't even touched on. You should know enough now to get started with versioning your files though. I've been using it for my files for a while now and find it to be very helpful. I find it especially useful when used with autoversioning, which makes it an almost seamless integration with the filesystem.

Resources for this article: /article/8751.

William Nagel is the Chief Software Engineer for a small technology company, and the author of Subversion Version Control: Using the Subversion Version Control System in Development Projects. He can be reached at bill@williamnagel.net.

LJ Archive