The future of Linux updates

Upgrade 2.0


Constant security updates can give you peace of mind, but the inconvenience of download - install - reboot can be a pain. We show you how to avoid downtime while staying up to date.

By Kurt Seifried

Few things are more annoying than testing something and having a software package change versions unexpectedly, so I have automatic updates disabled on my Fedora 11 test machine. Recently, however, when I ran yum update, I received a bit of a shock: Although I had updated the machine recently, I still had another 250MB to download and install (Figure 1). Thank goodness for broadband - if I were still on dial-up or on a slower form of broadband, I would have been up the creek.

Figure 1: A confirmation screen showing total download size for updates.

Smaller Is Better

Who cares if I need to download 100MB every week or so and occasionally reboot to keep my system up to date? If you can make updates small enough that they don't significantly affect bandwidth use (and, on wireless devices, battery life), then it's much more likely that you can automate updates and keep everyone secure. Furthermore, with embedded devices becoming more complicated (running Java, web servers, email servers, and so forth), there is a greater chance that everything needs security updates.

For package files such as RPM and DPKG, one simple solution is to leave out of the update any files that haven't changed. For example, on Fedora 11 the package openoffice.org-core is 92MB in size, but between the original release and the first update of this file there are about 30.3MB of identical files (images, XML files, and so on) in addition to the overhead of including these files within the RPM (path information, hash values of the files, etc.). But even this isn't much of a gain: It trims only about a third of the download.
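To make the "leave out what hasn't changed" idea concrete, here is a minimal Python sketch. It is not how RPM or DPKG work internally; the function names and the tiny in-memory "packages" are hypothetical and exist only for illustration. The sketch hashes every file in the old and new versions and keeps only the paths whose contents differ or are new.

```python
import hashlib

def manifest(files):
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def files_to_ship(old_files, new_files):
    """Return the paths in the new package that are new or have changed."""
    old = manifest(old_files)
    new = manifest(new_files)
    return sorted(path for path, digest in new.items() if old.get(path) != digest)

# Hypothetical package contents (path -> file bytes), for illustration only.
old_pkg = {
    "bin/soffice": b"program code, version 1",
    "share/logo.png": b"image data",
    "share/help.xml": b"<help/>",
}
new_pkg = {
    "bin/soffice": b"program code, version 2",
    "share/logo.png": b"image data",
    "share/help.xml": b"<help/>",
    "share/new.xml": b"<new/>",
}

changed = files_to_ship(old_pkg, new_pkg)
print(changed)
```

In this example, only the changed binary and the new XML file would go into the update; the image and the untouched XML file stay out.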

How can you get the update really small? Binary diff tools such as bsdiff [1] can be used to compare two executable files and create a diff file that allows you to upgrade the old binary with a relatively small diff file. For example, the httpd binary as shipped with the Apache package in Fedora 11 is a 322KB executable; however, the bsdiff for the most recent update is a mere 8KB.
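The following Python sketch shows the general shape of a binary delta: a list of "copy this range from the old file" and "insert these new bytes" operations. It is emphatically not the algorithm bsdiff uses; I am substituting difflib from the Python standard library as a stand-in, and the names make_delta and apply_delta, as well as the sample data, are made up for this example.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Describe `new` as copies from `old` plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # bytes the old file lacks
        # "delete" needs no operation: the old bytes are simply not copied
    return ops

def apply_delta(old, ops):
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

# Hypothetical "binaries" that differ only in their last byte.
old = b"HEADER" + bytes(range(256)) * 4 + b"version 1.0"
new = b"HEADER" + bytes(range(256)) * 4 + b"version 1.1"

delta = make_delta(old, new)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(new), "bytes in the new file,", literal_bytes, "of them shipped literally")
```

The patch only has to carry the literal bytes plus a few offsets; everything else is rebuilt from the copy the client already has.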

Unfortunately, bsdiff doesn't always work well - a single changed function call or new function at the beginning of the binary file can result in two binary files that look different enough to cause the patch file to be extremely large (for example, comparing two recent Fedora 11 kernel files that are 1.8MB in size results in a bsdiff patch file that is 1.8MB in size).

A Better Way

Google's new Chrome web browser includes automatic update functionality based on Courgette [2], which will no doubt be rolled into their Google Chrome operating system (a lightweight Linux platform with the Chrome web browser on top). Courgette basically does the same thing bsdiff does, but with slightly more intelligence. Instead of just comparing two raw executables in binary format, Courgette converts the binaries to primitive assembly language and does the diffing at that level. By working one level up, Courgette is able to compare the symbol tables for the executables with a much higher chance of finding matching strings, which results in a smaller patch because it can ignore more of what hasn't changed, even if the location has changed slightly. Google claims a roughly 90 percent decrease in the size of patches when compared with the use of just bsdiff.
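A toy model in Python shows why diffing at this higher level helps. This is not Courgette's code or its file format: The two-field "instructions," the symbolize function, and the sample programs are all invented for illustration, and a real tool also has to ship the address tables so the binary can be rebuilt. In the example, one inserted instruction shifts every jump target by one, which makes the raw programs look quite different; once the absolute targets are replaced by labels, only the inserted instruction remains as a change.

```python
from difflib import SequenceMatcher

BRANCHES = ("jmp", "call")

def symbolize(program):
    """Replace absolute branch targets with labels (L0, L1, ...) in address order."""
    targets = sorted({arg for op, arg in program if op in BRANCHES})
    label_of = {addr: "L%d" % n for n, addr in enumerate(targets)}
    return [(op, label_of[arg]) if op in BRANCHES else (op, arg)
            for op, arg in program]

def changed_items(old, new):
    """Count the instructions a simple diff would have to ship or rewrite."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

# Old program: branch targets are absolute instruction indexes.
old_prog = [("load", 1), ("call", 4), ("jmp", 5), ("call", 4), ("add", 2), ("ret", 0)]
# New program: one instruction inserted at the top, so every target moves by one.
new_prog = [("nop", 0), ("load", 1), ("call", 5), ("jmp", 6), ("call", 5),
            ("add", 2), ("ret", 0)]

raw_changes = changed_items(old_prog, new_prog)
sym_changes = changed_items(symbolize(old_prog), symbolize(new_prog))
print("raw:", raw_changes, "symbolized:", sym_changes)
```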

Although you can solve the problem of size, that still leaves you with the rather sticky problem of dealing with kernel updates that require a system reboot. Although the kernel might reboot quickly, some services, such as VMware or various databases, can take minutes to come online fully. In the case of a database, it could take hours for the cache to become properly populated.

Other services, such as VoIP servers, might simply be in use 24 hours a day, seven days a week, with no ability to be shut down. Luckily, this problem has led to the creation of Ksplice [3]. Ksplice, which is relatively simple in theory, creates replacement code (the difference between what you are running and the update), resolves symbols in the replacement code, and then checks the safety of the patch (e.g., determining whether any functions being replaced have been inlined elsewhere and need updating there too).

Ksplice then inserts jump (JMP) instructions into the original kernel that point to the new code, and - presto - you have an updated kernel without rebooting. The interesting thing about Ksplice is that it can also be applied to services running in user space, which could eventually result in systems that can be updated without rebooting or even having to restart services. If you want to give Ksplice a spin, it's available for Ubuntu from the package archive on their website [4].
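The jump trick is easier to picture with a toy model. The Python sketch below is not Ksplice code and ignores the symbol resolution and safety checks just described; the three-instruction "machine," the run and hot_patch functions, and the memory layout are all hypothetical. It only shows the core move: The new code is loaded into free memory, and the first instruction of the old function is overwritten with a jump to it.

```python
def run(memory, entry):
    """Execute toy instructions starting at `entry` until RET; return the result."""
    stack, pc = [], entry
    while True:
        op, *args = memory[pc]
        if op == "PUSH":
            stack.append(args[0])
            pc += 1
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            return stack.pop()
        else:
            raise ValueError("unknown instruction: %r" % (op,))

def hot_patch(memory, old_entry, new_code):
    """Append the replacement code and redirect the old entry point to it."""
    new_entry = len(memory)
    memory.extend(new_code)
    memory[old_entry] = ("JMP", new_entry)
    return new_entry

# The "running kernel": one function at address 0 that returns the wrong value.
memory = [("PUSH", 1), ("RET",)]
before = run(memory, 0)

# Patch it in place; callers keep using address 0 and never notice.
hot_patch(memory, 0, [("PUSH", 2), ("RET",)])
after = run(memory, 0)
print(before, after)
```

Nothing that calls the function has to change, which is why no restart is needed.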

Role of Vendors

On the other hand, why don't vendors ship a full file and a partial file that only contains the update? Part of the problem is the overhead: Do you keep partial packages for each possible upgrade (i.e., a partial for version 1.0 to 1.1, 1.0 to 1.1.1, 1.0 to 1.2, and so on), or just for each step of the upgrade (i.e., a partial for version 1.0 to 1.1, 1.1 to 1.1.1, 1.1.1 to 1.2, and so on)? The good news is that certain update-related problems, such as geolocation, are already being handled, and I suspect that as tools like Courgette and Ksplice mature, you'll see vendors embracing them.
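A few lines of Python make the overhead visible. The version list is taken from the example above; the variable and function names are mine. Keeping a partial for every possible upgrade means one per pair of versions, whereas keeping a partial per step means one per consecutive pair.

```python
from itertools import combinations

versions = ["1.0", "1.1", "1.1.1", "1.2"]

# One partial for every possible upgrade path (any older -> any newer version).
every_upgrade = list(combinations(versions, 2))

# One partial per step (each version -> the next one only).
each_step = list(zip(versions, versions[1:]))

def partial_counts(n):
    """Number of partials for n released versions: (every upgrade, each step)."""
    return n * (n - 1) // 2, n - 1

print(len(every_upgrade), len(each_step))
print(partial_counts(20))
```

With four versions, that is six partials versus three; with 20 versions, it is 190 versus 19. The per-step approach is cheaper to host, but a client that is several versions behind has to apply the whole chain.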

What to Do

If you only have one machine, there isn't a whole lot you can do to reduce the size of upgrades. If you have two or more machines (running the same software), you have a number of easy tricks to reduce the number of downloads and time spent on updates. Most update software grabs updates via HTTP, which means you can use a web proxy to handle requests and cache the data. Of course, you will have to change the default configuration significantly, allowing for files of up to 100MB (or more) and having a very large cache size of several gigabytes. The advantage of this is that you can install a transparent proxy server, such as Squid [5], and not have to modify the configuration on any systems.

Another effective strategy is to mount your updates directory from a central server. With an RPM-based system, you must be careful and ensure that you only share the directory with the actual packages (i.e., /var/cache/yum/updates/packages/). If you share the /var/cache/yum/updates/ directory, for example, the various systems might get upset about sharing files like filelists.xml.gz.sqlite because such files are not designed for concurrent access with multiple systems. On my main server, I simply have NFS enabled with an /etc/exports containing the following:

/var/cache/yum/base/packages *(rw,no_root_squash)
/var/cache/yum/updates/packages *(rw,no_root_squash)

Now wait a minute: Anyone can mount these directories and write to them as root?

RPM provides end-to-end security in the form of signed packages, so if you have GPG checks enabled (gpgcheck=1) in yum.conf, you will find out quickly if anyone tampers with a package. The advantage of letting everyone write to this central directory is that if a client starts an update and downloads the packages, the packages are then available for all the other machines, including the server.
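Because this setup depends entirely on the signature check, it is worth verifying that the check is really on before you share a writable package directory. The Python sketch below is one way to do that, under a few assumptions: The function name gpgcheck_enabled and the sample configuration text are mine, the sketch only looks at the [main] section (individual repository files can override the setting), and it treats a missing gpgcheck line as "disabled."

```python
import configparser

def gpgcheck_enabled(conf_text):
    """Return True if the [main] section of a yum.conf sets gpgcheck=1."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: a missing section or option counts as "not enabled".
    return parser.getboolean("main", "gpgcheck", fallback=False)

# Hypothetical configuration snippets, for illustration only.
safe_conf = """
[main]
cachedir=/var/cache/yum
keepcache=1
gpgcheck=1
"""

unsafe_conf = """
[main]
cachedir=/var/cache/yum
gpgcheck=0
"""

print(gpgcheck_enabled(safe_conf), gpgcheck_enabled(unsafe_conf))
```

On a real machine, you would read the text from /etc/yum.conf and pass it to the function.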

Finally, there are slipstreamed installs. Rather than installing the operating system and then applying the updates, you can create custom install media with the updates already included. For RPM/Anaconda-based installs, you can accomplish this with the use of the createrepo [6] command to create new metadata files (usually contained in the repodata directory on your install CD or DVD). Simply copy the contents of the install .iso image, copy the new packages into the copy (and get rid of the old ones because you'll probably need the space), run createrepo, and burn a fresh CD or DVD so you have install media with up-to-date packages.

INFO
[1] Binary diff/patch utility: http://www.daemonology.net/bsdiff/
[2] Courgette: http://dev.chromium.org/developers/design-documents/software-updates-courgette
[3] Ksplice: http://www.ksplice.com/
[4] Ubuntu Ksplice package: http://packages.ubuntu.com/jaunty/ksplice
[5] Squid: http://www.squid-cache.org/
[6] createrepo: http://createrepo.baseurl.org/

THE AUTHOR

Kurt Seifried is an Information Security Consultant who has specialized in Linux and networks since 1996. He often wonders how it is that technology works on a large scale but often fails on a small scale.