Bare Metal Recovery, Revisited

Charles Curley

Issue #100, August 2002

Charles describes the additions he made to the scripts for his backup and recovery suite.

Imagine your disk drive has just become a very expensive hockey puck. Imagine you have had a fire, and your computer case now looks like something Salvador Dali would like to paint. Now what?

That's the way I started an article on this subject in the November 2000 issue of Linux Journal. The article described a process for backing up a computer and subsequently restoring it to the bare metal, along with a suite of scripts used in both the backup and the recovery. Readers can find the article at www.linuxjournal.com/article/4175.

Since then I have added some scripts to the suite. Most of the new scripts are designed for network backups and take advantage of Secure Shell (SSH). (For more information on SSH, see Mick Bauer's “The 101 Uses of OpenSSH” in the January and February 2001 issues of LJ.) I've also made some changes to the scripts introduced in the original article. The suite of revised scripts is available at my home page (see Resources).

The Flaw

The biggest problem with my November 2000 article and the process it described is that the process required a lot of typing at the beginning of the recovery process. You have to enter partition boundaries and other data into fdisk manually, then check the results against your printout. (Printout!? for Murphy's sake!) Then you manually create the appropriate filesystems for each partition. Then you get to mount them, again manually.

This is a lot of typing. I don't know how many times I did test backups and restores on my test computer while I was writing the article. More than I ever want to do again, that's for sure. It's also error-prone. After a while all those numbers start to blur together.

The obvious solution is a script or two. What we need is a script that will restore the partition information to a hard drive, then build the filesystems and mount them so that you can run the first stage restoration.

My first pass at this script is the script make.partitions, which is available in the tarball of scripts on my home page. It has two problems: first, it does not rebuild the partitions, so you still have to run fdisk manually; and second, it has to be created by hand for each computer. Add, delete, reformat or otherwise modify a partition, and you have to edit the script. That's not good enough. The script, which is GPLed, should look somewhat like Listing 1.

Listing 1. make.dev.hda

A Script-Writing Script

The second solution is a lot smarter. Why not automate the process? We use gcc to compile gcc. Heck, you can use gcc to compile Perl. Why not a script that creates the script that make.partitions should be? Why not a script-writing script?

make.fdisk parses the output from fdisk -l and mount -l and creates a new script for restoring a given hard drive.

Using Redirection

The first problem we face is one I mentioned in the original article: fdisk does not export partition information in a manner that allows it to be re-imported later on. While other versions of fdisk do allow exporting, tomsrtbt (the floppy-based distribution I recommend for bare metal restore) comes with fdisk, and I don't want to rebuild the tomsrtbt disk. We can handle this with something all well-behaved Linux programs have: I/O redirection. Given a program, foo, and a file of commands for foo called bar, we can feed the commands to foo by redirecting foo's input from the keyboard to bar, like this:

foo < bar

So what we want to be able to do is:

fdisk /dev/x < dev.x

where x is the name of the hard drive to be rebuilt.
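
The same redirection can be set up from inside a program. The sketch below is Python, not shell, and is not part of the suite described here: the helper name run_with_input_file is made up for illustration, and a harmless one-line child process stands in for fdisk so that nothing touches a disk.

```python
import os
import subprocess
import sys
import tempfile

def run_with_input_file(argv, command_file):
    """Run argv with standard input redirected from command_file.

    This is the programmatic equivalent of `foo < bar` in the shell.
    """
    with open(command_file, "rb") as stdin:
        result = subprocess.run(argv, stdin=stdin, capture_output=True, check=True)
    return result.stdout.decode()

# Stand-in for fdisk (an assumption for the demo): a child process that
# reports every command line it reads from standard input.
ECHO_STDIN = [sys.executable, "-c",
              "import sys\nfor line in sys.stdin: print('got', line.strip())"]

# Write a tiny command file, feed it to the stand-in, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".cmd", delete=False) as bar:
    bar.write("v\nw\n")

print(run_with_input_file(ECHO_STDIN, bar.name), end="")
os.unlink(bar.name)
```

The child never knows whether its input came from a keyboard or a file, which is the property the technique relies on.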

make.fdisk creates two files. One is an executable script, called make.dev.x, like Listing 1. The other, dev.x, contains the commands necessary for fdisk to build the partitions. You specify which hard drive you want to build scripts for (and so the filenames) by naming the associated device file as the argument to make.fdisk. For example, on a typical IDE system,

make.fdisk /dev/hda

spits out the make.dev.hda script and the input file for fdisk, dev.hda.

How It Works

As you look at the script make.fdisk shown in Listing 2 [available at ftp.linuxjournal.com/pub/lj/listings/issue100/5484.tgz], keep in mind what happens at what time. The situation is like C source code, where some things happen at the time the program is compiled, like evaluation of defines and inclusion of header files, and others happen later on, at runtime. Here, some things happen when make.fdisk itself runs, and others happen only later, when the script it generates is run during recovery.

On examining make.fdisk, the first thing we see is that it is a Perl script. Next is a brief description of what the script does. This is followed by a timestamp and two copyright statements. Then we see the usual announcement that the code is free software and distributed under the General Public License. Next is a detailed description of the problem with fdisk we've already seen—and the solution. It is good coding practice to document a program in this manner; it makes the program almost self-documenting.

Now we get to actual Perl code. The subroutine cut2fmt takes a series of column numbers and calculates a format string for later use with unpack. Right after the subroutine we use it to create a format string to unpack the output from fdisk.
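
To make the idea concrete, here is a minimal sketch of the same technique in Python rather than Perl. It is not the code from make.fdisk: the names cut2slices and unpack_line are invented for the example, slices play the role of the unpack format string, and the column positions are made up rather than taken from real fdisk output.

```python
def cut2slices(positions):
    """Convert 1-based column start positions into one slice per field.

    positions lists where the second and later fields begin, so N positions
    describe N + 1 fields; the last field runs to the end of the line.
    """
    slices = []
    last = 1
    for place in positions:
        slices.append(slice(last - 1, place - 1))
        last = place
    slices.append(slice(last - 1, None))
    return slices

def unpack_line(line, slices):
    """Cut a fixed-width line into fields, trimming surrounding blanks."""
    return [line[s].strip() for s in slices]

# Hypothetical layout: fields start at columns 1, 11 and 15. These numbers
# are made up for the demo and are not the real fdisk column boundaries.
LAYOUT = cut2slices([11, 15])
print(unpack_line("/dev/hda1 *        1", LAYOUT))
```

Cutting by column position instead of splitting on whitespace keeps an empty column, such as a blank boot flag, from shifting every field after it.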

After that is a series of definitions of the columns in fdisk's output. With these, we can index into the array created with unpack by name rather than by column number. This should make the script easier to read and more maintainable.
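
A small Python sketch of indexing by name follows. The constant names are assumptions chosen to match the column headings that fdisk -l printed at the time (Device, Boot, Start, End, Blocks, Id, System); they are not necessarily the names used in make.fdisk, and the sample row is made up.

```python
# Column order of a partition line in `fdisk -l` output:
#   Device Boot Start End Blocks Id System
(DEVICE, BOOT, START, END, BLOCKS, ID, SYSTEM) = range(7)

# A made-up, already-unpacked partition line used only for illustration.
fields = ["/dev/hda2", "", "14", "144", "1052257+", "82", "Linux swap"]

def is_swap(fields):
    """True if the partition's Id column is 82, the Linux swap type."""
    return fields[ID] == "82"

print(fields[DEVICE], fields[START], fields[END], is_swap(fields))
```

Compare fields[ID] with fields[5]: the first says what is being tested, and it keeps working if a column is ever added in front of it.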

The directory where the rebuilt hard drive will be mounted is named $target so that the first stage restore can find it. Make sure this agrees with the definition of $target in your copy of the script restore.metadata.

Next, the code massages the device name to produce the filenames where we will send our output. Then we define the path to the directory where we will place the output files.
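
The naming rule can be sketched in a few lines of Python. This is an illustration, not the Perl from make.fdisk: the function name output_names and its out_dir parameter are hypothetical, and the only facts taken from the article are that /dev/hda yields make.dev.hda and dev.hda.

```python
import os

def output_names(device, out_dir="."):
    """Map a device file such as /dev/hda to the two output filenames.

    Returns (script_path, command_file_path).
    """
    base = device.strip("/").replace("/", ".")   # "/dev/hda" -> "dev.hda"
    script = os.path.join(out_dir, "make." + base)
    commands = os.path.join(out_dir, base)
    return script, commands

print(output_names("/dev/hda"))
```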

Disk Labels

Labels are tools that Linux uses to abstract partitions. The problem with using device filenames in fstab is that if you add or remove a hard drive you may affect which device file another partition shows up under. Labels travel with the partition, so that with mounting by label you always get the correct partition. They are a problem for us because tomsrtbt doesn't handle labels.

The next section of code executes mount with a command-line switch to make it show the labels. If there is a label in any given line, we save the label and the device filename in a hash. That way, later on when we make the filesystem in the partition, we can assign the label. Also, we need to mount the partition by a device filename so that we can restore to it. We make a hash mapping from device filename to mountpoint so that later on we can build the mountpoint directories and mount the partitions.
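
Here is a minimal Python sketch of that bookkeeping. It assumes mount -l prints lines of the form "device on mountpoint type fstype (options) [label]", with the bracketed label present only for labeled filesystems, and that mountpoints contain no spaces. The function name and the sample text are made up; the real script is Perl and may parse the output differently.

```python
import re

MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>\S+) type (?P<fstype>\S+)"
    r" \((?P<options>.*?)\)(?: \[(?P<label>.*)\])?$"
)

def parse_mount_output(text, disk):
    """Return (labels, mountpoints) for partitions of disk.

    labels maps device filename -> label (labeled partitions only);
    mountpoints maps device filename -> mountpoint.
    """
    labels = {}
    mountpoints = {}
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match or not match["device"].startswith(disk):
            continue
        mountpoints[match["device"]] = match["mountpoint"]
        if match["label"]:
            labels[match["device"]] = match["label"]
    return labels, mountpoints

# Made-up sample output, not captured from a real system.
SAMPLE = """\
/dev/hda1 on / type ext2 (rw) [/]
none on /proc type proc (rw)
/dev/hda5 on /usr type ext2 (rw) [/usr]
/dev/hda6 on /home type ext2 (rw)
/dev/hdb1 on /mnt/spare type ext2 (rw) [spare]
"""

labels, mountpoints = parse_mount_output(SAMPLE, "/dev/hda")
print(labels)
print(mountpoints)
```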

Next is a typical Perl command to spawn a process and put the results into a filehandle, in this case FDISK. It is complete with error checking. Then we open our output file, which will eventually be redirected as input to fdisk.

Now we begin a loop to parse each line of the output from fdisk. We are interested in any line that has the device in it. If we find one, we massage it a bit, unpack it into the array @_ and further massage the array members.
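
The spawn-and-filter pattern looks roughly like this in Python. It is a sketch, not a translation of make.fdisk: the function names are invented, the sample text is made up, and the partition number is taken from the digits that follow the device name, which assumes classic names such as /dev/hda1.

```python
import subprocess

def run_fdisk_list(device):
    """Return the text printed by `fdisk -l DEVICE`; raises if fdisk fails."""
    result = subprocess.run(["fdisk", "-l", device],
                            capture_output=True, text=True, check=True)
    return result.stdout

def partition_lines(fdisk_output, device):
    """Yield (partition_number, line) for each line describing a partition."""
    for line in fdisk_output.splitlines():
        if not line.startswith(device):
            continue                      # header, blank or unrelated line
        name = line.split()[0]            # e.g. "/dev/hda5"
        number = name[len(device):]       # e.g. "5"
        if number.isdigit():
            yield int(number), line

# Made-up sample in the general shape of old `fdisk -l` output.
SAMPLE = """\
Disk /dev/hda: 255 heads, 63 sectors, 1244 cylinders

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14      1244   9888007+   5  Extended
/dev/hda5            14       144   1052226   82  Linux swap
"""

for number, line in partition_lines(SAMPLE, "/dev/hda"):
    print(number, line)
```

On a real system the text would come from run_fdisk_list("/dev/hda"), which normally needs root privileges; the sample string lets the parsing be tried without them.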

Disk Partitions

If a partition number is less than five, it is either a primary partition, meaning it can have a filesystem in it, or an extended partition, meaning it can have a number of logical partitions in it. In either case, we write the commands to build the partition to the output file. If it is a Linux swap partition, we have to tell fdisk to change its partition type.

If we see a primary partition that is either FAT (but, for now, not FAT32), Linux or Linux swap, we append to $format the appropriate command to make a FAT filesystem, an ext2 filesystem or swap space in the partition. Later on, we'll use $format to create the output script.

A partition number of five or greater can only be a logical partition, that is, one contained within an extended partition. As far as we are concerned, these are either Linux ext2, Linux swap partitions, FAT or anything else. As above, appropriate fdisk commands are sent to the output file and appropriate commands to create filesystems are appended to $format.
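
The decision logic in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the Perl from make.fdisk: the function name classify is invented, and the Id sets are the standard fdisk type codes as I understand them (5, f and 85 for extended; 1, 4 and 6 for FAT12 and FAT16; 82 for Linux swap; 83 for Linux), which may not be the exact set the script tests for.

```python
EXTENDED_IDS = {"5", "f", "85"}
FAT_IDS = {"1", "4", "6"}     # FAT12 and FAT16; FAT32 ("b", "c") is left out
LINUX_ID = "83"
SWAP_ID = "82"

def classify(number, part_id):
    """Return (role, filesystem) for a partition number and its Id column.

    role is "primary", "extended" or "logical"; filesystem is "ext2",
    "swap", "fat", or None when no filesystem should be made.
    """
    part_id = part_id.lower()
    if number >= 5:
        role = "logical"
    elif part_id in EXTENDED_IDS:
        role = "extended"
    else:
        role = "primary"

    if part_id == LINUX_ID:
        filesystem = "ext2"
    elif part_id == SWAP_ID:
        filesystem = "swap"
    elif part_id in FAT_IDS:
        filesystem = "fat"
    else:
        filesystem = None
    return role, filesystem

for number, part_id in [(1, "83"), (2, "5"), (5, "82"), (6, "b")]:
    print(number, part_id, classify(number, part_id))
```

Keeping the role and the filesystem as two separate answers mirrors the text: the role decides which fdisk commands go to the command file, and the filesystem decides what gets appended to the formatting commands.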

We look to see if there is a label for each ext2 partition. If there is, we use a command that will recreate that label on the new partition, otherwise we use the same command without a label.

Bad-Block Checking

You will notice that there are two commands to make each filesystem, with one commented out. The one commented out makes the filesystem with no bad-block checking. If I were installing to a brand-new hard drive, I would consider using this. The other does bad-block checking. I prefer to check for bad blocks when reusing a hard drive. The bad-block check is a simple read-only test, which is reasonable most of the time. You can add a write test, which is much more thorough but takes longer, by adding -w to the command-line options for badblocks. The write test is destructive, but since you will be building a new filesystem in the partition, you don't care.
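
As a rough illustration of the pair of commands described above, the Python sketch below builds both lines for an ext2 partition, with and without a label. It relies on two mke2fs options, -c (check for bad blocks before creating the filesystem) and -L (set the volume label); the function name and the exact layout of the emitted lines are assumptions, not a copy of what make.fdisk writes.

```python
import shlex

def mke2fs_lines(device, label=None):
    """Return two script lines for device: a commented-out quick format
    (no bad-block check) followed by the active, checked format."""
    label_args = ["-L", label] if label else []
    quick = ["mke2fs"] + label_args + [device]
    checked = ["mke2fs", "-c"] + label_args + [device]
    render = lambda argv: " ".join(shlex.quote(arg) for arg in argv)
    return ["# " + render(quick), render(checked)]

for line in mke2fs_lines("/dev/hda5", label="/usr"):
    print(line)
for line in mke2fs_lines("/dev/hda1"):
    print(line)
```

Switching a partition to the quick format is then a matter of moving the comment marker from one line to the other in the generated script.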

At the end of our line-parsing loop, if any partition is marked “bootable” (typically an MS-DOS, Windows or Windows NT partition because LILO ignores the bootable flag), we send the commands to make it bootable to the command file.

The last thing we do for the command file is send a “v”, which will have fdisk verify the newly built partition table. Then we send a “w”, which will cause fdisk to write the partition table to the hard drive and then exit. We then close our two files.
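
The tail of the command file might be produced along these lines. This Python sketch assumes that fdisk's "a" command (toggle the bootable flag) is followed by a prompt for the partition number; that holds for the classic interactive fdisk, but prompts vary between versions, so treat the exact sequence as an assumption. The function name is invented.

```python
def closing_commands(bootable_partitions):
    """Return the text that ends an fdisk command file.

    For each bootable partition: "a" then the partition number.
    Then "v" to verify the table and "w" to write it and exit.
    """
    lines = []
    for number in sorted(bootable_partitions):
        lines += ["a", str(number)]
    lines += ["v", "w"]
    return "\n".join(lines) + "\n"

print(closing_commands([1]), end="")
```

Every answer sits on its own line because fdisk reads the redirected file exactly as it would read keystrokes, one Enter-terminated reply per prompt.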

Next, we open the file that will become our script and send an appropriate header to the script, similar to the header for this script. The first thing the script actually will do is use dd to write zeros over the first 1,024 bytes of the hard drive. This will clobber any existing master boot record (MBR) so that we don't have to worry about deleting partitions before creating the new ones.
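
To show what zeroing the first 1,024 bytes amounts to, the Python sketch below does it to a scratch file instead of a disk. The function name zero_head is made up, and the dd invocation mentioned in the comment is one plausible form, not necessarily the one the generated script uses.

```python
import os
import tempfile

def zero_head(path, nbytes=1024):
    """Overwrite the first nbytes of path with zeros, leaving the rest alone.

    On a disk device this is roughly what
        dd if=/dev/zero of=DEVICE bs=512 count=2
    would do (an assumed command line, shown for comparison only).
    """
    with open(path, "r+b") as target:
        target.write(b"\0" * nbytes)

# Demo on a scratch file, never on a real device.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\xff" * 4096)
zero_head(scratch.name)
with open(scratch.name, "rb") as check:
    data = check.read()
os.unlink(scratch.name)

print(data[:1024] == b"\0" * 1024, len(data))
```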

The next step is to create the command that will partition the hard drive, using the command file we've already created. Then the code walks through the hash of mountpoints, creating a comment line, a command to create the directory and then a command to mount the device filename to the directory.

We have to mount starting at the root partition so that mountpoints are created in the correct partition. For example, suppose /usr/local is on its own partition; we have to mount /usr before we build /usr/local. To ensure that is done, we sort the keys of the hash and process the hash in that order.
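
A short Python sketch shows why sorting is enough. It assumes the hash is keyed by mountpoint, since a parent path such as /usr always sorts before anything beneath it such as /usr/local. The function name, the /target directory and the use of mkdir -p are assumptions for illustration; the real script may emit different commands.

```python
def mount_commands(mountpoints, target):
    """Build script lines that create and mount each mountpoint under target.

    mountpoints maps mountpoint -> device filename. Sorting the mountpoints
    puts "/" first and every parent directory ahead of its children.
    """
    lines = []
    for mountpoint in sorted(mountpoints):
        device = mountpoints[mountpoint]
        directory = target + mountpoint.rstrip("/")   # "/" maps to target itself
        lines.append("# " + device + " is mounted on " + mountpoint)
        lines.append("mkdir -p " + directory)
        lines.append("mount " + device + " " + directory)
    return lines

table = {"/usr/local": "/dev/hda7", "/": "/dev/hda1", "/usr": "/dev/hda5"}
for line in mount_commands(table, "/target"):
    print(line)
```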

The last thing we do is change the mode of the files we've just created. Since paranoids live longer, we disallow anyone but root from even reading the script, and make it executable.
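
In Python the same lockdown could look like the sketch below. The modes are assumptions consistent with the description: 0700 for the script (owner may read, write and execute; nobody else may do anything) and 0600 for the fdisk command file. The function name is made up, and the demo uses throwaway files on a system with POSIX permissions.

```python
import os
import stat
import tempfile

def lock_down(script_path, command_path):
    """Restrict both generated files to their owner; make the script executable."""
    os.chmod(script_path, 0o700)    # rwx------
    os.chmod(command_path, 0o600)   # rw-------

# Demo with empty stand-in files in a scratch directory.
scratch_dir = tempfile.mkdtemp()
script = os.path.join(scratch_dir, "make.dev.hda")
commands = os.path.join(scratch_dir, "dev.hda")
for path in (script, commands):
    open(path, "w").close()

lock_down(script, commands)
for path in (script, commands):
    print(os.path.basename(path), oct(stat.S_IMODE(os.stat(path).st_mode)))
```

If root owns the files, these modes mean no other account can read the partition layout or run the script.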

Using the Script

The script make.fdisk should be run as a normal part of preparing for backing up for bare metal recovery. Run it before you run save.metadata so that the output files are saved to the ZIP drive. Better yet, have save.metadata call it, once for each hard drive.

When you are restoring, run make.dev.x for each hard drive you have. Again, this can be automated by including it in restore.metadata.

There are other things you can do with this script. Suppose you want to add a new partition. Use the bare metal backup process to save a hard drive, then edit the dev.x command file to change the partition definitions and restore using the edited file. I successfully added a 30MB Mess-DOS partition to my test computer with this technique.

Improvements

Some improvements that you can tackle if you like include having make.fdisk process several hard drives, all indicated on the command line; adding error checking for the argument(s) to make.fdisk; having it produce one script that builds all the hard drives; extending the FAT filesystem support (for one thing, right now the code ignores FAT32); and extending the code to support other filesystems.

Resources