In this month's column, Eric moves beyond find to cover duplicating files and directory trees using the versatile cpio command. cpio uses space on tape more efficiently than tar and is an excellent alternative for creating archives on platforms that do not have the GNU utilities available. Read on for a thorough discussion of cpio and its three modes of operation: Pass-through, Create and Extract.
In an earlier article, I suggested using the following command for duplicating files and directory trees:
$ find . -depth | cpio -pdmv dest_dir
Since the focus of that tutorial was the find utility, I didn't discuss cpio in depth.
cpio can create and extract archives on diskette, tape or in files using eight different archive formats, including tar. It can also create an almost perfect duplicate of a directory tree, preserving file ownership, modes, and access times. Since cpio is designed to accept a list of files such as the output of ls or find, it is more suited for comprehensive backup systems than tools like conventional tar; the set of files processed can be easily controlled programmatically.
cpio also has some less obvious advantages. The default cpio format uses the space on tape much more efficiently than the conventional tar format, and it can also skip damaged sections of archives and continue during restore operations, instead of quitting completely.
GNU tar, which will be covered in an upcoming tutorial, addresses many of these issues. However, when creating archives for other platforms that do not have the GNU utilities available, cpio is an excellent alternative.
Two of the reasons why cpio is not more widely used are readily apparent. The list of possible command-line switches fills nearly half a typewritten page, and since it takes its list of files from standard input rather than from file names or wildcards on the command line, novices can find it pretty intimidating. But cpio can be worth the extra effort.
cpio has three modes of operation: pass-through mode, which is what I used in the example above for duplication; create mode, which is used for creating archives; and extract mode, which is used for extracting files from archives.
As its name implies, in pass-through mode cpio acts as a conduit for copying lists of files from one location to another. The ability to do this while creating subdirectories as needed and handling special files makes it a crucial tool for any system administrator to be familiar with.
For example, one common situation on a multi-user system is the need for more drive space for user directories. The administrator will need to perform the following steps: add an additional drive to the system, create one or more file systems, copy the user directories from the old file system to the new one, and then, depending upon the circumstances, change the file system mount points in order to make the transition unobtrusive.
There are three methods available for copying the users' files over to the new disk. One is to use tar to archive the files and extract them to the new area. This requires the time necessary to archive and extract the files.
Another is to use cp's recursive mode to copy them directly. This mode copies only regular files and links. It also follows symbolic links, which can duplicate a lot of files when used carelessly.
Of course, few system administrators know exactly what is in their users' directories. A developer may have special files such as sockets or pipes. Any user may have files with special permissions in order to prevent unwanted access. Administrators do not have time to inspect home directories that carefully, and many users do not want them to anyway. The third method, cpio in pass-through mode, copies everything without that kind of inspection:
$ find . -depth | cpio --pass-through \
    --preserve-modification-times \
    --make-directories --verbose /mnt/export
This command causes find to output the name of every file under the present directory. (The -depth option ensures that the entries in each directory are output before the name of the directory itself, which limits the effect of restrictive directory permissions while the files are being written.) cpio reads these file names in and copies them to /mnt/export.
The switches passed to cpio are:
--pass-through: Operate in pass-through mode.
--preserve-modification-times: Set the modification times of the new files to those of the old ones.
--make-directories: Create directories when necessary. (This option also works when restoring archives.)
--verbose: Verbose mode. This mode will produce output for all files. An alternative is the --dot option, which only produces a . for each file processed. (These options work in all modes.)
The command above creates an exact duplicate of the original directory, regardless of the types of files or any special file modes that were set.
If the files are being copied to the same file system, the --link option can be used to hard link files instead of copying them, when possible.
Create mode creates archive files. (This is also referred to as “copy-out” mode.) cpio accepts a list of file names, just as it does in pass-through mode. But instead of creating duplicate files in another area, it creates an archive and sends it to standard output.
Since it is sent to standard output, the archive can be redirected to any device or file such as a tape, diskette, or standard file.
$ find /export/home -depth | cpio --create > /dev/fd0
This creates an archive of the /export/home directory tree on the floppy drive at /dev/fd0. Of course, the /export/home area probably won't fit on one floppy, but cpio prompts for another device or file name each time a floppy fills up, so the user can swap disks and type the device name again. (Note that find's -depth switch is still recommended to prevent possible problems when the archive is extracted.)
When it comes to creating archives, cpio has many options. One of the most important is the format of the archive.
bin: (default) The binary format encodes files in a non-portable manner. Therefore, it is not suited for exchanging files between Linux on a PC and Linux on other architectures such as Alpha or PowerPC.
odc: Old (POSIX.1) portable format. This is portable across platforms, but is not suited for file systems with more than 65536 inodes, which means most of today's larger hard disks.
newc: New portable format. This is portable across platforms, and has no inherent limit on number of inodes.
crc: New portable format, with a checksum added.
tar: Compatible with tar, but only supports file names up to 100 characters.
ustar: New tar format. Supports up to 255 character file names.
hpbin: Non-portable format used by HP-UX.
hpodc: “Portable” format used by HP-UX. Stores device files differently.
The archive format is specified with the --format switch.
Out of all the formats, the crc format is probably the best, since it is portable and has an extra degree of error checking via the checksum.
A better method for creating an archive would be:
$ find /export/home -depth | cpio --create \
    --message="Insert next disk and type /dev/fd0 " \
    --format=crc > /dev/fd0
This uses the crc format for the archive and prompts the user with Insert next disk and type /dev/fd0 as each floppy is filled. The --message option, which works in both create and extract mode, replaces the default message.
There are many other options available for the creation of archives, which I will cover later.
Even though GNU tar does have many of the advantages of cpio, the ability to use find to specify the files to be backed up provides much more flexibility than shell wildcards. [You can do this with tar, too, but you have to send the output of find into a file and use that file as an “include file” for tar—ED]
Extract mode (also referred to as “copy-in” mode) extracts files from archives. This mode is inconsistent with the other two, since file names are specified on the command line, instead of via a list on standard input.
$ cpio --extract < /dev/fd0
This command restores all of the files from the archive in /dev/fd0, since no file names were specified. If the archive spans more than one volume, cpio will prompt for each volume the same way it does when archives are created. The --message option can be used to override the default message, as in create mode.
cpio automatically recognizes archive formats during extraction, so it is not necessary to specify them on the command line.
The path passed to cpio by find is stored in the archive. Therefore it is important to pay attention to how find is used.
$ find . -depth | cpio --create > /tmp/archive
This creates an archive that extracts into the present working directory.
$ find /export/home -depth | cpio --create > /tmp/archive
This creates an archive that will try to extract to /export/home, regardless of the circumstances, because the absolute path is what find passed to cpio. If the --make-directories (-d) option is specified during extraction, the directory is created if it does not already exist. (If /export/home does not exist and -d is omitted, the extraction will fail.)
Anything specified on the command line that is not an option is treated as a shell-style filename pattern, which is matched against the full name stored in the archive. Quote each pattern so that the shell does not expand it first.
$ cpio --extract "*back*" < /dev/fd0
This will extract files in the archive that have back in their name. No other files will be restored. Multiple patterns can also be specified.
$ cpio --extract "*back*" "*save*" < /dev/fd0
This will extract files with “back” or “save” in their names.
In addition to providing patterns on the command line, they can be provided as lines in a file. The file is specified with the --pattern-file=filename option. This provides a lot of flexibility in restoring files, since the full path does not have to be known. Frequently restored patterns can be stored in a file.
The --nonmatching option inverts the selection: only files that match none of the patterns are extracted.
It may help to see the contents of the archive before extracting anything from it.
$ cpio --list < /dev/fd0
The --list option lists the contents of the archive. Adding --verbose produces a long listing similar to ls -l, and the option --numeric-uid-gid forces that listing to show user and group IDs numerically, instead of trying to resolve the names with the passwd and group files.
Instead of standard input and output the archive can be sent to (or extracted from) a file.
$ find /export/home -depth | cpio --create --file=/vol/archive
This option works either for creating or extracting archives. To use a remote tape drive, specify the user name and host name before the file name. (The user must have access to the remote host without a password. This can be done by using the file .rhosts.)
$ find /export/home -depth | cpio --create --file=eric@bajor:/dev/rmt0
This command will work if eric has no password (not recommended) or if the host that the command is run on is listed in the .rhosts file in eric's home directory.
One of the key advantages of the --file option is that disk files (archives not on tape or floppy) created with it can be appended to with the --append option.
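When restoring an archive it is sometimes desirable to keep the modification times recorded in the archive, rather than stamping the restored files with the current time. Note that extract mode reads only the archive, so no find command is involved:
$ cpio --extract --preserve-modification-times --file=/vol/archive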
The --preserve-modification-times option works in extract mode in addition to pass through mode.
In addition to preserving modification times, the access times of the files being archived or copied can be reset afterward, so that the cpio operation does not appear to have touched the original files:
$ find . -depth | cpio --pass-through \
    --make-directories --preserve-modification-times \
    --reset-access-time /vol/copy
This will copy the current directory to /vol/copy while copying the modification times on the old files to the new and also leaving the access times on the original files untouched.
When operating in copy-in (extract) or pass-through mode, cpio by default will not replace an existing file that is newer than, or the same age as, the copy it is about to write. It skips that file and reports that a newer version exists. The --unconditional option overrides that behavior and replaces existing files regardless of age:
$ cpio --extract --unconditional "*back*" "*save*" < /dev/fd0
The --dereference option copies the file pointed to by a symbolic link, instead of the link itself, in archive creation and pass-through mode.
The --rename option will prompt the user to interactively rename each file. This only works in extract mode.
When acting as a system administrator, it is sometimes useful to restore an archive or duplicate a directory and change the user or group id of the target in the process.
$ cpio --extract --owner=eric.staff < /dev/fd0
This will restore the archive on /dev/fd0 and set the owner of all the extracted files to eric and the group to staff. Only root may use this option. If the group is left out, it will not be changed unless the . separator is still included (as in --owner=eric.), in which case the group will be set to the user's login group.
Another option related to file ownership is --no-preserve-owner. This is the default behavior for non-root users. Files will belong to the user copying or extracting them, instead of the original user. For root the default is to preserve ownership.
There are also advanced options related to transferring data between big-endian and little-endian architectures and for controlling I/O buffer sizes to optimize performance.
The cpio command may seem cryptic at first glance, but after you use it a few times, it will become an indispensable addition to your Linux toolkit. Especially if you are one of the many users with no tape drive and no commercial backup utility, learning cpio and swapping floppies sure beats the (non-existent) alternative after a disk crashes or you make a mistake with the rm command...