Manipulating virtual machine disk images with libguestfs

Inside View


Libguestfs provides a powerful set of tools for looking inside disk images.

By Richard W.M. Jones

Just over a year ago, Red Hat faced a problem with its virtualization management suite. Customers had a growing collection of multi-gigabyte virtual machine disk images, but the existing tools for managing these huge files were unacceptably clumsy.

Users often had to log in to a separate Network-Attached Storage (NAS) device to make new disk images, which made provisioning VMs a complex, multi-step, manual process. Storage management code was added to the libvirt virtualization library, so today management tools can provision automatically, but what about the option of looking into disk images, or even tweaking them? A tool called kpartx offered some capacity for modifying disk images. With a root account, kpartx, trust, and a little bit of luck, you could view and tweak images, but this option came with many caveats. Mounting an untrusted virtual machine in the host could lead to a root exploit. Problems also arose when a host crashed. Kpartx was difficult to integrate with scripts as well.

Personally speaking, I thought the situation was completely unacceptable. A disk image isn't special; it is often just an ordinary file. You don't have to su to root to open a word processor document, and GIMP doesn't need to make device nodes in /dev when you edit an image.

Libguestfs [1] solves these problems. It is a scriptable, secure library for accessing and editing disk images - without the need for root-level access. It's also a collection of useful tools to carry out common tasks, and it comes with an interactive shell. The tools included with libguestfs are admin focused. The library is for programmers, and the shell is for command-line hackers and scripters.

Installing Libguestfs

Fedora, Red Hat Enterprise Linux (RHEL) 5, and CentOS users have the easiest time with installation. For Fedora, you can install the library, shell, and tools with:

# yum install \*guestf\*

For RHEL 5, CentOS, and other RHEL 5 derivatives, you need to install the EPEL repository and then proceed with the preceding command. RHEL 6 will ship with libguestfs. Debian and Ubuntu users can get some parts of libguestfs by following the link at the libguestfs FAQ [2]. Libguestfs is still looking for a maintainer for Ubuntu and other distros.

Libguestfs can use KVM for hardware acceleration. To enable this feature, type chmod 666 /dev/kvm. (This change doesn't persist across reboots, so put this command in /etc/rc.local.)

The libguestfs team has a helpful IRC channel, #libguestfs, on the FreeNode network, as well as a mailing list, which you can subscribe to through the website.

The Tools

With libguestfs installed, you can get an idea of the tools available (Table 1) by opening a shell and typing virt- followed by the Tab key a couple of times.

Each of these tools is fully documented in a manual page (e.g., man virt-df).

The libguestfs set includes two lower level tools: guestfish provides full access to the libguestfs API, which is particularly useful for shell scripting changes and tasks not covered by the high-level sys admin tools, whereas guestmount mounts a disk image on a directory (Figure 1). The libguestfs toolkit also includes an API for programmers, which is accessible from C, C++, Perl, Python, Ruby, OCaml, Java, Haskell, and Mono (C#).

Figure 1: You can use guestmount to mount a guest image on a directory. In this case, guestmount has mounted a Windows filesystem on the host machine, where it is visible in the Nautilus file browser.

As the name suggests, the virt-df tool is the virtual equivalent of the df command. Just running the command on its own shows disk usage for all libvirt-managed virtual machines. (You do need to be root to run this command, unless you have changed the permissions on the virtual machine disks so that non-root can read them.) Sample output for the virt-df command is shown in Listing 1.

Listing 1: Checking Disk Usage
# virt-df
Filesystem                           1K-blocks       Used  Available  Use%
Debian5x32:/dev/debian5x32.home.annexia.org/home
                                       2027920      35844    1889064    2%
Debian5x32:/dev/debian5x32.home.annexia.org/root
                                        329233      81728     230507   25%
Debian5x32:/dev/debian5x32.home.annexia.org/tmp
                                        170549       5663     156080    4%
Debian5x32:/dev/debian5x32.home.annexia.org/usr
                                       2100444     298988    1694756   15%
Debian5x32:/dev/debian5x32.home.annexia.org/var
                                        955480     202308     704636   22%
Debian5x32:/dev/vda1                    233335       9546     211341    5%
Windows7x32:/dev/vda1                   102396      24704      77692   25%
Windows7x32:/dev/vda2                  8284156    7229712    1054444   88%

You can point virt-df at any old disk image, regardless of whether you have root access (Listing 2).

Listing 2: Pointing virt-df at a Disk Image
$ virt-df -h ~/disk.img
Filesystem                                Size       Used  Available  Use%
/home/rjones/disk.img:/dev/vda1         193.7M      21.6M     162.1M   12%
/home/rjones/disk.img:/dev/vg_f12x32/lv_root
                                          5.2G       2.3G       2.6G   45%

For capacity planning and predicting when your virtual machines are going to need more disk space, virt-df is great, particularly because you can run it from a cron job and get it to produce output in the Comma-Separated Value (CSV) format for direct import into spreadsheets and databases.
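
As a rough sketch of that cron-job idea, the following shell function reads CSV text on standard input and reports any filesystem at or above a usage threshold. The function name and the default threshold are made up for illustration, and the column order (virtual machine, filesystem, 1K-blocks, used, available, use%) is an assumption about the CSV layout that you should check against the output of your own virt-df version.

```shell
#!/bin/sh
# Sketch: report filesystems at or above a usage threshold.
# Assumes CSV on stdin with a header row, then
# VM,filesystem,1K-blocks,used,available,use% and no embedded commas.
flag_full_filesystems() {
    threshold="${1:-80}"   # percent; 80 is an arbitrary default
    awk -F, -v limit="$threshold" '
        NR == 1 { next }                # skip the CSV header row
        {
            use = $6
            sub(/%$/, "", use)          # drop the trailing percent sign
            if (use + 0 >= limit + 0)
                printf "%s:%s is %s%% full\n", $1, $2, use
        }'
}

# Possible cron usage (needs libguestfs and read access to the images):
#   virt-df --csv | flag_full_filesystems 85
```

Keeping the parsing separate from the virt-df call means the filter can be tried out on saved CSV output before it goes anywhere near a crontab.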

What do you do with a VM that is starting to outgrow its original disk allocation? You run the virt-resize utility on it (Listing 3).

Listing 3: virt-resize
$ truncate -s 10G ~/enlarged.img
$ virt-resize ~/disk.img ~/enlarged.img --expand /dev/sda2
Summary of changes:
/dev/sda1: partition will be left alone
/dev/sda2: partition will be resized from 5.8G to 9.8G
/dev/sda2: content will be expanded using the 'pvresize' method
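
The truncate step in Listing 3 creates the target as a sparse file: It has the full apparent size but takes up almost no space on the host until data is written to it. The sketch below wraps only that step. The function name is hypothetical, and it assumes the GNU truncate utility is installed.

```shell
#!/bin/sh
# Sketch: create an empty sparse file to serve as a resize target and
# print its apparent size in bytes. The function name is made up here.
make_sparse_target() {
    target="$1"
    size="$2"                      # any size truncate accepts, e.g. 10G
    truncate -s "$size" "$target" || return 1
    wc -c < "$target" | tr -d ' '  # apparent size, not blocks actually allocated
}

# Example:
#   make_sparse_target ~/enlarged.img 10G
```

Comparing the printed size with the output of du on the same file shows the difference between the apparent size and the space really allocated.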

If you have a virtual machine that doesn't boot, you can fix files in the image manually with virt-edit:

# virsh list --all
Id Name              State
----------------------------------
- Debian5x32         shut off
- Windows7x32        shut off

The command

virt-edit Debian5x32 /boot/grub/menu.lst

opens the file in Vi or $EDITOR. Another option for unbootable VMs is to use virt-rescue to get a rescue shell, which works just like a rescue CD. Don't try to use virt-edit or virt-rescue on live VMs. (See the box titled "Warning.")

virt-cat offers some simple options for monitoring VMs, such as looking for suspicious events in the VM's log files. The script in Listing 4 uses virt-cat to look for backdoor root accounts in Linux guests.

Listing 4: Looking for Root Accounts
#!/bin/sh -
# Get list of guests from libvirt.
guests=$(
  virsh list --all | tail -n+3 | head -n-1 |
  awk '{print $2}'
)
# Report any account other than root that has UID 0.
for n in $guests; do
  virt-cat "$n" /etc/passwd |
    awk -F: '$1 != "root" && $3 == 0 {
      print "BACKDOOR ACCOUNT FOUND:", $1
    }'
done
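
To try the awk test from Listing 4 without any guests at hand, you can feed it passwd-format text directly. The following sketch isolates the check in a function; the function name and the sample entries are invented for illustration.

```shell
#!/bin/sh
# Sketch: the UID-0 check on its own, reading passwd-format text on stdin.
find_backdoor_accounts() {
    awk -F: '$1 != "root" && $3 == 0 { print $1 }'
}

# Made-up sample data: "toor" shares UID 0 with root, "daemon" does not.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
toor:x:0:0::/root:/bin/sh'

printf '%s\n' "$sample" | find_backdoor_accounts   # prints: toor
```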

Warning

Don't ever use libguestfs or its tools in read/write mode on a virtual machine image that is running. The result will inevitably be disk corruption. (The tools try to stop you from accessing a running VM, but there are some cases they cannot detect.) Most tools have a --ro (read-only) flag, and using this flag is safe, even on live VMs. The --ro flag is a great way to get live information on the state of your VMs. Some tools don't need read/write access and only open images in read-only mode, so they don't need a special flag: If in doubt, check the documentation.
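
One way to build this caution into your own scripts is to ask libvirt for the guest's state first and fall back to read-only access unless the guest is shut off. The sketch below assumes the virsh domstate subcommand and the guestfish invocation style used in this article; both function names are hypothetical. It does not close the gap between checking and opening, so a guest started in that interval would still be at risk.

```shell
#!/bin/sh
# Sketch: decide from a libvirt state string whether read/write access
# is reasonable. Only "shut off" is treated as safe; everything else
# (running, paused, unknown, empty) is refused.
is_safe_to_write() {
    case "$1" in
        "shut off") return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical wrapper: open a guest read/write only when it is shut off,
# otherwise fall back to --ro. Requires virsh and guestfish.
open_guest() {
    guest="$1"
    state=$(virsh domstate "$guest" 2>/dev/null)
    if is_safe_to_write "$state"; then
        guestfish -i "$guest"
    else
        echo "$guest is '$state'; opening read-only" >&2
        guestfish --ro -i "$guest"
    fi
}
```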

Guestfish

Although the virt-* tools let you carry out system administration operations, the full power of the libguestfs API is only available through guestfish, the guest filesystem interactive shell. Guestfish exposes nearly 300 commands, and readers will be glad to know I can only cover a handful of the common commands in this article.

You can start guestfish on an existing disk image or, if you prefer, create a new disk image from scratch. The definition of disk image covers raw hard disks, CD ISOs, virtual floppy disks (VFDs), compressed formats like qcow2, SD cards, and even filesystems - guestfish can read them all and can write to most of them. To start the guestfish shell, type guestfish. To create a 100-megabyte sparse disk image from scratch, enter:

><fs> sparse test.img 100M
><fs> run

Then, partition it and create a filesystem

><fs> part-disk /dev/sda mbr
><fs> mkfs ext2 /dev/sda1

and upload this article to the filesystem (Listing 5). The /dev/sda does not refer to the host. In guestfish, it refers to the first attached disk (test.img in this case).

Listing 5: Upload Article to Filesystem
><fs> mount /dev/sda1 /
><fs> upload article.txt /article.txt
><fs> ll /
total 20
drwxr-xr-x  3 root root  1024 Apr 15 13:54 .
dr-xr-xr-x 19 root root     0 Apr 12 22:09 ..
-rw-r--r--  1 root root  7028 Apr 15 13:54 article.txt
drwx------  2 root root 12288 Apr 15 13:54 lost+found
><fs> sync
><fs> exit

$ ll test.img
-rw-rw-r--. 1 rjones rjones 104857600 Apr 15 13:54 test.img

If this disk image is mounted on a virtual machine, the VM will see an ext2 filesystem in a partition containing the article you uploaded. To extract the contents, use the guestfish cat command:

$ guestfish --ro -a test.img -m /dev/sda1 cat /article.txt

The -a flag adds the disk image, and the -m (mount) flag tells guestfish where to find the filesystem within the disk image.

Also, you can use guestfish to examine your libvirt-managed guests. Although this is a little bit more complex for the authors of libguestfs, it is not for the user.

A disk image is just a disk image, but a running virtual machine mounts filesystems from the disk image according to its own conventions, such as mounting /dev/sda1 on /boot and /dev/vg/lv_var on /var, or /dev/sda2 as C:\ on Windows. How do you know how to mount them? Libguestfs contains a tool called virt-inspector that works out this mapping with the use of a sizable set of rules and heuristics. All the user needs to do is supply the -i (inspector) flag to get guestfish to perform a similar function (Listing 6).

Listing 6: Setting the Inspector Flag
# guestfish --ro -i Debian5x32

><fs> less /boot/grub/menu.lst
# menu.lst - See: grub(8), info grub, update-grub(8)
#            grub-install(8), grub-floppy(8),
#            grub-md5-crypt, /usr/share/doc/grub
#            and /usr/share/doc/grub-legacy-doc/.
[...]

To find out how much space (in kilobytes) is used by /var/log, enter du /var/log. To find the root account in the password file, enter grep ^root: /etc/passwd.

To list partitions and logical volumes, use

><fs> list-partitions
/dev/vda1
/dev/vda2
><fs> lvs
/dev/debian5x32.home.annexia.org/home
/dev/debian5x32.home.annexia.org/root
/dev/debian5x32.home.annexia.org/swap_1

and to find out what is on them, use

><fs> vfs-type /dev/debian5x32.home.annexia.org/swap_1
swap
><fs> file /dev/debian5x32.home.annexia.org/swap_1
Linux/i386 swap file (new style) 1 (4K pages) size 89087 pages

To copy the /home directories into a local TAR file, enter:

><fs> tgz-out /home /tmp/home.tar.gz

With the Augeas [3] configuration editor, you can parse the APT configuration file of a Debian guest. Note that /files is the prefix used by Augeas for matching configuration files; it has nothing to do with the libguestfs library:

><fs> aug-init / 0
><fs> aug-get /files/etc/apt/sources.list/1/uri
http://ftp.uk.debian.org/debian/
><fs> aug-get /files/etc/apt/sources.list/1/distribution
lenny
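
To see how those Augeas paths relate to the raw file, the following sketch pulls the same fields out of sources.list text with awk. It is an illustration only, not how Augeas works internally: It assumes the classic one-line format (type, URI, distribution, components) without bracketed options, and the function name and sample content are made up.

```shell
#!/bin/sh
# Sketch: print the first entry of an APT sources.list (read from stdin)
# using labels that mirror the Augeas node names type, uri, distribution.
# Assumes the classic one-line format "type uri distribution components...".
first_apt_source() {
    awk '
        /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
        {
            print "type = " $1
            print "uri = " $2
            print "distribution = " $3
            exit                    # only the first entry is wanted
        }'
}

# Made-up sample file content for illustration.
printf '%s\n' \
    '# main mirror' \
    'deb http://ftp.uk.debian.org/debian/ lenny main' |
    first_apt_source
```

The comment line is skipped, so the first real entry is the one reported, much as the path ending in /1 refers to the first entry rather than the first line of the file.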

Libguestfs has good support for Windows guests too. Just as with a Linux guest, you can mount and examine a Windows guest:

# guestfish --ro -i Windows7x32
><fs> ls /Windows/System32/drivers | head -5
1394bus.sys
1394ohci.sys
AGP440.sys
AMDAGP.SYS
BrFiltLo.sys

Figure 2 shows how to read or write Registry entries on Windows guests using the virt-win-reg utility.

Figure 2: Accessing the Registry on a Windows guest with the help of virt-win-reg.

Although this is all fun, the real power of guestfish comes with its use in scripts. In Listing 7, I use guestfish to clone a VM from a template and tweak the new guest's configuration. The script is run as follows:

# cd /var/lib/libvirt/images
# /tmp/clone.sh oldguest newguest 192.168.1.1 newguest.example.com
# virt-install --import --file newguest

Listing 7: Cloning VMs
#!/bin/sh -

template="$1"
newimage="$2"
nameserver="$3"
hostname="$4"

# Copy the template image to create the new guest's disk.
dd if="$template" of="$newimage" bs=1M

# Build the guest's network configuration file.
cat > /tmp/network <<EOF
NETWORKING=yes
HOSTNAME=$hostname
EOF

# Apply the per-guest settings inside the cloned image.
guestfish -i "$newimage" <<EOF
write-file /etc/resolv.conf "nameserver $nameserver" 0
upload /tmp/network /etc/sysconfig/network
sync
EOF

rm /tmp/network
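
If several clones might run at the same time, a fixed path such as /tmp/network becomes a point of collision. As a possible variation, the sketch below writes the configuration to a private temporary file created with mktemp and prints its path. The function name is hypothetical, and the file contents follow the two settings used in Listing 7.

```shell
#!/bin/sh
# Sketch: write the sysconfig-style network file for a new guest into a
# private temporary file and print that file's path.
# The function name and the use of mktemp are choices made for this sketch.
make_network_config() {
    guest_hostname="$1"
    conf=$(mktemp) || return 1
    # cat copies the here-document (its standard input) into the file.
    cat > "$conf" <<EOF
NETWORKING=yes
HOSTNAME=$guest_hostname
EOF
    echo "$conf"
}

# Example:
#   conf=$(make_network_config newguest.example.com)
#   ... upload "$conf" to /etc/sysconfig/network with guestfish ...
#   rm -f "$conf"
```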

The Future

The current libguestfs API is comprehensive, mature, and well tested. The team is committed to retaining API and ABI (binary) compatibility, so the only changes made in stable releases are new calls and commands.

One task ahead is to expand the system administration toolchest, possibly with tools to rapidly provision new guests, shrink guests automatically, provide better Windows support, offer CIM support in virt-inspector, provide expert diagnosis of VM problems, and verify the integrity of installed software.

The project is planning upgrades of virt-p2v and virt-v2v. Deeper integration with other management tools is also on the roadmap. At the moment, you can mount a guest filesystem on the host (using FUSE and the guestmount command). The next step is to integrate with virt-manager, so users can click on a button and have the guest filesystem open up. Other options include integration with security tools, rootkit and virus scanners, and disk monitoring tools.

Willing packagers are needed from the Ubuntu, Gentoo, and Mac OS X communities. (An OS X port is already available; it just needs polishing and releasing.) Be aware that libguestfs is difficult to compile from source and requires dedicated packagers with a lot of spare time.

Two questions everyone asks are: Does libguestfs have a GUI? Could users benefit from a GUI? The answers are: No, it doesn't have a GUI, and yes, maybe users would benefit. It's hard to imagine what a GUI that matched the power of guestfish would look like. It would be a very large and complex GUI indeed. However, if someone is willing to step up and start that project, they would get plenty of help and encouragement from the team.

INFO
[1] Libguestfs: http://libguestfs.org/
[2] Libguestfs FAQ: http://libguestfs.org/FAQ.html
[3] Augeas: http://augeas.net/