Verifying Filesystem Integrity with CVS

Michael Rash

Issue #94, February 2002

Even the most conscientious of paranoid penguins needs a vacation, so we have a guest columnist for Mick Bauer this month.

Paranoid Penguin

Verifying Filesystem Integrity with CVS

Even the most conscientious of paranoid penguins needs a vacation sometime, so we have a guest columnist for Mick Bauer this month.

by Michael Rash

The single most effective way to catch host-based intrusions is by detecting changes in the filesystem. Toward this end, Tripwire is the best known piece of software that automates the process of verifying filesystem integrity on a machine over time and alerting the administrator if any unauthorized changes are detected. The types of changes that Tripwire can detect include changes in file permissions and type, inode number, link count, uid and gid, size, access timestamp, modification timestamp, inode change timestamp and one-way hash signature (generated by any of ten different signature functions, including md5 and Snefru). In this article we explore the use of the Concurrent Versions System (CVS), together with some Perl glue, as a filesystem integrity checker in a manner similar to Tripwire. Although Tripwire is the best known filesystem integrity checker, it should be noted that Tripwire recently became a proprietary product to which there are many free and/or open-source alternatives, such as AIDE and FCheck (see Resources).

Tripwire

The basic process of deploying Tripwire on a system can be summed up by the following sequence of steps: 1) install the operating system, and immediately run Tripwire from the console before exposing the machine to any type of network connection; 2) store the resulting Tripwire database of filesystem information on read-only media separate from the system; and 3) periodically run Tripwire against the original database so a comparison can be made between the current filesystem information and the original database in order to look for any anomalies.

CVS

CVS is a tool used primarily by software developers to help organize and provide a version structure to a set of source code. One of its most useful features is the ability to keep track of all changes in a piece of source code (or ordinary text) over time, starting from when the code initially was checked into the CVS repository. When a developer wishes to make changes, the code is checked out of the repository, modifications are made and the resulting code is committed back into the repository. CVS automatically increments the version number and keeps track of the changes. After modifications have been committed to the repository, it is possible to see exactly what was changed in the new version relative to the previous (or any other) version by using the cvs diff command.

Now that we have a basic understanding of the steps used to administer Tripwire for a system, we will show how CVS can be leveraged in much the same way, with one important enhancement: difference tracking for plain-text configuration files. If one of your machines is cracked, and a user is added to the /etc/passwd file, Tripwire can report the fact that any number of file attributes have changed, such as the mtime or one-way hash signature, but it cannot tell you exactly which user was added.

A significant problem for intrusion detection in general is that the number of false positives generated tends to be very high, and host-based intrusion-detection systems (HBIDS) are not exempt to this phenomenon. Depending on the number of systems involved, and hence the resulting complexity, false positives can be generated so frequently that the data generated by HBIDS may become overwhelming for administrators to handle. Thus, in the example of a user being added to the /etc/passwd file, if the HBIDS could report exactly which user was added, it would help to determine whether the addition was part of a legitimate system administration action or something else. This could save hours of time since once it has been determined that a system has been compromised, the only way to guarantee returning the system to its normal state is to re-install the entire operating system from scratch.

Collecting the Data

In keeping with the Tripwire methodology of storing the initial filesystem snapshot on media separate from the system being monitored, we need a way to collect data from remote systems and archive it within a local CVS repository. To accomplish this, we set up a dedicated CVS “collector box” within our network. All filesystem monitoring functions will be executed by a single user from this machine. Monitoring functions include the collecting of raw data, synchronizing the data with the CVS repository and sending alerts if unauthorized changes are found. We will utilize the ssh protocol as the network communication vehicle to collect data from the remote machines. To make things easier we put our HBIDS user's RSA key into root's authorized_keys file on each of the target systems. This allows the HBIDS user to execute commands as root on any target system without having to supply a password via ssh. Now that we have a general idea for the architecture of the collector box, we can begin collecting the data.

We need to collect two classes of data from remote systems: ASCII configuration files and the output of commands. Collecting the output of commands is generally a broader category than collecting files because we probably are not going to want to replicate all remote filesystems in their entirety. Commands that produce important monitoring output include md5sum (generates md5 signatures for files) and find / -type f -perm +6000 (finds all files that have the uid and gid bits set in their permissions). In Listings 1 and 2, we illustrate Perl code that collects data from remote systems and monitors changes to this data over time.

Listing 1. collector.pl

In Listing 1, we have a piece of Perl code that makes use of the Net::SSH::Perl module to collect three sets of data from the host whose address is 192.168.10.5, md5 hash signatures of a few important system binaries, a tar archive of a few OS configuration files and a listing of all files that have the uid and/or gid permission bits set. Lines 7 and 8 define the IP address of the target machine as well as the remote user that collector.pl will log in to. Recall that we have the local user's preshared key on the remote machine, so we will not have to supply a password to log in. Lines 10-13 define a small sample list of system binaries for which md5 hash signatures will be calculated, and similarly, lines 15-18 define a list of files that will be archived locally from the remote system. Lines 20-24 build a small hash to link a local filename to the command that will be executed on the remote system, and the output of each command will be stored locally within this file. Lines 27-33 comprise the meat of the collection code and call the run_command() subroutine (lines 36-49) for each command in the %Cmds hash. Each execution of run_command() will create a new Net::SSH::Perl object that will be used to open an ssh session to the remote host, log in and execute the command that was passed to the subroutine.

Listing 2. cvschecker.pl

In Listing 2, we illustrate a piece of Perl code that is responsible for generating e-mail alerts if any of the files collected by collector.pl change. This is accomplished first by checking the current 192.168.10.5 module out of the CVS repository (line 12), executing collector.pl (line 14) to install fresh copies of the remote command output and files within the local directory structure, and then committing each (possibly modified) file back into the repository (line 20). By checking the return value of the cvs commit command for each file (line 20), we can determine if changes have been made to the file, as cvs automatically increments the file's version number and keeps track of exactly what has changed. If a change is detected in a particular file, cvschecker.pl calculates the previous version number (lines 27-36), executes the cvs diff command against the previous revision to get the changes (lines 39-40) and e-mails the contents of the change (lines 47-48) to the e-mail address defined in line 7.

Detecting Intrusions

Now let's put the collector.pl and cvschecker.pl scripts into action with a couple of intrusion examples. Assume the target system is a Red Hat 6.2 machine; HIBDS data has been collected from this machine before any external network connection was established, and the target has an IP address of 192.168.10.5.

Example 1

Suppose machine 192.168.10.5 is cracked, and the following command is executed as root:

# cp /bin/sh /dev/... && chmod 4755 /dev/...

This will copy the /bin/sh shell to the /dev/ directory as the file “...” and will set the uid bit. Because the file is owned by root, and we made it executable by any user on the system, the attacker only needs to know the path /dev/... to execute any command as root. Obviously, we would like to know if something like this has happened on the 192.168.10.5 system. Now, on the collector box, we execute cvschecker.pl, and the following e-mail is sent to root@localhost, which clearly shows /dev/... as a new suid file:

From: hbids@localhost
Subject: Changed file on 192.168.10.5: suidfiles
To: root@localhost
Date: Sat, 10 Nov 2001 17:35:13 -0500 (EST)
Index: /home/mbr/192.168.10.5/suidfiles
=======================================================
RCS file: /usr/local/hbids_cvs/192.168.10.5/suidfiles,v
retrieving revision 1.3
retrieving revision 1.4
diff -r1.3 -r1.4
4a5
> -rwsr-xr-x 1 root root 512668 Nov 10 18:40 /dev/...

Example 2

Now suppose an attacker is able to execute the following two commands as root:

# echo "eviluser:x:0:0::/:/bin/bash" >> /etc/passwd
# echo "eviluser::11636:0:99999:7:::" >> /etc/shadow

Note that the uid and gid for eviluser are set to 0 and 0 in the /etc/passwd entry, and also that there is no encrypted password string in the /etc/shadow entry. Hence, any user on the system could become root without supplying a password simply by typing su - eviluser. As in the previous example, after running cvschecker.pl, we receive the following e-mails in root's mailbox:

From: hbids@localhost
Subject: Changed file on 192.168.10.5: /etc/passwd
Delivered-To: root@localhost
Date: Sat, 10 Nov 2001 17:43:17 -0500 (EST)
Index: /home/mbr/192.168.10.5/etc/passwd
=======================================================
RCS file: /usr/local/hbids_cvs/192.168.10.5/etc/passwd,v
retrieving revision 1.2
retrieving revision 1.3
diff -r1.2 -r1.3
26a27
> eviluser:x:0:0::/:/bin/bash

and

From: hbids@localhost
Subject: Changed file on 192.168.10.5: /etc/shadow
Delivered-To: root@localhost
Date: Sat, 10 Nov 2001 17:43:18 -0500 (EST)
Index: /home/mbr/192.168.10.5/etc/shadow
=======================================================
RCS file: /usr/local/hbids_cvs/192.168.10.5/etc/shadow,v
retrieving revision 1.2
retrieving revision 1.3
diff -r1.2 -r1.3
26a27
> eviluser::11636:0:99999:7:::

Conclusion

Finding changes in the filesystem can be an effective method for detecting intruders. In this article we have illustrated some simple Perl code that bends CVS into a homegrown, host-based intrusion-detection system. At my current place of employment, USinternetworking, Inc., a large ASP in Annapolis, Maryland, we use a similar (although greatly expanded) custom system called USiOasis to help verify filesystem integrity across several hundred machines in our network infrastructure. The machines are loaded with various operating systems that include Linux, HPUX, Solaris and Windows, and run many different types of server applications. The system includes a MySQL database back end, a rather large CVS repository and a custom web/CGI front end written mostly in Perl. Making use of a CVS repository to perform difference tracking also comes with an important additional benefit: an excellent visualization tool written in Python called ViewCVS. Storing operating system and application configuration files within CVS also aids several areas outside of detecting intrusions, such as troubleshooting network and application-level outages, disaster recovery and tracking system configurations over time.

Resources