Concealing digital data with steganography

Secret Messages


Everybody has information that is for their eyes only, but keeping your secrets to yourself can be largely a matter of luck. Unless, that is, you use the art of steganography.

By Kurt Seifried

Bruce Rolff, 123Rf

Traditionally, computer users have had three primary ways of protecting secret or sensitive communications: encrypt the data so an attacker can't read it, rely on secrecy and hope the attacker never sees it, or use steganography. Steganography [1] is the art of concealing data in plain sight. The advantage of steganography over secrecy is that an attacker who looks directly at the secret data still won't see it. This technique can protect your privacy even if you are forced to decrypt data and show it to officials.

In many situations, the mere suspicion of hidden or encrypted data will be enough to cause problems. For instance, some repressive regimes will chuck you in jail for possession of cryptographic software. Western countries, such as the United Kingdom with its Regulation of Investigatory Powers (RIP) Act [2], allow police and other law enforcement agencies to compel a suspect to reveal encryption keys or face significant penalties (including jail time).

Steganography is also used for digital watermarking. Watermarking places a hidden mark on an image file, so that if it is copied, the creator of the file can easily trace the source.

An additional important note about steganography is that it should always be used in conjunction with good cryptography (e.g., GnuPG [3]); merely relying on a secret when the attacker has access to the hidden data is never a good idea.

How It Works

Digital steganography exploits the fact that certain standard file formats provide more capacity than any specific instance of the format might ever expect to need - and more precision than the human eye could ever detect. Wikipedia offers this bit-level view of steganography at work [1]:

A 24-bit bitmap will have 8 bits representing each of the three color values (red, green, and blue) at each pixel. If we consider just the blue, there will be 2^8 different values of blue. The difference between 11111111 and 11111110 in the value for blue intensity is likely to be undetectable by the human eye. Therefore, the least significant bit can be used (more or less undetectably) for something else other than color information. If we do it with the green and the red as well, we can get one letter of ASCII text for every three pixels.
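The least-significant-bit trick described above can be sketched in a few lines of Python. This is a minimal illustration, not the method any particular tool uses; the function names embed_lsb and extract_lsb and the sample pixel values are made up for the example, and the color values are treated as a flat run of bytes rather than a real BMP file.

```python
def embed_lsb(channels: bytes, message: bytes) -> bytes:
    """Return a copy of channels with the message bits stored in the lowest bits."""
    # Break each message byte into 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(channels):
        raise ValueError("cover data is too short for the message")
    stego = bytearray(channels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the color value, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(channels: bytes, length: int) -> bytes:
    """Rebuild length message bytes from the lowest bits of channels."""
    out = bytearray()
    for i in range(length):
        value = 0
        for sample in channels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# Three 24-bit pixels give nine color values; eight of them carry one letter.
cover = bytes([255, 255, 255, 200, 100, 50, 10, 20, 30])
stego = embed_lsb(cover, b"A")
print(list(stego))
print(extract_lsb(stego, 1))
```

No color value moves by more than 1, which is why the change is so hard to see.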

Digital Steganography

Because you can't use lemon juice to write secret messages in electronic documents, you need another way to hide information. The simplest way to hide information reliably within a file is to find a file format that contains more data than it needs to or allows the data to be modified without significant alterations to the file. Older graphic file formats, for instance, are good candidates for hiding data. (Newer formats use a variety of compression techniques to create smaller files.) Image formats such as BMP are ideal because each pixel has a color value. Assuming an 8-bit color depth, you can easily store 1 bit of data in each pixel (so a 1,024x1,024 image would allow you to store about a megabit, or 128KB, of data). The actual color would only be affected in a minor way, and the changes would not be noticeable to the naked eye.
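The capacity arithmetic is easy to check: one bit per pixel over 1,024x1,024 pixels comes to 1,048,576 bits, or 131,072 bytes. The helper below is a rough sketch for playing with the numbers; lsb_capacity_bytes and its parameters are hypothetical names, and real tools hold less because they also store headers, checksums, and the like.

```python
def lsb_capacity_bytes(width: int, height: int,
                       channels: int = 1, bits_per_channel: int = 1) -> int:
    """Upper bound on hidden payload, in bytes, for simple LSB embedding."""
    total_bits = width * height * channels * bits_per_channel
    return total_bits // 8


# 8-bit image, one hidden bit per pixel.
print(lsb_capacity_bytes(1024, 1024))
# 24-bit image, one hidden bit in each of the red, green, and blue values.
print(lsb_capacity_bytes(1024, 1024, channels=3))
```

The first call gives 131,072 bytes (128KB) and the second 393,216 bytes (384KB), so moving to a 24-bit image triples the room available.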

Printing Money

If you haven't heard about the yellow dots that color printers place in documents, prepare for a nasty surprise. When color printers and copiers first came out, there was real and significant concern about criminals using them to simply photocopy paper money (which could then be passed in busy places, dark bars, etc.).

You have to remember that this was before most major currencies began putting reflective foil, holograms, and other non-printable security measures into their banknotes. The majority of color printer and photocopier manufacturers agreed to implement digital watermarking technology.

When you print a document or photocopy something in color, chances are the printer will add a small pattern of yellow dots. These yellow dots are very small and can really only be seen in a blue light, but the Electronic Frontier Foundation discovered, after looking at numerous samples, that each device basically prints a unique serial number onto each page it prints. This, combined with a manufacturer's records, data from the store of purchase, and warranty data if you registered your printer, means that law enforcement can essentially scan a printed piece of paper and determine the model and serial number of the printer that printed it.

Hiding Data in Storage

One very simple and basic way to hide data is with a hidden volume within an encrypted container. Tools such as TrueCrypt let you stash your sensitive data in a hidden volume that won't be visible, even if the outer encryption layer is decrypted.

The first layer of cryptography serves as a decoy, with the real data hidden within a steganographic container that simply looks like blank space on the TrueCrypt volume. Anyone who obtains the first password can decrypt the first container, but they will not even know that the second container exists.

Installing TrueCrypt is much like setting up any other application, except for a minor licensing issue. (Apparently, because of some ambiguities in the license, most Linux vendors consider TrueCrypt "non-free" and thus do not distribute it.) Either download binary packages for openSUSE, Ubuntu, Windows, and Mac OS X, or get the source code and compile it yourself [4].

Compiling TrueCrypt is reasonably painless: You install the wxWidgets and FUSE development libraries, copy some PKCS11 header files from the RSA FTP site to your machine, then run make and (assuming make is successful) run the resulting truecrypt binary. A couple of compile options are presented, the most important being whether or not to build TrueCrypt with GUI support (strictly speaking, it isn't needed, but the GUI does make things a little nicer for end users) and whether or not to build TrueCrypt with static wx library support. (TrueCrypt is actually the first program I've encountered in a long time that uses wxWidgets for its GUI.)

Simply download the TrueCrypt source, and then use the following commands to get the RSA header files and build TrueCrypt (if you run into problems, see the Readme.txt for help):

# yum install fuse-devel wxGTK-devel
# mkdir /tmp/pkcs11_headers
# cd /tmp/pkcs11_headers
# wget ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-20/pkcs11.h
# wget ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-20/pkcs11f.h
# wget ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-11/v2-20/pkcs11t.h
# export PKCS11_INC=/tmp/pkcs11_headers
# tar -zxf "TrueCrypt 6.3a Source.tar.gz"
# cd truecrypt-6.3a-source
# make

Once the truecrypt binary is built, you can copy it to a location on your system (which could alert attackers to the fact that you are using TrueCrypt) or to removable media (such as a USB thumb drive or SD card).

Using TrueCrypt is simple; the first choice is whether or not to use a hidden container (Figure 1) and, if so, how to secure your encrypted containers. The second choice will be whether to use keyfile(s) or not (Figure 2). The advantage of using a keyfile is that the attacker must possess the keyfile to decrypt the data. If you can keep the keyfile(s) separate (e.g., on a USB drive you ship ahead of time to your hotel or on a file you can download from online somewhere), you can make it difficult (if not impossible) for an attacker to decrypt your data.

Figure 1: Choosing a volume type with the TrueCrypt creation Wizard.

Figure 2: Configuring a TrueCrypt keyfile.

Because TrueCrypt hidden volumes are hidden within the free space of the outer TrueCrypt volume, you cannot safely create files or even write to files within the outer TrueCrypt volume. This might raise an attacker's suspicion that a hidden container is present if the outer container only has old or stale data that has not been touched in some months. My advice would be to create a new TrueCrypt container periodically with a hidden volume and copy newer files into the outer volume to keep it "fresh."

One last note: TrueCrypt requires sufficient permissions to mount and unmount file systems, so unless you want to give users the root password, you will probably want to add an entry for the user and the truecrypt binary to the sudoers file so users can execute it properly.

Hiding Data in Files

Most personal machines have hundreds, if not thousands, of digital photos, video files, and data files. You can use steganographic techniques to store lots of data in your files without being too obvious about it. This strategy is much like camouflage in that you need not just a covering, but also something to hide among. The more data you have to hide something within, the more data you can hide without being too obvious.

Fedora and Debian both ship with the Steghide [5] steganography tool. Although you can install Steghide with yum or apt-get, if you have to install it from source code, you'll need libmcrypt-devel, mhash-devel, and libjpeg-devel on your system first. Once the dependencies are addressed, you simply need to compile Steghide with the standard:

# tar -zxf steghide-0.5.1.tar.gz
# cd steghide-0.5.1
# ./configure
# make
# make install

The steghide binary is also quite easy to use. First you need to choose an encryption algorithm. Steghide supports about a half dozen, but the one you probably want is rijndael-256 (Rijndael won the Advanced Encryption Standard (AES) competition and has become the industry standard for encryption). To list the encryption algorithms and strengths, simply run steghide --encinfo.

Second, you're going to need a supported file that can store the data. Steghide can hide data in JPEG, BMP, WAV, and AU files. You might have noticed that all these file types are relatively old and simple, and unlike modern files, they are not terribly sophisticated. (BMP is a perfect example of an unsophisticated file format; the color of each pixel is stored without any compression or fancy data storage techniques.) The advantage of these files (from a steganographic point of view) is that you can easily modify the file and leave it in a valid state. Also, these files leave a lot of space to hide data. Most modern file types use a variety of data-specific compression techniques to make the files smaller.

The following command stores a file within a JPG:

# steghide --embed -ef secret-file.txt -cf innocent.jpg -p password123 -e rijndael-256 -sf output.jpg

The output reports the following:

steghide: the cover file is too short to embed the data.

Argh! The file (innocent.jpg) is too small to hold the secret file, so I'll stash the data in a larger file such as a WAV file:

# steghide --embed -ef secret-file.txt -cf big-innocent.wav -p password123 -e rijndael-256 -sf output.wav

which leads to a better conclusion:

embedding "secret-file.txt" in "big-innocent.wav" ... done
writing stego file "output.wav"... done

If you need to hide really large secret files, you need really large cover files. My advice on this would be WAV files. Use of the program is pretty self-explanatory. To extract data from a file, simply use the --extract option:

# steghide --extract -sf output.wav
Enter passphrase:
Wrote extracted data to secret-file.txt

Finally, if you're like me (and getting kind of old), you probably can't remember every single option for every single program; luckily, a GUI for Steghide, called SteGUI [6] (Figure 3), comes to the rescue.

Figure 3: Embedding a data file with the SteGUI interface.

Attacking and Defeating Steganography

If you look very closely at an image file that has been modified through steganography, it is sometimes possible to detect imperfections. For instance, if you inspect a BMP file at high magnification, you might find that pixels that are near each other have color values that don't seem quite right. An image with a blue gradient (from light blue to a darker blue) might have pixels that are close but the "wrong" color (such as the one outlined in black in Figure 4), which could indicate that the color values have been modified by a program like Steghide.

Figure 4: A color gradient with one pixel that isn't quite right (outlined in black).
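The eyeball test from Figure 4 can be mimicked in code by comparing each value in a smooth gradient with the average of its two neighbors. The sketch below is purely illustrative: find_outliers, the tolerance of 4, and the sample gradient are all assumptions, and a genuine one-bit LSB change is far too small for a check this crude to catch.

```python
def find_outliers(values: list[int], tolerance: int = 4) -> list[int]:
    """Return the positions whose value strays from the average of both neighbors."""
    suspects = []
    for i in range(1, len(values) - 1):
        expected = (values[i - 1] + values[i + 1]) / 2
        if abs(values[i] - expected) > tolerance:
            suspects.append(i)
    return suspects


# Blue values along one row of a smooth light-to-dark gradient.
gradient = [100, 104, 108, 112, 116, 120, 124, 128]

# The same row with one value knocked out of line, like the pixel in Figure 4.
tampered = gradient.copy()
tampered[4] = 109

print(find_outliers(gradient))
print(find_outliers(tampered))
```

The clean row produces an empty list, whereas the tampered row reports position 4. A lower tolerance would also start flagging the innocent neighbors of the bad pixel, which hints at why detection tools trade sensitivity against false alarms.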

More advanced forms of detection with statistical methods, such as linear discriminant analysis [7], are available in programs such as Stegdetect [8], which analyzes files to detect hidden data. Most likely, you will have to crank up the sensitivity, which increases CPU time but also increases the chance of finding hidden data, especially when small amounts are hidden in large files.
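To give a feel for what a statistical test looks like, here is a sketch of a much simpler statistic than linear discriminant analysis: a chi-square comparison of "pairs of values." It is swapped in for illustration only and is not presented as what Stegdetect does. The idea is that overwriting least significant bits with encrypted (random-looking) data tends to even out how often the values 2k and 2k+1 occur, so an unusually low score is suspicious. The function name and sample data are made up.

```python
from collections import Counter


def pairs_chi_square(samples: bytes) -> float:
    """Chi-square statistic over the value pairs (0,1), (2,3), ... (254,255)."""
    counts = Counter(samples)
    statistic = 0.0
    for even in range(0, 256, 2):
        observed_even = counts[even]
        observed_odd = counts[even + 1]
        # If the low bits were random, both values of a pair would be equally common.
        expected = (observed_even + observed_odd) / 2
        if expected > 0:
            statistic += (observed_even - expected) ** 2 / expected
    return statistic


# Toy "clean" data in which even values clearly outnumber their odd partners.
clean = bytes([10] * 90 + [11] * 10 + [20] * 80 + [21] * 20)

# Overwrite every low bit with an alternating 0/1 pattern,
# a stand-in for the evenly distributed bits of an encrypted payload.
stego = bytes((value & 0xFE) | (i % 2) for i, value in enumerate(clean))

print(pairs_chi_square(clean))
print(pairs_chi_square(stego))
```

The clean data scores 50.0 and the modified data 0.0. Real cover files are nowhere near this tidy, so a practical detector has to judge the score against what is normal for that kind of file.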

Sadly, I couldn't get Stegdetect to compile under Linux, so I cheated and downloaded an older version that came as a prebuilt Windows binary. A general problem with most of this steganographic software is that it hasn't been updated since 2004 or so.

If you turn the sensitivity up on a program like Stegdetect (Figure 5), you will most likely start flagging random data or simple junk as hidden content. To illustrate this, a small software company once fooled the CIA into believing it could detect steganographically hidden data. It was unclear whether the company actually found anything significant; however, the data was vague enough that there was no real way to disprove it either (i.e., if you look at clouds long enough, you will see a kangaroo) [9].

Figure 5: Stegdetect (on Windows) identifying a file with hidden content.

Another option for foiling steganography is to modify the data in such a way that any steganographically hidden data is sufficiently damaged that it can't be retrieved. For image, sound, and video files, this is relatively simple; re-encoding the data or resizing it will generally do the trick (however, it should be noted that a variety of steganographic and watermarking techniques are designed to resist this technique).
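The following sketch shows why even a mild lossy transformation wrecks naive least-significant-bit hiding. The requantize function is a crude stand-in for real re-encoding or resizing (it just rounds every sample down to a multiple of 4), and all names and sample values are assumptions made for the example.

```python
def read_lsbs(samples: bytes) -> list[int]:
    """Return the lowest bit of every sample."""
    return [sample & 1 for sample in samples]


def requantize(samples: bytes, step: int = 4) -> bytes:
    """Round each sample down to a multiple of step (crude lossy re-encoding)."""
    return bytes((sample // step) * step for sample in samples)


hidden_bits = [1, 0, 1, 1, 0, 0, 1, 0]
cover = bytes([200, 201, 198, 197, 203, 202, 199, 204])

# Hide one bit in the lowest bit of each sample.
stego = bytes((sample & 0xFE) | bit for sample, bit in zip(cover, hidden_bits))
print(read_lsbs(stego) == hidden_bits)

# After requantizing, every low bit is zero and the message is gone.
cleaned = requantize(stego)
print(read_lsbs(cleaned))
```

No sample moves by more than 3, so the picture or sound is barely affected, but the hidden bits do not survive.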

A perfect example of this strategy is "Upside-Down-Ternet," a technique for using the Squid web proxy in transparent mode to run all images from the web through a graphics manipulation program, flipping them upside down or making them fuzzy [10].

Despite recent advances in cryptography, secret communication is still an immature field. Fortunately, open source users have access to some basic steganographic tools. Corporations are already making heavy use of steganography to watermark files and media. Users with sensitive data should think of steganography as a second line of defense beyond the protection offered by encryption.

Leaked Data

Unfortunately, if you access and use your hidden data, there is a good chance that it might end up being copied somewhere in the clear. Swap partitions (data from memory written to the hard drive) or temporary files created by text editors might contain a residual image of the data. Or, if you put your laptop into hibernation, the entire contents of system memory are dumped to the hard drive, and your data could end up in places where an attacker can find it.

Disabling swap space and ensuring that your system never hibernates will prevent several avenues of exposure, and preventing your programs from creating temporary or cache copies will definitely reduce the risk.

INFO
[1] Steganography definition: http://en.wikipedia.org/wiki/Steganography
[2] Regulation of Investigatory Powers Act 2000: http://www.opsi.gov.uk/acts/acts2000/ukpga_20000023_en_1
[3] GnuPG: http://www.gnupg.org/
[4] TrueCrypt source code: http://www.truecrypt.org/downloads
[5] Steghide: http://steghide.sourceforge.net/
[6] SteGUI: http://stegui.sourceforge.net/
[7] Linear discriminant analysis: http://en.wikipedia.org/wiki/Linear_discriminant_analysis
[8] Stegdetect: http://www.outguess.org/detection.php
[9] The Man Who Conned The Pentagon: http://www.playboy.com/articles/the-man-who-conned-the-pentagon-dennis-montgomery/index.html?page=1
[10] Upside-Down-Ternet: http://www.ex-parrot.com/pete/upside-down-ternet.html