The shell script compiler, shc, obfuscates shell scripts with encryption—but the password is in the encrypted file. Could an intruder recover the original script using objdump?
shc is a popular tool for protecting shell scripts that contain sensitive information such as passwords. Its popularity was driven partly by auditors' concern over passwords in scripts. shc encrypts shell scripts using RC4, makes an executable binary out of the shell script and runs it as a normal shell script. Although the resulting binary contains the encryption password and the encrypted shell script, it is hidden from casual view.
At first, I was intrigued by the shc utility (www.datsi.fi.upm.es/~frosal/sources/shc.html) and considered it as a valuable tool in maintaining security of sensitive shell scripts. However, upon further inspection, I was able to extract the original shell script from the shc-generated executable for version 3.7. Because the encryption key is stored in the binary executable, it is possible for anyone with read access to the executable to recover the original shell script. This article details the process of extracting the original shell executable from the binary generated by shc.
shc is a generic shell script compiler. Fundamentally, shc takes as its input a shell script, converts it to a C program and runs the compiler to compile the C code. The C program contains the original script encrypted by an arbitrary key using RC4 encryption. RC4 is a stream cipher designed in RSA laboratories by Ron Rivest in 1987. This cipher is used widely in commercial applications, including Oracle SQL and SSL. Listing 1 demonstrates running shc.
The two new files, named with the .x and .x.c extensions to the name of the source shell script, are the executable and an intermediate C version. Upon executing pub.sh.x, the original shell source is executed. shc also specifies a relax option, -r. The relax option is used to make the executable portable. Basically, shc uses the contents of the shell interpreter itself, such as /bin/sh, as a key. If the shell binary were to change, for example, due to system patching or by moving the binary to another system, the shc generated binary does not decrypt nor execute.
I inspected the shell executable using strings and found no evidence of the original shell script. I also inspected the intermediate C source code and noted that it stores the shell script in encrypted octal characters, as depicted in Listing 2.
The C source code also includes as arrays the password as well as other encrypted strings. Therefore, anyone with access to the source code easily can decrypt and view the contents of the original shell script. But what about the original shell binary executable generated by shc? Is it possible to extract the original shell script from nothing but the binary executable? The answer to this question is explored in the next section.
I generated and reviewed the C source code for several shell scripts to better understand how the shell source is encrypted and decrypted. Fundamentally, shc uses an implementation of RC4 that was posted to a Usenet newsgroup on September 13, 1994. I set off by first identifying the encryption key and the encryption text. The objdump utility came in handy for this. bjdump, part of GNU binutils, displays information about object files. First, we use objdump to retrieve all static variables, for this is where the encryption key and the encrypted shell text are stored. Listing 3 provides a brief overview of objdump.
The first column of the output in listing 3 specifies the starting addresses in hexadecimal, followed by the stored data in the next four columns. The last column represents the stored data in printable characters. So somewhere in the first four columns of the output is the array of characters that form the encryption key (password) and the encrypted shell script. Comparing the original C source code and Listing 3, you can see that the password most likely begins at address 0x804a540. After comparing other executables, I determined that the first address after the zeroes leading the “Please contact your provider” text usually is the starting address. To retrieve these arrays, such as the one depicted in Listing 2, we also need to look at the disassembled code. We use objdump again here, except this time with the -d option, for disassemble, as shown in Listing 4.
The last two columns represent assembly instructions. The movl instruction is used to move data—movl Source, Dest. The Source and Dest are prefixed with $ when referencing a C constant. The push takes a single operand, the data source, and stores it at the top of stack.
Now that we have the basics of objdump, we can proceed to extract the encryption password and eventually the shell code.
In the intermediate C code produced by shc, about nine arrays are referenced by the variables pswd, shll, inlo, xecc, lsto, chk1, opts, txt and chk2. The pswd variable stores the encryption key, and the txt variable stores the encrypted shell text. shc hides the useful information as smaller arrays within these variables. Thus, obtaining the actual array involves two steps. First, identify the length of the array. Second, identify the starting address of the array.
The objdump output needs to be looked at in detail to obtain the actual array length and the starting address. My first hint here is to look for all addresses that are within the data section (Listing 2) of the disassembled object code. Next, seek out all the push and mov commands in Listing 4. Addresses will be different for different scripts, but when you encrypt a few scripts and read the resulting C code, the patterns become familiar.
The 804a540 address seems to correspond to the pswd variable, the encryption key. The length of the useful portion of the encryption key is represented by 0x128, or 296 in decimal form. Similarly, the next variables, shll and inlo, have useful lengths of 0x8 and 0x3 and starting addresses of 804a672 and 804a68a, respectively. This way, we are able to obtain the starting addresses and lengths of all nine variables. Next, we need to be able to decrypt the original shell script using only the binary as input.
In shc, before the shell script itself is encrypted, many other pieces of information are encrypted. Furthermore, the RC4 implementation maintains state between encrypting and decrypting each individual piece of information. This means that the order in which shc encrypts and decrypts information must be maintained. Failure to do so results in illegible text. To extract the original shell script, we need to perform several decryptions. For this step, I wrote a small program called deshc, using the existing code from one of the intermediate C files. The program reads two files as its input, the binary executable and an input file that specifies the array lengths and addresses. deshc executes the following four steps:
Reads binary executable.
Extracts data section from the disassembled output.
Retrieves individual arrays based on input file.
Decrypts individual arrays in order, so that the RC4 state is maintained.
Based on the objdump output, I have arrived at the following array lengths and addresses for the pub.sh.x executable:
pswd 0x128 0x804a540 shll 0x8 0x804a672 inlo 0x3 0x804a68a xecc 0xf 0x804a68e lsto 0x1 0x804a6a4 chk1 0xf 0x804a6a6 opts 0x1 0x804a6be txt 0x76 0x804a6e0
All of these parameters are used in an input file to deshc, which then decrypts and prints the original shell script.
An approach to extract the shell source code successfully from shc version 3.7 generated binary executable was demonstrated. The pub.sh script was used for illustrative purposes only. I have indeed tested the deshc program on executables that I did not create and without access to the source code or the original shell script.
Francisco García, the author of shc, recently released version 3.8. It uses somewhat different data structures and improves upon the security of the previous version. Nevertheless, I believe that embedding the encryption password within the binary executable is dangerous and prone to extraction as discussed in this article.