Testing web applications with Google's Skipfish

Hook, Line, and Sinker


Skipfish can help you hunt down potential security problems on your website before intruders find them.

By Konstantinos Agouros

Google's Skipfish is a free tool for testing website security. Google calls it "an active web application security reconnaissance tool." And, according to the project website [1], Skipfish "... prepares an interactive sitemap for the targeted site by carrying out a recursive crawl and dictionary-based probes."

Many penetration tools are written in Perl or some other scripting language, but Skipfish, which is released under the Apache license, is written in pure C for better speed and efficiency. An optimized HTTP stack provides additional performance enhancements.

Although the Skipfish security tool is only available in source code form, the installation is easy. After downloading and unpacking the code, users on Linux, BSD, and Mac OS X can simply type make to build the binary. This assumes you have the OpenSSL and libidn libraries and their header files in place; on Debian, the libidn11-dev and libssl-dev packages fulfill these conditions.
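
On a Debian system, the complete build might look like the following (the tarball name here is only an example; substitute the file you actually downloaded):

sudo apt-get install libidn11-dev libssl-dev
tar xzf skipfish-1.33b.tgz
cd skipfish-1.33b
make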

How It Works

Skipfish basically uses the same crawler principle as similar tools [2] [3], but it doesn't just follow existing HTML links; instead, it attempts to find hidden paths or unlinked files using sophisticated techniques (Figure 1). Skipfish then evaluates the results and categorizes any security flaws into the High, Medium, or Low risk groups. Besides the warnings generated during the test, the program also collects critical information like the server version, any email addresses it finds, password fields for follow-up tests, and file upload fields.

Figure 1: Skipfish generates a substantial amount of traffic on the network. A quick check of HTTP access with the Wireshark sniffer confirms that the tool tries out many variations on website names, just like a crawler.

The High risk classification includes SQL, XML, or shell injections. The Medium classification covers Cross-Site Scripting (XSS), Cascading Style Sheet attacks, listable directories, and MIME-type issues (see the "Cross-Site Scripting" box). The Low classification covers things like X.509 certificate problems, the ability to inject content other than scripts into the website, and HTML forms without cross-site request forgery protection.

Skipfish generates a sitemap with all the paths it finds and a summary of the identified file types (Figure 2). The Skipfish tests rely on a dictionary of file names and file suffixes, whose combinations the program brute-forces. But the tool also learns from what it finds on its voyage of discovery, starting with the URL passed in on the command line. If the program discovers a new suffix, it adds the suffix to the dictionary and refers to it in future tests. The test manager can control the extent of the brute forcing with command-line options and by manipulating the dictionary file.

Figure 2: Skipfish presents the results in the form of a dynamic sitemap in HTML. Other views show the data types found and the risk classes.
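
To get a feel for the volume of guesswork involved, the following hypothetical shell loop mimics the name-and-suffix combinations Skipfish tries for a single directory; the word lists here are stand-ins, not the real dictionary format:

# Hypothetical illustration only -- Skipfish does this in C, far faster.
for name in admin backup index test; do
  for ext in php html bak old; do
    curl -s -o /dev/null -w "%{http_code} $name.$ext\n" \
      "http://test.example.com/$name.$ext"
  done
done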

When Skipfish identifies an HTML form, it parses the GET or POST parameters and checks to see whether the application's input validation will stand up to the implemented attacks. Command-line options define defaults for certain fields, say, if a login form has to set a session cookie that is essential for testing the application. Skipfish will use cookies, but according to the source code, the version I tested (v1.33) will not modify them; a comment in the code indicates that the tool might be extended to include this functionality at some time in the future.

Cross-Site Scripting

The cross-site scripting (XSS) test includes only very limited JavaScript functionality, which means Skipfish might not identify potential XSS that relies on sophisticated JavaScript. This can happen if nested JavaScript calls create the web page where the XSS takes effect. Without a full-fledged JavaScript engine, Skipfish does not have the ability to ascertain whether the XSS code passed in actually runs in a browser.
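
If Skipfish flags a parameter as a possible XSS vector, a quick manual probe can confirm whether the payload is reflected unescaped. The parameter name q below is hypothetical; substitute the one from the report:

curl -s "http://test.example.com/search?q=<script>alert(1)</script>" | \
  grep "<script>alert(1)</script>"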

Controlled Guessing

Skipfish manages file names and suffixes in a dictionary. Alternatively, users can set a command-line option or use an empty dictionary to make sure that Skipfish only follows the links it finds instead of brute-forcing all the combinations of file names and suffixes in the dictionary. The tool also tries file names as directory names to test for paths like http://www.example.com/index/.
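
A link-only scan with an empty dictionary could look like this; the -L flag suppresses keyword auto-learning in the version I tested, but check skipfish -h to be sure:

touch empty.wl
./skipfish -W empty.wl -L -o linkonly http://test.example.com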

For a generic test, you will probably want to use the dictionaries supplied with the program. Directories with easily guessed or well-known names and scripts that give attackers access to web servers are a common problem.

In normal operation, the tool always adds the file names and suffixes it discovers to the dictionary file, which is either skipfish.wl in the current directory or the file that the user specifies with the -W option. Thus, it's a good idea to create a copy of the word list the tool uses. If you investigate a site twice, it makes sense to reuse the word list from the first run. Skipfish itself creates a copy of the original file with a .old suffix. Table 1 looks at the file format.
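
For a retest, it's worth keeping a site-specific copy of the learned word list and passing it back in with -W, for example:

cp skipfish.wl example-com.wl
./skipfish -W example-com.wl -o rescan http://test.example.com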

The readme file in the directory with the default dictionaries can help a test manager choose a suitable list. The default list (default.wl) is typically fine. However, system administrators should be aware that the combination of file names and suffixes in this file will generate around 100,000 requests for each directory you investigate. The more exhaustive list (complete.wl) will generate up to 150,000 requests. Because of Skipfish's fast performance, this load can overwhelm a less powerful server and fill up your logfiles (see the "How Fast Is Skipfish?" box). Experienced penetration testers will look into these questions in advance. If you are testing a production system, make sure the scan doesn't cause so much traffic that it takes the server down.
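
Before the first scan, copy the dictionary of your choice into place; the dictionaries/ path matches the source tree of the version I tested:

cp dictionaries/default.wl skipfish.wl   # or complete.wl for the exhaustive list
./skipfish -o outputdirectory http://test.example.com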

How Fast Is Skipfish?

According to Google, Skipfish achieves up to 7,000 requests per second on a local network, if the web server responds quickly enough. This performance will drop if you use a broadband connection to test a hosted server.

During testing, Skipfish achieved up to 1,400 requests per second in a live test on the enterprise network, although the limiting factor was probably the web server being tested. On the same Gigabit segment, Skipfish running on an Athlon 64 achieved an access rate of 4,000 hits per second against a web server with a Core2 Duo CPU and a default Apache installation.

Options Control the Scan

Skipfish has a number of command-line options that test managers can use to set the depth and speed of the scan, determine what results appear in the report, and manage the dictionaries. If the skipfish.wl file exists in the current working directory, the program only expects two options: -o outputdirectory and a URL that sets the starting point.

Make sure the directory doesn't already exist, because Skipfish will want to create it. After the test, the directory contains the report, along with all the images needed to render it in the browser. You can also specify multiple URLs, which the program tests one after another. The simplest command line looks like this:

skipfish -o outputdirectory http://test.example.com

During a scan, the tool creates some statistics on the scan progress (Figure 3). The forecast of the number of requests left to perform is not very reliable and can't be used to estimate the remaining scanning time. Every time the application finds a new directory, thanks to either the dictionary file or a link, new test cases are added.

Figure 3: During the scan, Skipfish updates an overview of the performance values (e.g., the data transferred or the number of simultaneous requests).

Manual Access

If the website requires HTTP authentication, the test manager will need to provide the username and password using the -A Username:Password option. For websites that use a form for authentication, you can pass in the credentials as -T name=value. This option is not just useful for passing in access credentials, but also for filling in arbitrary form fields. Unfortunately, you can't restrict this to a specific page on the web server; Skipfish will pass the values you enter to every single form that includes a field of the same name.

If the application that you are testing uses cookies for authentication, you can add them by specifying -C cookiename=value. A word of caution: If you test this with genuine credentials on a multi-user machine, you should be aware that these details will appear in the process table, where anybody can read them.
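
Putting the authentication options together, a scan of a login-protected application could look like this; the field and cookie names are examples, so use whatever your application expects:

# Note: these values show up in the process table.
./skipfish -A webadmin:secret \
  -T username=webadmin -T password=secret \
  -C PHPSESSID=0123456789abcdef \
  -o authscan http://test.example.com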

Test managers can also set specific headers that Skipfish will send with every request; the -H Name=value option handles this. If the tested application requires the test to emulate Internet Explorer or Firefox, you can specify -bi for Internet Explorer or -bf for Firefox. The -N option tells Skipfish to refuse all cookies.
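
For example, to emulate Firefox, send an extra header with every request, and refuse all cookies at the same time (the header name is just an example):

./skipfish -bf -H X-Scan-Origin=pentest -N \
  -o ffscan http://test.example.com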

A number of options define the search scope. The -X option excludes URLs that contain a given string, and -I tells Skipfish to investigate only URLs that do contain it; -S works similarly but matches against page content rather than the URL. Excluding URLs makes sense, for example, to prevent Skipfish from following a logout link and invalidating its own session. Other options let you manage the test performance via the maximum number of connections and control the exhaustiveness of the report that Skipfish generates.
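
A scoped scan that steers clear of the logout page and throttles the number of simultaneous connections could look like this; -m sets the per-target connection limit in the version I tested, so verify against skipfish -h:

./skipfish -X /logout -m 10 \
  -o scoped http://test.example.com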

Considering that a scan generates a data volume of around 700MB against even a pristine, default Apache installation, it always makes sense to test locally if you can, rather than across a WAN link. Larger applications will cause far more traffic.

Formatted Results

Skipfish creates a website with the results in the specified output directory. At the start of the page, you will find an overview that tells you how many problems of each severity, and which documents, the tool has identified. The embedded JavaScript lets you expand the view and browse the web tree. For each directory, another overview tells you the number of errors found in the subtree.
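
To browse the results, open the top-level file of the report, index.html in the version I tested, in any JavaScript-capable browser:

firefox outputdirectory/index.html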

The sitemap is followed by a list of identified MIME types. Clicking a type reveals a list of the URLs of that type. The report is rounded off by a list of identified vulnerabilities and information gleaned from the web server. Again, you can expand the list to display the URLs for the findings.

For each URL, Skipfish adds a link labeled show trace, which displays the HTTP session in a separate window (Figure 4). In contrast to reports generated by other tools, Skipfish doesn't provide much explanation of its findings. You need to be aware of the dangers of XSS or SQL injection yourself; alternatively, you can read the author's "Browser Security Handbook" [4]. Other tools provide commented reports that explain the vulnerabilities.

Figure 4: To evaluate a vulnerability that Skipfish has identified more closely, Skipfish will display the complete HTTP session in a separate browser window, if needed.

New Standard Tool

Skipfish's strengths are its speed, thanks to the implementation in C, and the dictionary concept, which is really useful for retesting the same application and which lets the tool teach itself new names and suffixes. Compared with other tools, such as Nikto or W3af, Skipfish still lacks some functions - for example, the ability to manipulate cookies, to test for LDAP injection (see the "OWASP Top Ten" box), or to target individual parameters with specific tests that can then be run separately. However, the project is still under development, and development is proceeding at a rapid pace.

As with any penetration-testing tool, you can't assume that the application you tested is secure just because Skipfish failed to return interesting results. But if Skipfish does complain about something, you should identify the source of the complaint and take it seriously - doing so will help you hook even the slipperiest of vulnerabilities.

OWASP Top Ten

The OWASP Top Ten [5] lists the worst crimes in web application programming. The current list, from 2010, enumerates the attack patterns that most frequently cause server and application intrusions.

  1. Injection. SQL or LDAP injections in which the attacker uses a client to inject values into a query. For SQL, say, ' or 1=1, so that the resulting SQL request is always true (see the sketch after this list).
  2. Cross-Site Scripting. Users can enter JavaScript code in an input box, such as a form. A lack of input validation or escaping means that the code is displayed to all other users of the page and executes in their browser context.
  3. Broken Authentication and Session Management. A logged-in user's session can be hijacked.
  4. Insecure Direct Object References. The HTML code contains references to direct objects, such as files or a database key, that the attacker can use to force the web server to perform unintended operations.
  5. Cross-Site Request Forgery. The attacker sends the victim a link to the application with request parameters specially crafted by the attacker to, for example, set up a new administrator account.
  6. Security Misconfiguration. This refers to generic errors in the web server or application configuration that allow unauthorized access.
  7. Insecure Cryptographic Storage. A poorly selected storage location for web server or application keys.
  8. Failure to Restrict URL Access. The application fails to check access rights on directly entered URLs, so attackers can reach internal data simply by requesting the right link.
  9. Insufficient Transport Layer Protection. The web server uses weak cryptographic algorithms for Transport Layer Security.
  10. Unvalidated Redirects and Forwards. The application forwards input data without any validation. The values input by the attacker can redirect an unsuspecting user to another website without the victim noticing.
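
To illustrate the first item: assuming a hypothetical login script that pastes the user parameter straight into its SQL query, the classic payload could be delivered like this (%27 and %20 are the URL encodings of the quote and space characters):

# Hypothetical query built by the application:
#   SELECT * FROM users WHERE name = '$user'
# With user set to ' OR '1'='1 (a quote-balanced variant of ' or 1=1),
# the WHERE clause is always true:
#   SELECT * FROM users WHERE name = '' OR '1'='1'
curl "http://test.example.com/login?user=%27%20OR%20%271%27=%271"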
INFO
[1] Skipfish: http://code.google.com/p/skipfish/
[2] Nikto: http://cirt.net/nikto2
[3] W3af: http://w3af.sourceforge.net
[4] "Browser Security Handbook" by Michal Zalewski: http://code.google.com/p/browsersec/wiki/Main
[5] OWASP Top Ten: http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
AUTHOR

Konstantinos Agouros works for N.runs AG as a network security consultant and focuses on mobile networks. His book, DNS/DHCP, was published by Open Source Press.