Speeding up Database Access with mod_perl

Reuven M. Lerner

Issue #52, August 1998

Continuing the discussion of mod_perl, Mr. Lerner tells us about the DBI specification and the Apache::DBI module.

Last month, we started to look at mod_perl, a module for the Apache web server that puts a copy of Perl inside of our HTTP server. Not only does mod_perl save us the overhead of forking a new process and invoking Perl each time a CGI program is run, but it caches programs once they are compiled, reducing start-up time even further.

As with everything else in life, there are trade-offs for using mod_perl. CGI programs can be written in any language and can be run by any web server on any platform. By contrast, mod_perl only works with Apache (which runs under most UNIX versions and is about to be released for Win32), and requires that programs be written in Perl. If you are interested in maximum portability, you might consider sticking with CGI. But if you are like me and spend most of your time working with Apache and Perl anyway, you might seriously consider moving toward mod_perl.

However, this is not the end of the story. Even if mod_perl were infinitely faster than CGI's fork-and-execute method, a major bottleneck would still remain: opening a connection to a database server.

If your web application talks to a relational database server, it has to open a connection before it can send a query. That initial dialogue between your program and the database server can take quite a while, both because of the initial TCP connection that must be established and because database servers were not designed for single-query connections. Many servers expect you to connect once, send a number of requests, receive a number of responses and disconnect when done. Web applications, by contrast, tend to open a connection for each query, which can unnecessarily slow your programs.

Luckily for us, the folks who wrote mod_perl created a module designed to solve this problem. Apache::DBI, as it is called, takes advantage of mod_perl's variable persistence (i.e., the fact that variable values are kept from one invocation of a program to another) to keep database connections open. Each invocation of a program using Apache::DBI has only to send a query to the database and act on the returned response.

This month, we look at the DBI specification in general, and the Apache::DBI module in particular. We also take a quick look at the Benchmark and LWP modules, which can help profile code in general, and which will allow us to see just how much faster mod_perl and Apache::DBI can make our programs.

Configuration

To get mod_perl working with Apache, you will need to compile and install the latest versions of both programs. (Full instructions are available in last month's “At the Forge”.) To take advantage of all of Apache::DBI's features, you will need to compile mod_perl with more than the default options. These options can be specified by adding OPTIONNAME=1 on the same line as the initial invocation of Makefile.PL, the first stage of the compilation and installation process.

I found it easiest (if a bit wasteful) to compile using the EVERYTHING flag, which turns on all of the mod_perl options, even those that are unnecessary for Apache::DBI. To do this, type the following in the initial mod_perl directory:

perl Makefile.PL EVERYTHING=1

The rest of the mod_perl and Apache configuration continues as described last month. See the INSTALL document that comes with mod_perl to learn how to turn on only those features you need, rather than using EVERYTHING=1.

Apache::DBI works automatically with programs that use mod_perl. In other words, any program that uses Apache::Registry (more or less the mod_perl equivalent of CGI) automatically gets the benefits of Apache::DBI, assuming that the latter is specified in the configuration file (described below). You can configure a directory to use Apache::Registry in much the same way as you can configure it to use CGI, with directives in the srm.conf file such as:

# Deal with mod_perl
<Location /perl-bin>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
</Location>

You can then instruct mod_perl to load Apache::DBI in all directories using Apache::Registry by inserting the following directive into srm.conf:

PerlModule Apache::DBI

Now you will need to restart your server, most likely while logged in as root, by typing:

# killall -v -1 httpd

The server is now ready to talk to the database using mod_perl. Before we can take advantage of Apache::DBI, though, we will need to investigate DBI a bit more carefully.

What is DBI?

There are dozens (perhaps hundreds) of databases on the market, some of which run under Linux. One product which I have used, both in my own consulting work and in the pages of this column, is MySQL, a “mostly free” database (in the author's words) distributed by TcX DataKonsult AB. The “Resources” sidebar contains information on where you can download the source and binaries for MySQL.

The CGI programs we wrote in our previous encounters with MySQL used the Mysql module for Perl, which gives us access to all of MySQL's features. Mysql.pm continues to work just fine for most applications.

However, if you are interested in keeping up with the latest standards and trends within the Perl community, you should switch (as I have) to DBI, the generic database interface for Perl programs. DBI allows you to use the same code on any number of databases. That is, you can write a program that talks to MySQL, Sybase, Oracle or any other database product—and you will have to change only one word in order to port the program to another database product. This makes Perl a very powerful and portable database-access language.

DBI is divided into two parts: the generic DBI module engine, which can be downloaded from CPAN, the Comprehensive Perl Archive Network, and a DBD (database driver) for the particular brand of database you wish to access. There is only one DBI module, but there is a different DBD for each database you might wish to access. (See “Resources” for information on where you can obtain DBI and DBDs.) If you plan to use more than one database server, you will have to install more than one DBD. You can install as many DBDs as you want; they are installed in parallel, and thus do not conflict.

If you are using MySQL, then you will have to download the driver for Msql, the database on whose interface the MySQL API was modeled. Before it compiles and installs the necessary modules, the Msql-modules package asks whether you want to install DBDs for Msql, MySQL or both.

Using DBI

Once you have installed DBI and the appropriate DBD, you will be able to do just about everything you would normally expect from a database. The syntax is a bit different from the syntax we have seen in previous installments of “At the Forge”, but is conceptually quite similar. It should not take you very long to start using DBI, once you have seen some examples.

Connections to the database are kept in a database handle, normally stored in a variable called $dbh. The database handle not only gives you a compact, object-oriented way to access database methods, but also means that you can connect to multiple databases at the same time, giving each connection its own database handle. (This might be useful when moving information from Sybase to Oracle, for example.)

The basic syntax is fairly straightforward:

$dbh = DBI->connect($data_source, $username,
        $password);

As you can see, the connect method takes three arguments. The first, $data_source, defines the database you wish to access, as well as the name of the computer on which the server sits and the access port on that computer. The second two arguments are theoretically optional, but most configurations will (and should) require them.

For example, most test programs on my home computer use the following syntax:

$dbh = DBI->connect("DBI:mysql:test:localhost");

Because I use the unprotected “test” database, no user name or password is necessary. A production site on which user names and passwords are required would use syntax like the following:

$dbh = DBI->connect("DBI:mysql:classifieds:dbserver",
        "classy" "51haf3");

In the above line of code, we are again connecting with the MySQL DBD. But this time, we are connecting to the database named classifieds on a machine named dbserver, with the username classy and password 51haf3. Remember that user names and passwords on database systems are unrelated to user names and passwords on UNIX systems. For security purposes, you should use different passwords (and perhaps even different user names) with your databases than are actually in use on your system.

If the connection succeeds, $dbh can be used as an entry point into the database. If the connection fails, $dbh remains undefined. This allows us to use the following error-checking code:

&log_and_die($DBI::errstr) unless $dbh;

The &log_and_die routine is one of my old favorites (and probably familiar to long-time readers of this column), printing an error message on the screen and then exiting gracefully. Complete listings including the subroutine &log_and_die are available at ftp://ftp.linuxjournal.com/pub/lj/listings/issue52/2991.tgz.

Now that we are connected to the database server, we can feed it one or more queries in SQL, Structured Query Language.

If we want to insert a value into a table in the database, we can simply say something like:

$sql = "INSERT INTO test_insert (contents) VALUES
(\"$random\") ";

Putting the SQL query into a scalar variable before using it is not required, but helps if and when you need to debug the code. (This way, you can easily add a “print” statement in the middle of your program.)

With the query all set, we tell the database to perform the requested action using the do method. This returns a variable which I call $successful_insert; much like $dbh, $successful_insert is defined only if the query was successful:

$successful_insert = $dbh->do($sql);
print "<P>Success!</P>\n" if $successful_insert;

Finally, we disconnect from the database. This is not completely necessary, since Perl will close connections when we are no longer using them. Nevertheless, it is always a good programming practice to clean up:

$dbh->disconnect;

The above syntax is good for any SQL query from which we do not expect a result, namely INSERT, DELETE and UPDATE. (For more information about SQL, see Resources for some book recommendations.)

Retrieving Rows with SELECT

If we want to retrieve matching rows from the database, we need to modify the syntax just a bit. After all, we expect to not only receive a report on whether the database was able to perform our requested action, but also see the results.

Assuming we are connected to the database, we set $sql to our SQL query:

$sql = "SELECT id,contents FROM test_insert";

We then use the prepare method to send our query, as follows:

$sth = $dbh->prepare($sql);

The $dbh->prepare result is known as a “statement handle”, which is traditionally named $sth. Just as $dbh allows us to perform operations on the database to which we have connected, $sth allows us to perform operations on the statement we have just sent. And just as $dbh is undefined in the case of an error, so too is $sth:

&log_and_die($sth->errstr) unless $sth;

Assuming that $sth was sent to the database successfully, we tell the database to execute our query, checking for problems with the return code:

$sth->execute || &log_and_die($sth->err);

Now comes the fun part, namely iterating through each of the rows returned to us. We can find out how many rows were returned as a result of our query by checking the value of $sth->rows. We can then retrieve each of the returned rows, one by one (with one column value per row), using $sth->fetchrow. When there are no more rows to retrieve, $sth->fetchrow returns false, which means that we can use it within a “while” loop. Indeed, this is a fairly standard idiom in the DBI world:

# Loop through returned rows
while (@row = $sth->fetchrow)
{
# Grab the columns from the row
$id = $row[0];
$contents = $row[1];
# Print the ID and the contents
print "<P>$id:\"$contents\"</P>\n";
}

When we are finished with this statement, we use the finish method associated with the statement, which is analogous to the disconnect method for the database handle:

$sth->finish;

Now that we have reviewed all of this in theory, let's put it into practice. First, we will create a small table in the “test” database in MySQL, by running the mysql client program:

mysql test

Once we see the mysql> prompt, we can create our small test table:

CREATE TABLE test_insert
 (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY
        KEY, contents VARCHAR(50) NOT NULL,
        UNIQUE (contents));

The above defines our table, test_insert, to have two columns. The first column, id, is defined to contain an unsigned integer. The integer, whose presence is mandatory (NOT NULL), is automatically incremented every time we insert a row into the table and can be used as a unique index into the table. The second column, contents, is a variable-length character string whose presence is mandatory (NOT NULL) and which cannot be repeated in another record (UNIQUE).

The CGI program in Listing 1 demonstrates all of the above, first inserting a number of rows into the table and then retrieving them. Most DBI programs are as simple as this one, although many either store or retrieve information, rather than do both.

One of the problems with a standard like DBI is that the interface follows the least common denominator. That is, there are differences between database packages in addition to their administration and speed; just about every package includes a number of non-standard SQL commands and features in order to differentiate itself from the competition. If you are interested in using such features, you might have to use the func DBI method, which enables proprietary database extensions. Of course, doing so means that your program is no longer portable to other databases, which might be a concern if you switch to another vendor.

Moving the Program to Apache::DBI

We have now compiled Apache to use mod_perl, configured a perl-bin directory for serving mod_perl programs and configured Apache to insert the Apache::DBI module for all programs within the perl-bin directory. We are all set to take our sample DBI program and use it with mod_perl.

How must we modify the program in order to get it to work with mod_perl? Actually, we needn't make any modifications at all, if we have configured our copy of Apache as described above. All we need to do is copy our program into perl-bin, set the appropriate permissions and give it a whirl. Here's what I wrote on my computer, for example:

~httpd/cgi-bin% cp dbi-demo.pl ../perl-bin/
~httpd/cgi-bin% chmod ug+x ../perl-bin/dbi-demo.pl

I changed the URL in an open browser window such that it pointed to perl-bin rather than cgi-bin, and—voilà--it all just worked.

When I first started to use mod_perl and Apache::DBI, I wasn't sure how much faster programs would run. The execution certainly seemed faster, but I wasn't sure how much of an improvement I was seeing. I decided to use Perl's Benchmark module, comparing the execution speed of two different programs. I would try to insert 100 random text strings into a database, first using a CGI program and then using an Apache::DBI version of the same program (which, as we now know, simply means a version of the program placed in the perl-bin directory).

Benchmarking is a tricky and subtle business, and there are undoubtedly factors which I neglected when calculating these results. Even so, they seem to bear witness to the amazing performance difference between CGI and mod_perl. I'm sure if I were to spend a great deal more time working on my Apache and/or MySQL configuration, I could get even better performance out of my lowly 75 MHz Pentium running Red Hat 4.2. However, the relative numbers should speak for themselves.

First, let's examine the test I performed. I used the same test_insert table in MySQL as we saw before. I then wrote a CGI program, similar to the one we saw before, which connects to the database and inserts a random value into the contents column. The resulting program is shown in Listing 2.

How Fast is it?

Now that we have our test program, how can we actually test it? I wrote a short Perl program using Benchmark.pm, in Listing 3. The timethese function is imported by Benchmark.pm, which we bring in at the beginning of the program. We also bring in LWP::Simple, part of the “Library for WWW access in Perl” that makes it a snap to write a small web client. How simple? Well, the following one-liner returns the HTML contents at http://www.linuxjournal.com/:

perl -e 'use LWP::Simple;
print get "http://www.linuxjournal.com";'

Perl does not format the output for you. That's the difference between a web browser and a web client; the former is meant to retrieve information for humans, while the latter is meant to retrieve information for programs. In this particular case, we just want to simulate 100 retrievals of each of our programs via the web. Any timing differences will thus be due to the program on the server side—which, since they are identical, means that the differences will be due to mod_perl and Apache::DBI.

How much faster is the Apache::DBI version than its CGI counterpart? Here are the results I got running time-db.pl:

[1086] ~% ./time-db.pl
Benchmark: timing 100 iterations ...
Apache::DBI: 24 secs (1.77 usr 0.67 sys = 2.44 cpu)
Plain CGI: 394 secs (1.10 usr 0.61 sys = 1.71 cpu)

That's quite a difference. When I first ran this benchmark, I was convinced that the plain CGI program had somehow gotten stuck. Alas, that was not the case; the overhead associated with CGI was simply too great.

Here is a second run of the same benchmark, just for comparison.

[1099] ~% ./time-db.pl
Benchmark: timing 100 iterations ...
Apache::DBI: 28 secs (1.89 usr 0.61 sys = 2.50 cpu)
Plain CGI: 355 secs (1.15 usr 0.62 sys = 1.77 cpu)

Yes, it looks like CGI is indeed much slower. By the way, you can see that Apache::DBI used more CPU time than plain CGI—which means that the time was spent forking the new Perl process, rather than performing our program's computations.

What if we take out the Apache::DBI directive in srm.conf and restart the server? That would give us an indication of how much overhead was being used opening the database connection. As you can see, things do indeed slow down—although admittedly not by a huge amount:

[1104] ~% ./time-db.pl
Benchmark: timing 100 iterations ...
Apache::DBI: 34 secs (1.97 usr 0.63 sys = 2.60 cpu)
Plain CGI: 460 secs (1.19 usr 0.60 sys = 1.79 cpu)

The moral, then, seems to be that moving from CGI to mod_perl gives a huge performance boost, and that moving from DBI to Apache::DBI gives a moderate performance boost. The more database accesses your web applications do, the more useful these technologies will probably be in your work. Perl has always been known as a useful language, but rarely as one that can help you write fast software. Now, with mod_perl and Apache::DBI, you can write web applications quickly, and watch them run quickly as well.

Resources