At the Forge

nginx

Reuven M. Lerner

Issue #266, June 2016

You've probably heard about the nginx HTTP server, but have you tried it? It's easier than you think, and worth a look.

Engineers love to think that they make decisions based on pure logic and merit. But of course, everyone has biases in terms of programming languages, editors and other technologies—biases that probably can be defended in technical terms, but that often come down to an emotional argument as much as a technical one. (Except in the case of Emacs, of course, which is clearly the best editor by all objective standards.) The problem with such biases is that they can cause people to make choices and decisions that feel comfortable, but aren't necessarily right.

Case in point: I've been using the Apache HTTP server for many years now. Indeed, you could say that I've been using Apache since before it was even called “Apache”: first as the original NCSA HTTP server, then as the patched server that some enterprising open-source developers distributed, and finally as the Apache Software Foundation-backed open-source colossus that everyone recognizes, and even relies on, today. (The foundation now does much more than just produce HTTP servers.)

Apache's genius was its modularity. You could, with minimal effort, configure Apache to use a custom configuration of modules. If you wanted to have a full-featured server with tons of debugging and diagnostics, you could do that. If you wanted to have high-level languages, such as Perl and Tcl, embedded inside your server for high-speed Web applications, you could do that. If you needed the ability to match, analyze and rewrite every part of an HTTP transaction, you could do that, with mod_rewrite. And of course, there were third-party modules as well.

Things got even better through the years as the Web got larger, and Web sites were expected to do more and more. Scalability became an important issue, and Apache handled it with (not surprisingly) a variety of modules that implemented different back-end schemes. You could have the traditional mix of processes, or use threads, or combinations of the two.

Beyond the flexibility, it was clear that Apache httpd was well maintained, well documented and stable. Installation was easy, upgrades were easy—really, everything was easy.

So, it's no surprise that Apache always has been my first choice when it comes to HTTP servers. And yet, I always knew in the back of my mind that I really should spend more time checking out other options. In particular, one alternative stood out—nginx.

Whereas Apache was primarily designed to be modular, nginx was designed to be fast—really fast. Moreover, it was designed to be fast when dealing with large numbers of simultaneous requests. This is thanks to its approach to networking, which is diametrically opposite to Apache's. Apache httpd allocates one new process per incoming HTTP connection. Thus, if there currently are 1,000 simultaneous connections to your Web site, there will be 1,000 Apache processes running on your computer. If you're using multiple threads, you can expect to have 1,000 separate threads servicing those 1,000 requests.

nginx takes the opposite approach, using a single process and no threads. This means that in nginx, those 1,000 simultaneous connections would be handled by one process, rotating through each of those connections to see if there is data to be sent or received. This “reactor” pattern of designing network software has become popular lately, with node.js and event-driven additions to Python 3.5 demonstrating the interest in this way of writing code.
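To make the single-process idea concrete, here is a minimal sketch in Python using the standard selectors module. It is not how nginx is implemented; it only illustrates one process, with no threads, rotating through several connections. The function names (serve, demo) and the upper-casing "response" are assumptions made for the illustration, and socket pairs stand in for real client connections.

```python
import selectors
import socket


def serve(server_ends):
    """Service every connection from one process and one thread.

    Whatever arrives on a socket is sent back in upper case; a socket
    is dropped once its peer signals that it has nothing more to send.
    """
    sel = selectors.DefaultSelector()
    for sock in server_ends:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    open_count = len(server_ends)
    while open_count:
        # Sleep until at least one connection is readable (data or EOF).
        for key, _events in sel.select():
            sock = key.fileobj
            data = sock.recv(4096)
            if data:
                # Tiny payloads fit in the socket buffer; a real server
                # would also wait for "writable" before sending.
                sock.sendall(data.upper())
            else:
                sel.unregister(sock)
                sock.close()
                open_count -= 1
    sel.close()


def demo(n=3):
    """Simulate n simultaneous clients and return what each one got back."""
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall("request {}".format(i).encode())
        client.shutdown(socket.SHUT_WR)  # "no more data from me"
    serve([server for _client, server in pairs])
    replies = []
    for client, _server in pairs:
        replies.append(client.recv(4096).decode())
        client.close()
    return replies
```

Calling demo(3) handles three connections in a single loop, with no extra processes or threads created along the way.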

So yes, nginx is fast. And it's even modular, although the modules cannot be added dynamically, as in the case of Apache. Rather, they must be compiled into nginx in order to use them. For this reason, adding and removing features from nginx, although certainly possible, is less flexible than is the case with Apache, which doesn't require recompilation.

In this article, I go through the basic installation and configuration of nginx to get a simple Web application running. In so doing, you'll see how the configuration differs from Apache, both in style and in execution, and how you need to think if you're going to use nginx.

Installation

Years ago, if you wanted to install nearly any open-source software, you needed to download a .tar.gz file, open it, modify the configuration, compile it and install it. Today, of course, you can install things on a Linux box running Debian or Ubuntu with a simple apt-get command. For example, I can install nginx as follows:

$ sudo apt-get install nginx

But, wait a second. If nginx cannot be modified after I compile it, perhaps I should check to see how I can modify the configuration I'll get from the default installation. And of course, while you can change the server configuration, you cannot change the modules that are compiled into the server. So making sure that the right modules are compiled into nginx is pretty important before installing it.

On the Ubuntu 14.04 server I used for testing, running apt-cache search nginx revealed the following options:

nginx-extras
nginx-full
nginx-light

Which one is appropriate for you, or should you try something else? The answer, of course, depends on what you want to do.

If you want to serve static files, any of these will do just fine. Even nginx-light, the smallest of the bunch, has features like SSL, gzip and rewriting built in to it. Indeed, nginx-light even includes fastcgi, the module you'll need if you want to run a program like WordPress.

But, let's say you want to deploy Ruby on Rails applications, using the Phusion Passenger add-on. Which version of nginx should you install to run that? The answer, quite simply, is “none of them”. nginx will need to be recompiled in order to install Passenger. This is, oddly enough, not as painful as you might expect. However, it does mean that before you even can decide how to install nginx, you need to consider what you want to do with it.

Static Pages

Let's start exploring nginx by installing the nginx-core package under Ubuntu, then looking at the configuration and how you can get a basic static site running.

First, I'm going to install the nginx-core package:

$ sudo apt-get install nginx-core

I then can start the server with the fairly standard shell command:

$ sudo service nginx start

After a few moments, nginx will have started, as I can tell by typing this:

$ sudo service nginx status

to which I get the response:

nginx is running

And if I go to the home page on my current server, I'm greeted by, “Welcome to nginx!”

But of course, I'd really like to have my own content there. Let's take a look at the configuration file, which is in /etc/nginx/nginx.conf on my system, and see how it's formatted and how to change it to make some custom static content.

Now, if you're used to Apache configuration files, the style of nginx's file is going to take some getting used to. As in Apache's files, each line contains a configuration setting in a name-value style. Unlike in Apache's files, the sections are delimited using curly braces ({ }), and each setting must end with a semicolon (;). For example, the first line in my installed, default nginx configuration file is:

user www-data;

This means nginx will run as the www-data user, which is pretty standard in the world of Ubuntu (and Debian). Next comes the configuration parameter:

worker_processes 4;

This describes how many processes nginx should launch when running. But, it would seem to contradict what I wrote above, namely that nginx uses only a single process (and no threads within that process) for extra speed, no? Well, yes and no—the idea is that you'll probably want to have one nginx worker process per CPU core on your server. On this server, I have four cores, each of which can (and should) have an nginx worker process. You can think of this as a one-computer version of a load balancer, distributing the load across the available CPUs. Each worker process can and will handle a large number of network connections.

If your server will be running more than just nginx—for example, if you are running a database server on the same machine—you likely will want to reduce this number, so that at least one core is always available for those other processes.
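The rule of thumb above (one worker per core, minus whatever you want to leave free for other services) can be written down as a small calculation. This is only a sketch; the function name and the reserved_cores parameter are made up for the illustration and are not nginx settings.

```python
import os


def suggest_worker_processes(cores, reserved_cores=0):
    """One worker per CPU core, minus cores kept free for other
    services, but never fewer than one worker."""
    return max(1, cores - reserved_cores)


# os.cpu_count() can return None, so fall back to a single core.
detected_cores = os.cpu_count() or 1
suggested = suggest_worker_processes(detected_cores, reserved_cores=1)
```

On the four-core server described here, suggest_worker_processes(4) gives 4, and reserving one core for a database gives 3.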

The default configuration file then contains an “events” section:

events {
        worker_connections 768;
        # multi_accept on;
}

In this section, I set worker_connections, which determines how many network connections each worker process can handle simultaneously. In this case, it's set to 768; I'm not sure where this number comes from, but it means that if my site becomes popular, I might find that I run out of network connections. You might well want to raise this number.
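As a rough back-of-the-envelope check, the ceiling on simultaneous connections is the number of worker processes multiplied by worker_connections. The sketch below does that arithmetic; the function names are invented for the illustration, and the result is an upper bound only, since operating system limits on open files and any connections nginx makes to back-end servers also count against it.

```python
import math


def max_connections(worker_processes, worker_connections):
    """Upper bound on simultaneous connections across all workers."""
    return worker_processes * worker_connections


def connections_per_worker(target_total, worker_processes):
    """worker_connections value needed to reach a target total."""
    return math.ceil(target_total / worker_processes)
```

With the defaults shown above, max_connections(4, 768) is 3072. To handle 10,000 simultaneous connections with four workers, connections_per_worker(10000, 4) suggests setting worker_connections to at least 2500.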

The multi_accept directive is commented out in the default file, which leaves it at nginx's built-in default of “off”, meaning that each worker process accepts one new connection at a time. Uncommenting the line sets it to “on”, which lets a worker process accept all of the pending new connections at once.
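To see how regular this syntax is, here is a small sketch of a parser for it in Python. It is an illustration only, not nginx's own parser: it handles comments, semicolon-terminated directives and brace-delimited sections, but not quoted strings. The function names and the tuple layout are assumptions made for the example.

```python
import re

SAMPLE = """
user www-data;
worker_processes 4;
events {
        worker_connections 768;
        # multi_accept on;
}
"""


def tokenize(text):
    """Strip # comments, then split into words, braces and semicolons."""
    text = re.sub(r"#[^\n]*", "", text)
    return re.findall(r"[{};]|[^\s{};]+", text)


def parse(tokens):
    """Return a list of (name, arguments, children) tuples.

    children is None for a simple directive and a nested list for a
    section delimited by curly braces.
    """
    items = []
    words = []
    while tokens:
        token = tokens.pop(0)
        if token == ";":
            items.append((words[0], words[1:], None))
            words = []
        elif token == "{":
            items.append((words[0], words[1:], parse(tokens)))
            words = []
        elif token == "}":
            return items
        else:
            words.append(token)
    return items


config = parse(tokenize(SAMPLE))
```

Here config holds three entries: the user and worker_processes directives, and an events section whose only child is worker_connections, because the commented-out multi_accept line is ignored.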

Next is an “http” section, which you won't be surprised to hear has to do with HTTP connections made to the system.

Most of these configuration directives aren't going to be of interest right away; as you can see, nginx's logging directives are similar to those in Apache and other servers:

access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;

Where is the location of the site defined? In the case of nginx, it's not directly within the “http” block. Rather, it's inside another configuration file—or more accurately, a set of configuration files for the sites configured on the server:

include /etc/nginx/sites-enabled/*;

Because I'm using a fresh installation of nginx on a computer that hasn't been used for other things yet, there is only a single server configured. You easily can imagine a situation in which a single computer is configured to work with dozens, or even hundreds, of different sites, each of which will have its own configuration file. In this case, however, I'll just work with the “default” server, defined here:

/etc/nginx/sites-enabled/default

This file starts with a “server” section, describing a virtual server and the port on which nginx should listen for it. If you want to listen on multiple ports (for example, on port 80 for HTTP and port 443 for HTTPS), the usual approach is to configure those in separate blocks, although a single block may contain more than one listen directive. This “server” block opens with the following:

listen 80 default_server;

This means that it's going to be listening on port 80, and that this is the default server for the system. Consider a computer on which nginx is running, which is hosting several dozen sites using virtual hosts. Using default_server, you can tell nginx which site will accept requests for names that aren't otherwise claimed by another virtual host.
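The fallback behavior can be sketched in a few lines of Python. This is a simplification: it matches exact hostnames only, whereas nginx can also match wildcard and regular-expression names. The hostnames and site labels below are hypothetical.

```python
def pick_server(host_header, sites, default_site):
    """Choose the site for a request based on its Host header.

    sites maps a hostname to a site label; default_site plays the role
    of the block marked default_server.
    """
    hostname = host_header.split(":")[0].lower()  # drop any :port suffix
    return sites.get(hostname, default_site)


# Hypothetical virtual hosts on one machine.
SITES = {
    "blog.example.com": "blog",
    "shop.example.com": "shop",
}
```

A request for shop.example.com is routed to "shop", while a request for a name that no site claims, such as unknown.example.com, falls through to whatever is passed as default_site.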

Finally, here are the two lines that tell nginx where to look for my files:

root /usr/share/nginx/html;
index index.html index.htm;

The root directive tells nginx in which directory to look. The index directive indicates which file should be served if someone asks for a directory (in this case, the simple URL “/”).

So, I know that to modify my (current, default) static Web site, I need to edit the file /usr/share/nginx/html/index.html. And sure enough, if I look in that location on my server's filesystem, I see the “Welcome to nginx” file. By changing that file, I can change what my site looks like.
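The way root and index combine can be sketched as a small lookup function. This is an approximation of the behavior described above, not nginx's actual algorithm; the function name is made up, and a set of paths stands in for the real filesystem so the example runs anywhere.

```python
def resolve(uri, root, index_files, existing_files):
    """Map a request URI to the file that would be served, or None.

    existing_files is a set of paths standing in for the filesystem.
    """
    if uri.endswith("/"):
        # A directory was requested: try each index file in order.
        for name in index_files:
            candidate = root + uri + name
            if candidate in existing_files:
                return candidate
        return None
    path = root + uri
    return path if path in existing_files else None


ROOT = "/usr/share/nginx/html"
INDEX = ["index.html", "index.htm"]
FILES = {"/usr/share/nginx/html/index.html"}
```

With these values, resolve("/", ROOT, INDEX, FILES) returns /usr/share/nginx/html/index.html, which is the file to edit in order to change the home page.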

Using PHP

However, if I want to use a server-side language, I'm out of luck. As currently configured, nginx won't let me use PHP or anything else. If I simply rename the file to index.php and add a line of PHP inside of it:


<?php echo '<p>Hello World</p>'; ?>

then at best, I'll get the source file downloaded to my browser, without any execution of the PHP code. At worst, things will just fail.

So, let's figure this out a bit. First, if I'm going to use PHP, I'll need to install the language on my server. Note that installing the entire php5 package in Ubuntu also tries to install Apache, which is clearly not the goal here! Thus, I'll just install a few selected packages:

$ sudo apt-get install php5-cli php5-fpm

What's php5-fpm? That's the “FastCGI Process Manager”, PHP's implementation of FastCGI, a standard that was established many years ago in order to cut down on the overhead of CGI (that is, external) programs that Web servers would run in order to create customized, dynamic pages. Rather than starting the external program once for each HTTP request, I'll start it only once, executing the already-started program each time an HTTP request comes in. I'll thus need to set up PHP to work with the FastCGI protocol.
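The difference between the two models comes down to how often the external program is started. The toy sketch below counts startups in each style; the Interpreter class is a stand-in invented for the illustration and has nothing to do with PHP's real internals, which manage a pool of worker processes rather than a single one.

```python
class Interpreter:
    """Stand-in for an external program such as PHP."""

    starts = 0  # how many times the program has been started

    def __init__(self):
        Interpreter.starts += 1  # pretend this is the expensive part

    def handle(self, request):
        return "output for " + request


def serve_cgi_style(requests):
    """Classic CGI: start the program afresh for every request."""
    return [Interpreter().handle(request) for request in requests]


def serve_fastcgi_style(requests):
    """FastCGI: start the program once and reuse it for each request."""
    worker = Interpreter()
    return [worker.handle(request) for request in requests]
```

Serving the same three requests both ways produces identical output, but the CGI-style function starts the program three times and the FastCGI-style function starts it once.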

This is done using a server, which you'll need to install and configure. The idea is that nginx will receive a request for a file containing PHP; it'll invoke PHP using FastCGI and then will return the program's output to the user's browser.

There are several ways to set up the FastCGI server. I used UNIX sockets, which allow two programs to communicate if they're both on the same server. You could instead use network sockets, in which case the FastCGI server could exist on a different computer from the nginx server, but for the example here, that's overkill.
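For readers who haven't used UNIX sockets, here is a minimal sketch of two endpoints on the same machine talking through a socket file. It shows only the transport, not the FastCGI protocol itself; the function names, the socket filename and the "handled:" reply are assumptions made for the illustration. It needs a UNIX-like system to run.

```python
import os
import socket
import tempfile
import threading


def answer_once(listener):
    """Accept one connection, read a request and send back a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"handled: " + request)


def demo():
    """Send one request over a UNIX socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # the socket is a file
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen(1)
        server = threading.Thread(target=answer_once, args=(listener,))
        server.start()

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        client.sendall(b"/index.php")
        reply = client.recv(1024)

        client.close()
        server.join()
        listener.close()
        return reply
```

Switching to a network socket would mean using socket.AF_INET and a host and port instead of a path, which is what allows the two programs to live on different computers.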

In order for this to work, I'll need to modify the configuration for PHP's FastCGI implementation. The change that I made was in the file /etc/php5/fpm/pool.d/www.conf, which came with my PHP configuration. In this file, there is a (commented-out) line with the listen value. I set it to use a UNIX socket, as follows:

listen = /var/run/php5-fpm.sock

Once I had done that, I restarted the FastCGI server for PHP:

$ sudo service php5-fpm restart

That restarted PHP's FastCGI-compliant server, making it possible for nginx to talk to the server.

Connecting nginx to PHP

With that in place, I just need to tell nginx when to invoke the FastCGI server and how it can contact that server.

First, I changed the index directive to look for the file index.php, replacing the previous index line with the following:

location / {
    index index.php;
}

Now, when an HTTP request comes in for a directory, it'll serve up index.php.

Next, I needed to tell nginx that when it sees a file ending with a “.php” suffix to use FastCGI:

location ~ \.php$ {
    try_files $uri =404;
    include /etc/nginx/fastcgi_params;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME /usr/share/nginx/html$fastcgi_script_name;
}

The two most important lines here are fastcgi_pass, which must point to the socket file I've created, and fastcgi_param, which sets the SCRIPT_FILENAME parameter that tells the FastCGI server which file to execute. In the above fastcgi_param directive, I build that path by prefixing the requested script name with /usr/share/nginx/html, so a request for /index.php runs /usr/share/nginx/html/index.php.
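The matching and path-building steps can be mimicked in a few lines of Python. This is a sketch of the logic only, assuming the URI has already had any query string removed; the function name is made up, and the regular expression and document root are copied from the configuration above.

```python
import re

PHP_LOCATION = re.compile(r"\.php$")   # same pattern as the location block
DOCUMENT_ROOT = "/usr/share/nginx/html"


def script_filename(uri):
    """Return the SCRIPT_FILENAME for a URI handled by the PHP
    location, or None if the URI doesn't end in .php."""
    if PHP_LOCATION.search(uri) is None:
        return None
    return DOCUMENT_ROOT + uri
```

So script_filename("/index.php") yields /usr/share/nginx/html/index.php, while a request such as /style.css doesn't match the pattern and is never handed to FastCGI.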

Notice also the include line, which imports a huge number of directives having to do with FastCGI into the system. You can take a look at it, if you want, but I've been using FastCGI for many years and tend to treat many of the configuration options as something approaching black magic.

What's Next?

Now that you've seen that you can configure nginx with PHP, you can go in any of several directions. First, you could use PHP not only to create simple “hello, world” programs, but also to run real applications, such as those based on WordPress (which is written in PHP). Next month, I'll describe how you can connect nginx to WordPress for a robust and high-speed solution.

But, nginx can be used with languages other than PHP as well. Phusion Passenger, which I have discussed in the past, works not only with Apache, but also with nginx. The only issue is that because nginx must be recompiled when you add or remove (or update) a module, the installation can be a bit tricky.

The bottom line is that nginx, although it takes some getting used to for an old Apache user like me, turns out to be flexible, well documented and (of course) extremely efficient at handling Web traffic. If you're setting up a new Web server and think you might need to squeeze some more “oomph” out of your system, it's definitely worth looking into nginx.