Where are the potential bottlenecks in scaling a Web app, and what can you fix?
In my last article, I started discussing how to optimize a Web application, looking at the different aspects of an app and where the slowness might come from. I described several different factors that can lead to a Web application feeling slow for the end user: network speed and latency, server application speed, client-side loading and rendering, and finally, the execution of the client-side JavaScript programs. Each of those factors contributes to the feeling of speed that a user gets when using a Web application, and it's our job as engineers to try to reduce their impact.
So in this article, I cover a number of the places where you can look to try to reduce the problematic times that I described—that is, specific areas of your application that might be slow and might well need some tuning or improvements. In my next article, I'll look at specific programs and techniques you can use to identify problematic spots in your applications and, thus, help you improve them.
One of the most infuriating things someone can say to a programmer is “It doesn't work.” What doesn't work? What did it do yesterday? What causes the computer to stop working? When people say to me that their system “doesn't work”, and then expect me to find the problem, I suggest that such a statement is similar to going to a doctor's office and saying, “It hurts me somewhere.”
Similarly, if you're going to find the bottlenecks and slow-moving parts in your system, you're going to need to look at its various parts and consider how much each of them might be contributing to the slowness. Only after thinking about the individual parts and understanding what about each of them might be slow can you try to fix any problems you might have.
I'm not a hardware guy; I often joke that I reach my limits when I change a light bulb. That said, there's no doubt that understanding your hardware, at least a bit, is going to help you optimize your software.
It used to be that the major consideration for software developers was the speed of the CPU on which programs ran. However, for modern Web applications, I/O speed and memory are far more important.
Your application—especially if it's written in a high-level language, and even more so when the application is under heavy load—will consume a large amount of memory. As a result, you're going to want to load up on RAM. This is also true for your database; the best way to kill a database's performance is to start using virtual memory. If your server starts swapping to disk because it ran out of RAM, you're going to suffer a massive performance hit.
By contrast, if you can give your database a large amount of memory, it will be able to cache more data in RAM and not have to go to disk nearly as often. If you're using PostgreSQL, setting the shared_buffers and effective_cache_size parameters is crucial, in that they tell the database how much memory it can expect to use. This helps the PostgreSQL server figure out whether it should jettison data that it already has cached or load additional data into the shared buffers.
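To make that concrete, here is a sketch of the relevant postgresql.conf settings. The parameters are real, but the values are only a starting point and assume a dedicated database server with 16GB of RAM; your own numbers will differ:

    shared_buffers = 4GB            # PostgreSQL's own cache; roughly 25% of RAM is a common starting point
    effective_cache_size = 12GB     # how much memory (including the OS cache) the planner can assume is available
    work_mem = 32MB                 # per-sort/per-hash memory; raise with care, since it's allocated per operation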
This also points to the advantages of having multiple servers, rather than a single server. If you're running a small Web site, the RAM requirements of your HTTP server and your database server might well clash. This will especially manifest itself when you get a lot of traffic: your HTTP server will use more memory, but so will your database, and at a certain point, they might collide. Using more than one computer not only allows you to scale out the Web and database servers more easily, but it also ensures that neither will interfere with the other.
The other factor to consider, as I mentioned above, is I/O. Again, I'm not a major hardware expert, but you want to consider the speed of the disks you're using. Today, as people increasingly move to virtualized servers in which the underlying hardware is abstracted away, it's sometimes hard to evaluate, let alone choose, the hardware on which your systems are running. Even if you can't do that, though, you can and should try hard to avoid putting any production-level system on a shared machine.
The reason is simple. On a shared machine, the assumption is that each application will play nicely with the others. If one application suddenly starts to crunch through the disk a great deal, everyone's I/O suffers. I experienced this when working on my PhD dissertation. My software would back up its database once per hour, and the managers of my university's computer cluster told me that this was causing unacceptably slow performance for other users. (We found a way to back up things in a way that was less resource-intensive.)
The question that many people used to ask about servers was “Buy or build?”—meaning, whether you should create your own special-purpose server or buy one off the shelf. Today, very few companies are building their own servers, given that you're often talking about commodity hardware. Thus, the question now is “Buy or rent?”
I must say that until recently, I was of the opinion that having your own server, over which you have relatively full control, was the way to go. But after working on several scalability projects, I must admit I'm warming up to the idea of deploying a swarm of identical VMs. Each individual VM might not be very powerful, but the ability to scale things up and down at the drop of a hat can more than make up for that.
The bottom line is that when you're looking into servers, there are (as always) many different directions to explore. But if you think that your system might need to scale up rapidly, you should seriously consider using a “cloud” platform. More important than the CPUs are the amount of RAM and ensuring that your virtual machine is the only user of the underlying physical machine.
Oh, and how many servers should you set up? That's always a hard question to answer, even if you know how many users you expect to have visit. That's because servers will behave differently under different loads, depending on a wide variety of factors. No matter what, you should give yourself a comfortable margin of error, as well as have contingency plans for how to scale up in the case of wildly successful PR. In my next article, I'll discuss some strategies for figuring out how many servers you'll need and what sort of margin to leave yourself.
Now that you have some hardware, you need to consider your HTTP server. This is a matter of taste, personal preference and considerable debate, and it also depends in no small part on what technology you're using. For many years, I have used Apache httpd as my Web server—not because it's super fast, but because it's very easy to configure and has so many plug-in modules available. But even I must admit that nginx is far more scalable than Apache. The fact that Phusion Passenger, a plug-in for both Apache and nginx that works with both Ruby and Python, is trivially easy to install convinced me to switch to nginx.
Whereas Apache uses multiple processes or threads to handle its connections, nginx handles many connections within a single thread, using an event loop in what's known as the “reactor pattern”. As such, it's generally much more scalable.
If you're trying to remove potential bottlenecks in your system, then having a high-powered HTTP server is going to be necessary. But of course, that's not enough; you also want the server to run as quickly as possible and to be taxed as little as possible.
To make the server run faster, you'll want to examine its configuration and remove any modules you don't really need. Debugging is great, and extra features are great, but when you're trying to make something as efficient as possible, you're going to need to trim anything that's not crucial to its functioning. If you're using Apache and are including modules that make debugging easier, you should remove them from your production systems, keeping them (of course) in your development and staging systems.
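For example, on a Debian-style Apache installation (this sketch assumes the a2dismod helper and a stock module list), you might disable the status, info and autoindex modules in production:

    # Disable modules that production doesn't need; keep them enabled in development
    sudo a2dismod status info autoindex
    sudo service apache2 restart

    # On other layouts, comment out the corresponding LoadModule lines in httpd.conf instead:
    # LoadModule status_module modules/mod_status.so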
You also want to tax your HTTP server as little as possible, so that it can concentrate its efforts on servicing HTTP clients. There are a number of ways to do this, but they all basically involve ensuring that requests never get to the server. In other words, you'll want to use a variety of caches.
Within your application, you'll want to cache database calls and rendered pages, so that a request does real work on the server only if it's really necessary. Modern frameworks, such as Rails and Django, are designed to let you cache pages in an external system, like memcached or Redis, such that a request for /faq is served from the cache rather than being rendered from scratch each time.
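In Django, for example, a minimal sketch of this looks like the following. It assumes a memcached daemon on its default port, the /faq view and template names are hypothetical, and the backend class name varies a bit between Django versions (Rails offers analogous page and fragment caching):

    # settings.py: point Django's cache framework at memcached
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }

    # views.py: serve the rendered FAQ page from the cache for ten minutes
    from django.views.decorators.cache import cache_page
    from django.shortcuts import render

    @cache_page(60 * 10)
    def faq(request):
        return render(request, 'faq.html')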
Beyond the caching that you'll want to do in your application, you probably will want to put a front-end Web cache, such as Varnish, between your servers and the outside world. In this way, any static asset (such as JavaScript, CSS or images) that users request will come from that cache, rather than having to go to the server.
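A minimal Varnish setup needs little more than a backend definition. This sketch uses VCL 4.0 syntax and assumes Varnish listens on port 80 while your HTTP server has been moved to port 8080:

    # /etc/varnish/default.vcl
    vcl 4.0;

    backend default {
        .host = "127.0.0.1";    # your HTTP server, now hidden behind Varnish
        .port = "8080";
    }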
Going one step further than this, in a move that a growing number of large sites are making, you could (and probably should) use a content distribution network (CDN), on which your static assets reside. In this way, someone who goes to your site hits your server only for the dynamic parts of your application; everything else is served by a third party. Your server can spend all of its time worrying about the application itself, not all of the stuff that makes it pretty and functional for the end user.
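In many frameworks, pointing at a CDN can be as simple as changing one setting. In Django, for instance, you might point static-asset URLs at a CDN hostname (the hostname here is, of course, hypothetical):

    # settings.py: have every {% static %} reference point at the CDN
    STATIC_URL = 'https://cdn.example.com/static/'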
Another point of great debate, as well as a bottomless pit (or, if you're a consultant, an endless opportunity!) of work, is that of databases. Whether you're using a relational database, a NoSQL database or a combination of the two, all databases are massively complex pieces of software, and they need a great deal of configuration, care and attention.
Actually, that's not true. You can, in many cases, get away with not tuning your database configuration. But if you want to scale your system up to handle high load levels, you're going to want to keep as much of the database in memory as you can and tune your queries as much as possible.
You're also going to want to ensure that the database is doing the best job it can of caching queries and their results. Databases naturally do this as much as they can; keeping data and query results in memory is a way to boost speed. But many databases let you configure this memory and also tune the ways in which it is allocated. PostgreSQL, for example, uses planner statistics (collected via ANALYZE, which typically runs alongside VACUUM) to decide how best to execute each query, and it tracks how heavily each buffer is used to decide what should stay in memory and what can go. But at the application level, you can cache queries and their results, allowing the application to bypass the database completely and, thus, lighten its load.
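Here is a sketch of that kind of application-level result caching, using Django's low-level cache API; the Product model, the cache key and the five-minute timeout are all hypothetical:

    from django.core.cache import cache
    from myapp.models import Product   # hypothetical model

    def cheapest_products(limit=10):
        key = 'cheapest_products:%d' % limit
        results = cache.get(key)
        if results is None:
            # Cache miss: run the query once, then keep the results for five minutes
            results = list(Product.objects.order_by('price')[:limit])
            cache.set(key, results, 300)
        return results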
There is another complication when it comes to databases, namely the fact that so many modern Web application frameworks generate SQL for you automatically using object-relational mappers (ORMs). It's true that in most cases ORM-generated SQL is more than adequate for the task, and that the overhead of generating SQL on the fly, and of the many object layers required to do so, is a price worth paying.
But there are numerous cases in which ORM-generated SQL is not very efficient, often because the programmer's assumptions didn't match those of the ORM. A classic example of this is in Ruby on Rails, when you retrieve numerous objects from the database with a query and then work with each of them. From the Ruby code, it feels like you're just iterating over a large number of objects. But on the SQL side, each object over which you iterate can fire off additional queries of its own (to fetch its associated records, for example), potentially clogging the database.
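The same trap exists in Django's ORM, where it's easy to sketch (the Order and Customer models here are hypothetical); in Rails, the analogous fix is ActiveRecord's eager loading via “includes”:

    from myapp.models import Order   # hypothetical model with a "customer" foreign key

    # Looks like simple iteration in Python...
    for order in Order.objects.all():
        print(order.customer.name)    # ...but each access here fires another SELECT

    # Eager loading turns the N+1 queries into a single join:
    for order in Order.objects.select_related('customer'):
        print(order.customer.name)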
Using a slow-query log, or logging (as PostgreSQL allows) all queries that take longer than a certain threshold, is a great way to start looking for the things that are taking a long time in the database.
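In PostgreSQL, that's a one-line change in postgresql.conf; the 250ms threshold here is arbitrary and should be tuned to your own sense of “slow”:

    # Log every statement that takes longer than 250ms
    log_min_duration_statement = 250

    # For aggregate, per-query timing, the pg_stat_statements extension is also worth loading:
    # shared_preload_libraries = 'pg_stat_statements'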
But even then, you might not find your performance problems. I recently was working with clients, helping them optimize a database that, we found, was spending a very long time executing queries. And yet, when we looked in the logs, nothing was written there. The problem wasn't that any individual query was taking a long time, but that there was a huge number of small queries. Our optimization didn't make any individual query faster, but rather replaced a “for” loop, in which the database was queried repeatedly, with a single, large query. The difference in execution speed was truly amazing, and it demonstrated that in order to debug ORM problems, it's not enough to know the high-level language. You really do need to know SQL and how that SQL is being executed in the database.
In some cases, there's nothing you can do to prevent the application from hitting the database—and hitting it hard with lots of queries. In such cases, you might well want to use a master-slave model, in which all of the read-only queries are directed to a set of slave (read-only) databases, and write queries are directed to a master server. Master-slave configurations assume that most of the queries will be reads, not writes, and if that's the case for your system, you're in luck.
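If you're using Django, its database routers give you a place to hang this logic. Here is a sketch that assumes a "default" master and a hypothetical "slave" entry in the DATABASES setting:

    # myapp/routers.py
    class MasterSlaveRouter(object):
        """Send read-only queries to the slave, writes to the master."""

        def db_for_read(self, model, **hints):
            return 'slave'

        def db_for_write(self, model, **hints):
            return 'default'

    # settings.py
    # DATABASE_ROUTERS = ['myapp.routers.MasterSlaveRouter']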
If that's not the case for your database, and master-slave doesn't do the trick, your solutions will be limited by the database you're using. Some offer master-master replication, giving you multiple databases to which you can send write requests. Some of them, especially in the NoSQL world, shard automatically. But no solution is perfect, and especially when it comes to master-master replication, things can get a bit dicey, even for the most experienced database hands.
The bottom line is that your database is quite possibly going to be the biggest bottleneck in your system. Try to keep the application away from it as much as possible, caching whatever you can. And, try to keep the database tuned as well as possible, using the current best practices for the system you're using, so that it'll be working at peak performance.
In my last article, I talked about the basic, high-level places where there might be problems with scaling a Web application. Here, I discussed, in greater detail, what sorts of things you can do when your application is showing, or might be showing, scaling issues. Next month, I'll look at some specific tools and techniques that you can use to understand how scalable your application is before hordes of users come and use it.