Understudy is a software-based server clustering utility that implements load balancing and failover protection for Linux (Red Hat, Debian and Slackware), Solaris, Cobalt, FreeBSD and Windows NT.
Manufacturer: Polyserve Software
E-mail: info@polyserve.com
Price: under $1000 US, including support
Reviewer: Daniel Allen
Every host on a network has down time, from the coolest RaQ to the lowest NT 486. The job of keeping down time to a minimum falls to the system administrator. Various solutions are available, spanning the range of needs and budgets. One way is to use high-availability servers with Fiber-Channel RAID arrays, multiple redundant CPUs and power supplies and a transaction-oriented file system. The servers can be arranged behind $50,000 load-balancing and failover systems to swap out servers automatically upon failure.
A solution at the other end of the cost spectrum is running a backup server which is manually switched with the primary server when necessary. In this scenario, if a server fails unexpectedly, it can be many minutes or hours before the poor system administrator can make the switch. This solution is both inelegant and widely used in company networks.
A third way is “server clustering”, or making multiple servers appear to users as if they were the same server, for fault-tolerance and load-balancing purposes. Very interesting efforts are underway to offer completely Linux-based server clustering solutions. These include the open-source Linux Virtual Server, and other work being done by the High Availability Linux Project. These projects show great promise, and they may be the right answer for sites wishing to be close to the bleeding edge. However, small businesses need fully supported solutions that do not require substantial modification to their existing, possibly heterogeneous, networks. This is the gap which Polyserve hopes to fill with Understudy. As you will see below, I think it does the job nicely.
Understudy is a software-based server clustering utility that implements load balancing and failover protection for Linux (Red Hat, Debian and Slackware), Solaris, Cobalt, FreeBSD and Windows NT. It supports between two and ten heterogeneous servers in a cluster, all of which must be located on the same DNS subnet. Polyserve hopes to release a newer version soon that circumvents the single subnet requirement. A cluster of servers can provide any service, including web, mail, news or file sharing.
When a server goes down, it is marked inactive within the cluster and another server takes its place in seconds. When the server comes back up, it is immediately reintegrated into the cluster. By using Understudy in conjunction with a load-rotation scheme called “round robin DNS”, a site can also provide simple load balancing. Load balancing requires one additional IP address for each server in the cluster. Simple failover requires only one IP address for a “virtual host”, which is how users see the cluster.
Installation of the Red Hat Linux version was simple. After reading the release notes, I wouldn't expect major difficulties on other platforms. Understudy provides a “quick-start” white paper on their web site which is recommended reading, along with the white papers on web server specifics and on round robin DNS. They are easily understood if you have ever configured a web server or changed your DNS configuration.
I downloaded the RPM for the free 30-day trial and ran rpm as root to install it. After installing files, Understudy started its dæmon and reminded me to assign a password for the administration tool, which I did. I repeated this process on each of the four servers that would make up the cluster.
The four servers were administered remotely, so I could not run the graphical local administration tool, which requires X on the server. However, Polyserve also offers a graphical remote administration tool, available for either Red Hat or Windows 98/NT. I downloaded and installed the RPM on my local Debian system using Alien, the Debian RPM manager. There were no serious problems, although I needed to modify the startup script it created to properly point to the copy of the Java Runtime Environment (1.1.7) and the libraries it also installed. It filed everything away in /usr/local/polyserve with a startup script in /usr/bin.
Next, I set up my first cluster with failover protection using a pair of servers. This requires a single “virtual host”, which is simply an unattached IP address in the same subnet as the real hosts. This was a straightforward process, following the instructions in the Quick Start Guide. Firing up the graphical administration tool, it prompted me for a cluster IP and password. I supplied the IP of the first server and was presented with the main window (Figure 1). The main box, titled “Cluster status”, listed the name of the server I supplied, with the reassuring status of “OK”. The menus include “File”, “Cluster”, “Server Log” and “Help”. “Cluster” has the most interesting choices: “Add Server”, “Add Virtual Host”, “Add Service Monitor to Selected Host”, “Delete Selected Item” and “Update Selected Virtual Host”. I chose “Cluster --> Add Server” and was prompted for a server name or IP. I filled in my second server. Voilà: the “Cluster status” told me both servers were okay. So far, so good.
Now, to add my first “virtual host”. This requires adding a new host in your DNS tables (such as in /var/named on your DNS server):
virtual1 60 IN A 150.1.1.1
This simply adds a new host name with a Time To Live (TTL) of 60 seconds with its Address.
I added this new line with an appropriate IP address for my subnet, and restarted the named dæmon. Back in the administration tool, I selected “Add Virtual Host”. It prompted me for the name or IP of the virtual host, and listed selection boxes to determine which real server was to be the primary server and which was to be the backup. I entered my information.
At this point, the Cluster status looked a bit more interesting (Figure 2). It listed both real servers, and subheadings described that the first server was Active for the virtual host, and the second server was Inactive. I tried to telnet to the virtual host. It connected me to the first server. I went back to the administration tool and deleted the virtual host. I re-added it, but this time, decided that the second server would be the primary server for this virtual host. The display reflected the change immediately. I telnetted to the virtual host. Sure enough, it connected me to the second server.
What's happening behind the scenes is something like this: Understudy runs as a dæmon on each server. The IP address of the virtual host is automatically aliased to the primary server. A small amount of traffic is constantly passed between the real hosts, via broadcast ARP messages. Through the dæmon, each host knows which is acting as primary. When the primary server goes down, the backup immediately reassigns the virtual host's IP address to itself. It continues to listen, so it can release the IP address when the primary server comes back up.
Note that Understudy will not allow you to use an IP address already assigned or aliased to a real host as a virtual host. I imagine that otherwise it would be easy to “hijack” the IP address of someone else's host in your subnet.
Next, I set up a “service monitor” (Figure 3). This allowed me to choose particular ports to monitor, such as for mail, web, FTP or TELNET. If the active server does not respond at that port, the inactive server will step in. I selected HTTP, and the Cluster Status reported that the web server was up on both real servers. I verified, using Lynx, that requests to the virtual host went to the primary server, unless a service it was monitoring was down, in which case requests went to the backup server. In all cases, Lynx showed the URL of the virtual host name, as expected.
For the next test, I set up round robin DNS. Round robin DNS is a feature built in to name servers such as BIND (versions 4.9 and up). Round robin allows servers to share loads transparently by rotating between any number of IPs for a given host name. The only problem is that no correction is made if one or more servers go down, so out of every cycle of requests, some are sent to a dud server. With Understudy, this is no longer a problem. You can set up round robin DNS for a number of virtual hosts, where each virtual host has a different primary. If any server is down, its requests are sent to the next secondary. Full examples for doing this are available in the Understudy documentation. These instructions were reasonably clear and easy to follow. At the conclusion of a couple hours of work, I had a fully redundant set of servers with no interruption to existing services on the servers.
One final function of the administration tool is a server log, which accesses dæmon messages for each server in a cluster. This brings me to a minor complaint: the logs are somewhat difficult to parse. It would be nice to see an integrated cluster log, providing a summary of the server logs.
The instruction manual states that all messages between the remote console and the servers are signed for security. The remote console will work with a firewall, and the servers will record a message to the log if somebody tries to “replay” internal UDP messages to try to confuse the servers.
To use Understudy in a production environment, you will want to configure any services (such as web, FTP, mail, TELNET) to respond to the virtual IP addresses (as well as the real IP addresses). There are complete instructions on adding the Virtual Hosts to your Apache or Microsoft IIS web servers.
Understudy does not automatically mirror information from one server to another, although Polyserve has stated that is a goal for a future version. You should think about whether servers will need to have up-to-the-second data copies, and plan accordingly. Some database applications might require extra hardware, such as a RAID array connected to multiple servers. I would visit the Linux High Availability web site, at http://linux-ha.org/, for LAN mirroring ideas.
Understudy can be downloaded and demoed free for 30 days, during which time technical support via e-mail is also free. It is trial-ware; during the trial period, the dæmon will turn itself off after two hours of use, requiring you to restart the dæmon. A permanent license with a service contract costs a little under $1000. Without the service contract, the price is roughly half as much. Polyserve has various support options, so you should contact them for a complete listing. I have good things to say about their customer service. Owing to network problems related to the Understudy software (but not the fault of the software), I spent a fair amount of time talking with Polyserve's support staff on the phone. They were technically competent and very helpful in pointing me toward a good solution to my problems.
The documentation is downloaded from the site in PDF form. It is complete and useful, although the product manual shows a few signs of poor editing. Unlike the white papers, the manual incorrectly states that you can use it with only two servers, not ten.
For further help, there is also a six-page help facility which describes the program's operation. For some reason, on my computer the help pages kept throwing Java exceptions. However, the information was still accessible, and they were minor distractions. These were the only bugs I found in the program.
Understudy should be a godsend for the beleaguered system administrator, server farm or ISP that needs to have services up 24/7 through reboots, failures and planned outages. One strong point in favor of this software is its ability to work in any network with all kinds of hosts. Even if your backup server is a 33MHz 486, Understudy can keep your network limping along until you can fix the primary server. It seems to be a good solution for those who cannot afford $10,000 to $50,000 for a dedicated failover and load-balancing server, or simply are not willing to pay a $2000 license for each server in the cluster.