Testing may not be fun, but it's important; the Open Source Development Lab and its Scalable Test Platform want to help.
The Open Source Development Lab (OSDL) is a nonprofit company working to enhance Linux scalability and telco capabilities. OSDL sponsors (www.osdlab.org/sponsors) have financed a full-scale test and development lab, complete with terabytes of storage and an array of SMP servers with anywhere from 2 to 16 CPUs. At the lab we provide developers with full access to enterprise-class machines via remote login.
We have been working with developers on the creation and execution of their tests. During this process, we have noticed a number of things that have to be done again and again for each test that comes through the lab. We listed the tasks that went into running an average test sequence and found that a great deal of the process involved human interaction that could be automated. The Scalable Test Platform (STP) is the result of our attempt to automate the testing process from request to report.
Benchmarking itself has inherent conceptual problems that are outside both the scope of this article and the scope of the Scalable Test Platform effort. There are, however, solvable problems with current testing practices, and those are what the STP attempts to address. Please keep in mind that the benchmarking we focus on is quite different from the methods used to produce marketable benchmark numbers.
The configuration of a testing environment is rarely as well documented as it should be. Documentation on the setup of systems used in tests is usually limited to what the tester believes is relevant to their specific research goals. This lack of detail causes problems later on, when other analysts examine the report. It is not uncommon for an analyst to have to duplicate an entire test sequence to get the data required to answer questions that come up later. It is also common practice for a testing setup to be only partially automated; the resulting human interaction at undocumented moments further hurts the repeatability of the results.
Performance testing can require massive resources, both in time and in hardware. How many open-source developers can get access to 50 two-way client machines on a gigabit network in order to test a server farm made up of multiple 8-CPU servers and a 16-CPU server? Few companies would stretch to provide access to hardware like that, and then only with a full entourage of managers and enough potential revenue to justify the expense. A good idea conceived by a developer without access to such hardware is likely to remain unexplored.
Currently, no central archive exists of well-documented results for performance, stability and standards compliance tests. Researchers are forced to run their own tests or to pick and choose from mediocre results to come up with a less-than-accurate guess. System administrators have no central place to look for starter information on what combination of kernel, distribution and hardware tends to work well for a workload similar to the one they anticipate. This lack of available research leads to confusion regarding the performance and reliability of the myriad Linux choices.
Linux kernel developers cannot spend the time and effort required to run long performance and stability tests on their patches. Even when a developer is willing to spend the time testing a patch, testing software often requires a great deal of knowledge and specialized hardware just to install and configure. Occasionally this situation leads to problems being introduced into both the stable and development kernel trees. It also can allow previously solved problems to recur in later development and go unnoticed for lack of regression testing.
A number of developers have spoken up on the Linux kernel mailing list requesting a standard testing procedure for new patches. Many users and developers agree that a simple procedure, including performance, stability, standards compliance and regression testing, would benefit Linux kernel development.
While you can't test for every bug out there, you can check for common types of problems. It's generally not too difficult to add a regression test case to your testing suite after a bug is found and fixed. The problem is not in the creation of these tests; most developers realize it's a good idea to have a few synthetic tests available and often write them. The problem is that most developers can't or won't take the time to configure a full range of verification tests. While coding can be fun, testing is often quite boring. If developers could easily request a full test of their code and then continue working while someone else does the dirty work, we think they would be more inclined to attempt verification runs on their patches.
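As a trivial illustration (the reproducer script and expected output file here are hypothetical), a regression case can be as simple as rerunning the reproducer for a fixed bug and comparing its output against a known-good copy:

    #!/bin/sh
    # Hypothetical regression check: rerun the reproducer for a
    # previously fixed bug and fail loudly if the old behavior returns.
    ./reproduce-bug-1234 > actual.out 2>&1
    if ! diff -q expected.out actual.out > /dev/null; then
        echo "REGRESSION: bug 1234 has resurfaced" >&2
        exit 1
    fi
    echo "bug 1234 still fixed"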
The STP is a hardware and software configuration for automated testing. To run the back-end control and scripting of the tests, we developed Brimstone, a combined batch control system and automated test harness. Requests are submitted using Eidetic, a user-friendly web front end, or brim-gate, the e-mail gateway. Using these front ends, entire test sequences can be requested in less than two minutes.
The STP process begins with developers checking patches into our kernel CVS tree and requesting a test sequence using their patch. After the tests are completed, detailed results are returned to the developer via e-mail and are also archived on our web site. To simplify the process even further, a developer could write a short shell script that, in fewer than five lines, would check a patch into CVS and submit a preformatted test request via e-mail (see the sketch below). Checking the effectiveness of a patch would then take a single command; everything involved in a full-scale test run would be taken care of without a second thought.
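As a rough sketch of such a script (the file names, tag, gateway address and request format here are hypothetical, not actual brim-gate syntax):

    #!/bin/sh
    # Hypothetical submission script: check a patch into the OSDL
    # kernel CVS tree, tag it, then mail a preformatted test request
    # to the brim-gate e-mail gateway. All names are illustrative.
    cvs commit -m "VM tuning patch" mm/vmscan.c
    cvs tag my-vm-patch-1 mm/vmscan.c
    mail -s "STP test request: my-vm-patch-1" brim-gate@osdlab.org < request.txt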
The hardware dedicated to the STP by the OSDL includes a 1.8TB storage area network connected to each server (four CPUs and up) via multiple Fibre Channel connections. Servers include two each of 2-CPU, 4-CPU and 8-CPU boxes, as well as a single 16-CPU IBM NUMA-Q server. A second 16-CPU server, an NEC AzusA containing Itanium CPUs, is on order. We also have over 50 client-load machines that can be moved into the STP at the press of a key. In addition, we are looking into the possibility of including a few single-CPU machines to ensure kernel modifications don't adversely affect the vast majority of current installs.
Eidetic, Brimstone and the e-mail gateway are all under the GPL, so interested parties can use them when setting up their own labs for specialized testing interests.
The first step is to go through the free sign-up to become an OSDL Lab Associate, available at www.osdlab.org. Next, enter a test request through the web page, which involves something like this: choose the kernel tag to run (2.4.8, for instance), choose the distribution to use, choose the test to run, list the CPU details, list any hardware restrictions (optional) and enter an optional LILO command line (which allows, for example, restricting the RAM used in the run). After submitting the base items, you need to spend a moment filling out the setup page for the test you selected, then submit the final request.
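For instance, the standard mem= kernel boot parameter can be supplied on that line to cap a test machine's memory (shown here as the equivalent lilo.conf append line):

    # Caps the RAM the kernel will use for the run at 512MB
    append="mem=512M"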
It's as simple as that. Depending on the type of test requested, you could have your response back in less than 25 minutes. Full environment documentation and the resulting data sets are archived on the web site for you. Even short tests take at least 15 minutes, because a fresh copy of the OS is installed prior to every test sequence.
Since the entire process is automated, testing does not stop when our office closes. This means quick responses are possible at any time, for developers located anywhere in the world.
Since the STP is currently in the initial rollout phase, the number of scripted tests is still low. At the time of this writing, the available workloads include Juan Quintela's “memtest” VM abuse test suite, dbench (Samba) filesystem punishment, the ever-popular scripted kernel compile, simulated real-world CVS punishment and lmbench.
A long list of potential tests, including scenarios involving multiple servers and applications such as Apache and MySQL, is currently being evaluated. Of course, as an open-source project, we welcome assistance in automating these tests. Getting a full range of tests ready for use with the STP is going to be a major undertaking, but we believe the benefits to Linux will be worth it. For me personally, this is where I hope to give something back to the Linux community in a way that makes a positive impact.
We are also in active cooperation with developers from SGI and IBM in the Linux Test Project (LTP). One current goal is to enlarge the LTP's coverage to include both targeted and general workload simulation tests. The LTP kernel features test is almost ready to be automated, which will provide us with a large regression test suite, as well as a solid base for stability research. The LTP's future plans include the development of a number of self-contained tests that will make great testing targets for the STP.