by Carlo Kopp
The largest and most complex parametric simulation performed on the Monash CSSE MPME cluster was done as part of a research project investigating the effects of microwave propagation on the performance of large ad hoc networks.
Ad hoc networks are a category of mobile network in which no fixed infrastructure is used; every mobile system in the network acts as a router on behalf of its peers. The topology of such networks is dynamic; it changes continuously as vehicles within the network drift in and out of range of one another. It also changes as a result of weather impairing the propagation of microwave links between these vehicles.
Simulating this behaviour is computationally very demanding. Finding a link between two stations requires the solution of a nonlinear differential equation to account for atmospheric refraction. The ray path is found by solving an initial value problem with a multistep polynomial method; the shooting method then iterates on this solver, adjusting the launch conditions until the ray connects the two stations. Once we know the shape of the ray path, we must perform an atmospheric loss calculation, which requires computing the losses due to oxygen resonance, water-vapour resonance, and scattering in cloud and rain, all performed for very small increments of beam altitude above the surface of the earth.
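To make the numerical recipe concrete, here is a minimal Python sketch of the shooting method over a simple exponential refractivity profile. The atmosphere model, the constants, the step size and the Heun integrator standing in for the multistep solver are illustrative assumptions only, not the project's actual code:

```python
import math

# Illustrative constants: exponential refractivity N(h) = N0*exp(-h/H).
N0, H = 315.0, 7.35e3   # surface refractivity, scale height (m)
RE = 6.371e6            # Earth radius (m)

def dn_dh(h):
    """Derivative of refractive index n = 1 + N(h)*1e-6 w.r.t. height."""
    return -1e-6 * N0 / H * math.exp(-h / H)

def derivs(h, theta):
    # Small-angle ray equations over a flat earth: the 1/RE term folds
    # the earth's curvature into the ray's apparent curvature.
    return math.tan(theta), 1.0 / RE + dn_dh(h)

def trace_ray(h0, theta0, x_end, dx=50.0):
    """Integrate the initial value problem; return ray height at x_end."""
    h, theta, x = h0, theta0, 0.0
    while x < x_end:
        dh1, dt1 = derivs(h, theta)                    # Heun (two-stage)
        dh2, dt2 = derivs(h + dx * dh1, theta + dx * dt1)
        h += dx * 0.5 * (dh1 + dh2)
        theta += dx * 0.5 * (dt1 + dt2)
        x += dx
    return h

def shoot(h_tx, h_rx, x_rx, lo=-0.05, hi=0.05, tol=0.1):
    """Bisect on launch elevation until the ray hits the receiver height."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        miss = trace_ray(h_tx, mid, x_rx) - h_rx
        if abs(miss) < tol:
            break
        if miss < 0.0:
            lo = mid   # ray fell short of the receiver: aim higher
        else:
            hi = mid
    return mid

# Launch elevation (radians) for a 40 km link between 30 m antennas.
print(shoot(30.0, 30.0, 40e3))
```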
The complete simulation entails loading a database of geographical positions of vehicles in the network from disk, a search through this database in which every vehicle is tested against every other to discover the existence of links, and the creation of a link database. This operation is essentially proportional to N², where N is the number of vehicles in the network.
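The pairwise search itself is simple; in the sketch below, link_exists() is a hypothetical stand-in for the ray-tracing and loss computations just described:

```python
from itertools import combinations

def build_link_database(vehicles, link_exists):
    """Test every vehicle pair once; return the list of viable links."""
    links = []
    for a, b in combinations(vehicles, 2):   # N*(N-1)/2 pairs, hence O(N^2)
        if link_exists(a, b):
            links.append((a, b))
    return links
```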
Once we have a database of links, we use this population to solve an all-pairs shortest-path problem to find the shortest route connecting every pair of vehicles in the ad hoc network. This compute load is proportional to N³ and is also time-consuming. We may then extract these routes and look at properties such as the number of hops between specific vehicles or specific ground stations in the network, and aggregate behaviour such as the net availability of routes over time as the topology of the network varies.
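The article does not name the routing algorithm used; Floyd-Warshall is the classic cubic-time answer to the all-pairs problem, so a sketch along those lines, with hop counts as the path metric to match the hop statistics above, might look like this:

```python
INF = float("inf")

def all_pairs_hops(n, links):
    """links: (i, j) index pairs of vehicles joined by a viable link."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j in links:
        dist[i][j] = dist[j][i] = 1          # one hop per direct link
    for k in range(n):                       # three nested loops: O(N^3)
        dk = dist[k]
        for i in range(n):
            dik, di = dist[i][k], dist[i]
            if dik == INF:
                continue
            for j in range(n):
                if dik + dk[j] < di[j]:      # relax the path i -> k -> j
                    di[j] = dik + dk[j]
    return dist
```

Repeating a cubic pass like this for every topology snapshot goes a long way toward explaining the run times quoted below.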
Needless to say, this is a very demanding simulation. On a 333MHz Pentium II CPU, it takes hours to complete for a population of hundreds of nodes, and several days to complete for populations of thousands of nodes.
The general problem is one of exploring the behaviour of the network, using different frequencies for the microwave links, different weather conditions and different possible link speeds. Because the system we are trying to model is so complex, only a comprehensive simulation across a large range of input parameters yields a trustworthy solution.
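Conceptually, the sweep is just the Cartesian product of the input parameters, with each combination dispatched as an independent job across the cluster. The Python sketch below conveys the idea only; it is not EnFuzion's plan-script syntax, and the parameter values and the simulate binary are purely illustrative:

```python
from itertools import product
from subprocess import run

frequencies = [10e9, 20e9, 40e9]    # Hz: illustrative values only
link_speeds = [2e6, 34e6, 155e6]    # bit/s: illustrative values only
weather     = ["clear", "cloud", "rain"]

# One independent job per parameter combination; cluster middleware
# such as EnFuzion farms these out to idle nodes instead of running
# them sequentially as this toy loop does.
for f, s, w in product(frequencies, link_speeds, weather):
    run(["./simulate", f"--freq={f}", f"--speed={s}", f"--weather={w}"])
```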
In the early phases of this project, I envisaged doing this on a single top-end Pentium II processor. When confronted with the sheer volume of computation, I had to craft a simplified version of the intended simulation which, in effect, cheated and used a very simple model to handle the propagation behaviour. Professor David Abramson proposed that I employ the cluster instead—I was initially very sceptical and resisted. However, after much argument, David prevailed, and I ported the simulation from FreeBSD to Linux and developed the EnFuzion plan scripts to run it.
It was quickly apparent that having 20 CPUs allowed me to grind through the volume of computations 20 times faster. Once I gained confidence with the cluster, I incorporated the accurate propagation model and expanded the scope of the simulation to map out the whole range of frequencies, link speeds and weather models ideally required to solve the complete problem. At the time of this writing, the project is virtually complete, and about 950 simulation runs have been performed.
Use of the cluster allowed me to demolish the performance bottleneck which had plagued me in the traditional world of single-processor simulations. The simple scripting language used by EnFuzion allowed me to organize the parameter space and efficiently conduct the simulations without having to sit over the machine, waiting for one job to complete in order to fire off the next one.
To say we did not have teething problems would be untrue. The simulation had to be thoroughly debugged, and some adaptations were required to turn off the X11-based screen display features. The biggest problems we faced initially were managing the load on the system, and tweaking the Red Hat Linux installations to achieve the required reliability.
Solving the load management problem required some brainstorming. Executing long-running jobs of this complexity, across all 20, later 28 and finally 60 CPUs in the cluster, tended to discourage casual daytime users who were confronted with a load-saturated system. After being spammed by a mob of irate users, I proposed to David that a scheduler was required. David suggested that the UNIX renicing mechanism might be usable; after some debate on implementation, our system administrator crafted a cron job to do this. The irate users were pacified, and the simulations continued successfully.
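The mechanism itself is straightforward. Something along the lines of the following sketch, invoked periodically from cron, would demote long-running simulation processes to the lowest scheduling priority during working hours; the process name is hypothetical, and the details are mine rather than our administrator's script:

```python
import subprocess

SIM_NAME = "simulate"   # hypothetical name of the simulation binary

# Find all running instances by exact command name, then renice them
# to the lowest priority so interactive users get the CPU first.
pids = subprocess.run(["pgrep", "-x", SIM_NAME],
                      capture_output=True, text=True).stdout.split()
for pid in pids:
    subprocess.run(["renice", "19", "-p", pid])
```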
The biggest source of difficulty was the lack of robustness in the Red Hat networking protocol stack running on the root node. This was eventually solved. For very large clusters, the root server must be rock-solid reliable.
In summary, the use of the cluster converted a would-be simulation nightmare into a relatively straightforward exercise. Because the simulation did not lend itself to vectorization, a Cray Y-MP was not a viable solution to this dilemma, budgetary issues aside. In the end, the problem was solved by using a pair of 19-inch racks loaded with commodity Pentiums running freeware Red Hat Linux.
After my initial scepticism, it was a case of eating humble pie—the boss did know better, after all! Clustering works very effectively, and for problems which don't vectorize easily, it may be the only viable solution.
A native of Perth, Western Australia, Carlo Kopp graduated with first-class honours in Electrical Engineering from the University of Western Australia in 1984. In 1996 he completed an MSc in Computer Science, and he has recently submitted a PhD thesis in the same discipline at Monash University. He has consulted in UNIX systems programming, performance engineering and system administration. He now lectures in computer architecture at Monash University.