Picking the RapidMind

Nicholas Petreley

Issue #163, November 2007

RapidMind has a mind to make advances in the use of multicore programming rapidly available.

Writing applications to support multiple CPU cores is not an easy task, and in some cases, it is even harder if you want to take a huge existing application and adapt it for multiple cores. So I figured the real breakthrough is likely to be years away. It seems as if RapidMind has a solution for this problem that doesn't require a massive overhaul of an existing application, and its solution is already available.

RapidMind's Founder and Chief Scientist Michael McCool (left) and President and CEO Ray DePaul (right)

We invited RapidMind's President and CEO Ray DePaul and Founder and Chief Scientist Michael McCool to talk about RapidMind's approach to exploiting the power of multicore systems.

We deemed it important to look at RapidMind, because it seems as if we're finally entering the age of parallel processing on the desktop as chip manufacturers bump up against the practical limits of Moore's Law. Everything from graphics cards to PlayStation 3 consoles exploit parallel processing these days. I have an Intel quad-core processor in my workstation. Although I'm happy with it, I find that the only time I truly appreciate having this multicore chip is when I run multiple applications simultaneously or run multiple processes, such as with the command make -j 5. If anything, single-threaded applications run slower on this chip than on the single-core CPU I used to run, because each core in the Intel chip is significantly slower (2GHz vs. 3GHz).

So how does RapidMind bridge the gap between existing software and the changing computational model?

LJ: Could you give us a brief description of RapidMind, and the problem it is designed to solve?

DePaul: RapidMind is a multicore software platform that allows software organizations to leverage the performance of multicore processors and accelerators to gain a real competitive advantage in their industry. With RapidMind, you can develop parallel applications with minimal impact on your development lifecycle, costs and timelines. And, we allow you to accomplish this without the need for multithreading. You leverage existing skills, existing compilers and IDEs and take advantage of all key multicore architectures without constantly porting your application.

LJ: So is it accurate to say RapidMind is actually a library of common C/C++ operations, where the exploitation of multiple cores is largely transparent to the programmer?

McCool: RapidMind is much more than a simple library of “canned functions”. In fact, it is possible to use the API to the RapidMind platform to specify an arbitrary computation, and for that computation to execute in parallel with a very high level of performance. We provide a sophisticated multicore software platform that can leverage many levels of parallelization, but at the same time allows developers to express their own computations in a very familiar, single-threaded way.

LJ: How much, if anything, does the programmer need to know about parallel processing programming techniques in order to use RapidMind?

McCool: We believe that developers are the application experts and should have some involvement in moving their applications into the parallel world. The key is to let developers leverage what they already know, rather than force them down an unfamiliar and frustrating path. RapidMind is built upon concepts already familiar to all developers: arrays and functions. It is not necessary for a developer to work directly with threads, vectorization, cores or synchronization. Fundamentally, a developer can apply functions to arrays, and this automatically invokes parallel execution. A RapidMind-enabled program is a single-threaded sequence of parallel operations and is much easier to understand, code and test than the multithreaded model of parallel programming.

LJ: Can you give us a simple code example (the includes and declaration statements that would start a typical program)?

McCool: First, you include the platform header file and optionally activate the RapidMind namespace:

#include <rapidmind/platform.hpp> using namespace rapidmind;

Next, you can declare variables using RapidMind types for numbers and arrays:

Value1f f; Array<2,Value3f> a, b;

The Value1f type is basically equivalent to a float, and the Array types are used to manage large collections of data. These can be declared anywhere you would normally declare C++ variables: as members of classes or as local or global variables.

A Program object is the RapidMind representation of a function and is created by enclosing a sequence of operations on RapidMind types between RM_BEGIN and RM_END. The operations will then be stored in the Program object. For example, suppose we want to add a value f, represented using a global variable, to every element of an array. We would create a program object prog as follows:

Program prog = RM_BEGIN {
  In<Value1f> c; Out<Value1f> d;
   d = c + f;
} RM_END;

Note that although the program may run on a co-processor, we can just refer to external values like f in the same way we would from a function definition. It is not necessary to write any other code to set up the communication between the host processor and any co-processors.

To apply this operation to array a and put the result in array b, invoking a parallel computation, we just use the program object like a function:

b = prog(a);

Of course, in real applications, program objects can contain a large number of operations, and a sequence of program objects and collective operations on arrays (such as scatter, gather and reduce) would be used.

LJ: How do you avoid the common pitfalls of parallel processing, such as deadlocks or other synchronization issues?

McCool:The semantics of the RapidMind interface does not involve explicit locking or synchronization by the developer. The platform itself automatically takes care of these issues when necessary at a lower level in the runtime platform. The developer cannot specify programs that deadlock or that have race conditions, in the same way that a Java developer cannot specify programs that have memory leaks.

LJ: I see Hewlett-Packard software ran 32.2 times faster after the software was adapted to use RapidMind. How long did it take to modify the software to use RapidMind?

McCool: Our collaboration with HP was a great test of our platform. Roughly the same amount of time was taken to RapidMind-enable the application as was taken by HP to tune its single-core baseline version. The tuning by HP sped up its version by a factor of 4, whereas RapidMind running on an NVIDIA 7900 GPU outperformed that by a factor of more than 32. More recently, we have run the same code on an NVIDIA 8800 GPU and sped it up by an additional factor of 5, and we also have run the RapidMind version on our multicore CPU quad-core product and achieved a speedup of 8 over HP's version.

So the benefit to the software organization is quite startling. For the same effort, you can use RapidMind not only to get significantly higher performance on the same multicore processors you're already targeting, but you can leverage the additional performance of accelerators as well. The RapidMind version also will scale automatically to future processors with more cores.

LJ: Is the speed increase in the HP software typical or “best case”? What software is most likely to see speed increases? Database server software? Complex queries on data warehousing? Spam filtering? Web browsers? Something else?

McCool: We have seen large speedups on a wide range of applications, including database operations, image and video processing, financial modeling, pattern matching and analysis, many different kinds of scientific computation—the list goes on and on. The RapidMind platform supports a general-purpose programming model and can be applied to any kind of computation. The HP test was compute-bound, and it could take advantage of the high compute performance of GPUs. However, in memory-bound applications, we have also seen a significant benefit, over an order of magnitude, from running the application on RapidMind. RapidMind not only manages parallel execution, it also manages data flow and so can also directly address the memory bottleneck. As a software platform company, we are constantly surprised by the variety of applications that developers are RapidMind-enabling. Prior to the launch of our v2.0 product in May 2007, we had more than 1,000 developers from many different industries in our Beta program. The problem is industry-wide, and we have developed a platform that has very broad applicability.

LJ: Shouldn't this kind of adaptation to multiple cores take place in something more fundamental like the GNU C Library? Is it only a matter of time before such libraries catch up?

McCool: Simply parallelizing the standard library functions would not have the same benefit, because they do not, individually, do enough work. RapidMind programs, in contrast, can do an arbitrary amount of user-specified parallel computation.

Although RapidMind looks like a library to the developer, it's important to realize that most of the work is done by the runtime platform. The challenge facing multicore developers is not one that can be solved solely with libraries. Developers need a system that efficiently takes care of the complexities of multicore: processor-specific optimization, data management, dynamic load balancing, scaling for additional cores and multiple levels of parallelization. The RapidMind platform performs all of these functions.

LJ: You support multiple platforms on different levels. For example, you can exploit the processors on NVIDIA and ATI graphics cards, the Cell processor, as well as multicore CPUs. In addition, you support both Linux and Windows, correct?

DePaul: The processor vendors are delivering some exciting and disruptive innovations. Software companies are faced with some tough choices—which vendors and which architectures should they support. By leveraging RapidMind, they get to benefit from all of the hardware innovations and deliver better products to their customers within their current development cycles and timelines. RapidMind will continue to provide portable performance across a range of both processors and operating systems. We will support future multicore and many-core processors, so applications written with RapidMind today are future-proofed and can automatically take advantage of new architectures that will likely arise, such as increases in the number of cores.

LJ: Can you tell us more about your recently demonstrated support for Intel and AMD multicore CPUs?

DePaul: It's difficult to overstate the value we bring to software companies targeting Intel and AMD multicore CPUs. For example, at SIGGRAPH in San Diego, we demonstrated a 10x performance improvement on an application running on eight CPU cores. RapidMind-enabled applications will scale to any number of cores, even across multiple processors, and will be tuned for both Intel and AMD architectures. Software organizations can now target multicore CPUs, as well as accelerators, such as ATI and NVIDIA GPUs and the Cell processor, all with the same source code.

LJ: Is there anything else you'd like to tell our readers?

DePaul: It's becoming clear that software organizations' plans for multicore processors and accelerators will be one of the most important initiatives they take this year. Companies that choose to do nothing will quickly find themselves behind the performance curve. Companies that embark on large complex multithreading projects will be frustrated with the costs and timelines, and in the end, largely disappointed with the outcome. We are fortunate to be partnering with a group of software organizations that see an opportunity to deliver substantial performance improvements to their customers without a devastating impact on their software development cycles.

LJ: Thank you so much for your time!