A look at the possibilities of Pd and GEM for Linux-based audio and video.
As a multimedia-capable platform Linux has seen terrific growth over the past few years. At the system level, a simple kernel patch can now improve scheduler efficiency and bring performance latencies down to an incredible three milliseconds or less, well within the acceptable range for professional audio and video applications. Given the low-latency patch (along with some other fine-tuning), we can now consider the availability of applications capable of utilizing this enhancement.
Along with performance artist Chris Spradlin, I am currently working on a multimedia presentation that combines video playback/processing, still-image and video projection, and various forms of audio capture, playback and transformation. Controlling the interplay of the different media poses a considerable challenge, particularly because we want to do everything in real time and in Linux. Happily, I have found an excellent solution to our dilemma: Miller Puckette's remarkable Pd.
Pd (pure data) is an audio synthesis/processing environment similar to the famous Max (and its Java offspring jMax). These environments employ a neat scheme of graphically patching various simple components (such as signal generators/modifiers and their control objects) into complex sonic networks.
Figure 1 demonstrates Pd's basic principles: the osc~ signal generator creates an audio waveform (a cosine wave), the slider controls the frequency (pitch) of the waveform and the network around the dbtorms object modifies the amplitude (volume) of the generated signal. Finally, the modified signal is sent to the dac~ object (the digital-to-analog converter) and the results are heard through the audio system.
Pd includes a variety of ready-made objects for signal generation and processing, and if this kind of synthesis patching were all Pd could do, it would still be an impressive audio environment. However, thanks to Mark Danks' wonderful GEM OpenGL-based graphics library, Pd also can manipulate video and image parameters in real time. Pd's flexibility permits arbitrary connection and control between its audio and video streams, creating a powerful environment for controlling and coordinating multimedia presentations.
In order to use Pd with GEM, you must invoke it via the $HOME/.pdrc file or with a command string similar to this one:
pd -rt -lib /home/dlphilp/gem-0.87_2/Gem
The -rt option prioritizes Pd's performance to real-time status. When coupled with a low-latency kernel, Pd's performance is quite acceptable for live shows and other real-time circumstances. You'll want all the help you can get when you're running Pd with the GEM library; the kernel latency patch is a godsend, but you'll still need a hardware-accelerated OpenGL installation to make the best use of GEM.
Note: the test system for this article included an 800MHz Duron CPU, 256MB RAM and a Voodoo3 AGP video card. The Linux kernel version was 2.4.5, patched for low latency; the video subsystem was XFree86 4.0.1. Certain operations in GEM are very CPU-intensive, and I would qualify the Voodoo3 as the low end of acceptable video boards for the Pd/GEM alliance. The audio system included a Sound Blaster Live! sound card running under the ALSA 0.9.0beta10 driver. Note also that Pd version 0.34-4 (stable) was used along with IOhannes Zmoelnig's beta version of GEM 0.87. Previous incarnations of GEM do not include the Linux versions of the pix_movie and pix_film objects needed for the real-time video manipulations described here.
In Figure 2 we see the basic structure of a simple Pd patch utilizing the GEM library functions. Note that the gemwin and gemhead objects are required for all other GEM-related actions. This patch provides the mechanisms for loading a movie (anim-3.mov in this case) and playing it back while rendering it to the surfaces of a cube. The cube size is controlled by the slider movement, and the film can be started and stopped by clicking on the smaller Bang button (one of the two cyan-colored boxes). Figures 3-5 demonstrate this patch in action. The first snapshot (Figure 3) illustrates the simplest rendering, with one face of the cube displayed and expanded to fit most of the rendering window. Figure 4 shows how the rotate object manipulates the cube, and Figure 5 displays the curious effect that occurs when the slider value exceeds the rendering window size. Of course the still images can't convey the effect of the movie playing upon the surfaces of the animated cube, but trust me, it's very cool.
Now let's look at the possibility of combining our two example patches. Using straightforward cut/copy/paste editing, we easily can copy one patch's contents into another and start having some serious fun. Pd truly lives up to the promise of its name: data is purely data here, any data stream can be routed anywhere within a patch (with some restrictions), and we easily can set up a system in which one set of controllers simultaneously controls audio and video parameters.
Figure 6 illustrates our complex combined audio/video patch. As you can see, the two sliders each simultaneously control an aspect of the video along with an aspect of the audio. Adjusting the audio amplitude results in an adjustment of the cube size, while moving the slider for the audio frequency control also will rotate the cube on its x/y axes. Multimedia artists will find great possibilities in Pd's support for such arbitrary attachments and correspondences. I also should note that GEM includes numerous other fascinating pixel-based effects (such as color convolution and pixel threshold control), but I leave their exploration to the interested reader.
Basing our work upon the examples shown here, we are currently planning our presentation for a two-man show. We hope to use no more than two Linux-powered laptops (ideally we would need only one) and a variety of external devices (video recorder/player, still-image projector, lighting displays, etc.). Ease of transportation is a concern because we would like to be able to take the show on the road in a single vehicle.
Our Pd audio explorations have been quite stable in performance, which is good news because we plan to use Pd's real-time audio processing throughout the piece. GEM 0.87_2 sometimes crashes Pd, but I'm using a beta version as I write this article. IOhannes Zmoelnig is dedicated to GEM's improvement, so we reasonably can expect flawless performance by the time we are ready to mount our first performance (targeted for late 2002). We also have seen that combining heavy audio and video processing creates a need for more powerful hardware than we currently employ. I hope to improve our video situation with the addition of a GeForce3 video card. Finally, we also intend to use Pd's shoutcast~/oggcast~ objects for broadcasting our performances live over the Internet.
Pd is incredibly easy to work with, permitting fast construction of relatively complicated patches. The examples shown here were designed merely as learning tools and proof-of-concept demonstrations, and I have already created considerably more complex patches for our project. This article is only a shallow indicator of the possibilities of Pd and GEM, and I encourage all Linux-based audio and video artists to get involved with this software.
Vast thanks to Miller Puckette and Mark Danks for creating and freely distributing Pd and GEM. Thanks also to the Pd community for their continuing assistance to this perpetual newbie, and especially to IOhannes Zmoelnig of IEM for his beta version of GEM 0.87 and for his unstinting aid while I learn more about GEM's video objects. Finally, great thanks also must go to Guenter Geiger for his initial work on the pix_movie object and for his many other contributions to the Linux versions of Pd and GEM.