LJ Archive

Dynamic Class Loading for C++ on Linux

James Norton

Issue #73, May 2000

A technique for developers that will provide them with much flexibility in design.

Linux has much to offer as a development platform: a robust operating environment with tested tools. Linux also boasts implementations of just about every programming language available. I think it is safe to say, however, that among compiled languages, C is the language of choice for most Linux developers. Consequently, other languages such as C++ seem to be somewhat neglected in most discussions of Linux development.

The dynamic class loading technique provides developers with a great deal of flexibility in their designs. Dynamic class loading is a means of providing extensibility without sacrificing robustness.

In this article, we will design a simple application that defines a single class, a shape class we wish to use in a drawing package. As we shall see, dynamic class loading allows us to provide a smooth extension path through which users of the application can add new types of shapes without needing to modify the original application code.

Polymorphism

The basic idea behind dynamic class loading is the concept of polymorphism. Anyone familiar with C++ should be familiar with this concept, so I will discuss it here only briefly. In short, polymorphism is the ability of an object belonging to a derived class to act as an object belonging to the base class. This is the familiar “is a” relationship of OOP (object-oriented programming) parlance. For example, in the following code snippet, circle is a class derived from the base class shape (see Listing 1), so the object my_circle can act as a shape object, invoking the shape member function draw.

Listing 1

class shape { public:
   void draw();
};
class circle : public shape { };
int main(int argc, char **argv){
   circle my_circle;
   my_circle.draw();
}

While this has all the usual advantages, e.g., code reuse, the real power of polymorphism comes into play when draw is declared to be virtual or pure virtual, as follows:

class shape{ public:
   virtual void draw()=0;
};
class circle : public shape { public:
   void draw();
};

Here, circle has declared its own draw function, which can define behavior appropriate for a circle. Similarly, we could define other classes derived from shape, which provide their own versions of draw. Now, because all the classes implement the shape interface, we can create collections of objects that can provide different behavior invoked in a consistent manner (calling the draw member function). An example of this is shown here.

shape *shape_list[3];   // the array that will hold
                        // pointers to our shape objects
shape_list[0] = new circle;  // three types of shapes
shape_list[1] = new square;  // we have defined
shape_list[2] = new triangle;
for(int i = 0; i < 3; i++){
   shape_list[i]->draw();
}

When we invoke the draw function for each object on the list, we do not need to know anything about each object; C++ handles the details of invoking the correct version of draw. This is a very powerful technique, allowing us to provide extensibility in our designs. Now we can add new classes derived from shape to provide whatever behavior we desire. The key here is that we have separated the interface (the prototype for shape) from the implementation.
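
The same idea can be written as a small self-contained sketch. The classes below are illustrative stand-ins, not the article's listings: draw returns the shape's name as a string instead of rendering anything (an assumption made so the result is easy to check), and draw_all is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Base class: the interface every shape must implement.
class shape {
public:
   virtual ~shape() {}
   // Assumption for this sketch: draw() reports what it drew.
   virtual std::string draw() const = 0;
};

class circle : public shape {
public:
   std::string draw() const { return "circle"; }
};

class square : public shape {
public:
   std::string draw() const { return "square"; }
};

// Hypothetical helper: draws every shape through the base-class
// interface, without knowing any of the concrete types.
std::string draw_all(const std::vector<shape *> &shapes) {
   std::string out;
   for (std::size_t i = 0; i < shapes.size(); ++i) {
      assert(shapes[i] != NULL);   // every slot must hold a shape
      if (i > 0) out += " ";
      out += shapes[i]->draw();    // virtual call picks the right draw
   }
   return out;
}
```

Note that draw_all would keep working unchanged if a triangle class were added later, which is exactly the property dynamic loading builds on.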

While this technique is very powerful, it does suffer from a drawback in that we must recompile (or at least relink) our code when we add new derived classes. It would be more convenient if we could simply load in new classes at runtime. Then, anyone using our code libraries could provide new shape classes (with new draw functions) without even needing our original source code. The good news, and the subject of this article, is that we can.

dlopen and Dynamic Class Loading

While C++ has no direct mechanism under Linux for loading in classes at runtime, there is a direct mechanism for loading C libraries at runtime: the dl functions dlopen, dlsym, dlerror and dlclose. These functions provide access to the dynamic linker, ld.so. A complete description of these functions is provided in the appropriate man page, so they are presented here only briefly.

The prototypes for the functions are as follows:

void *dlopen(const char *filename, int flag);
void *dlsym(void *handle, const char *symbol);
const char *dlerror();
int dlclose(void *handle);

The dlopen function opens the file given in filename so that the symbols in the file can be accessed via the dlsym function. flag can take one of two values: RTLD_LAZY or RTLD_NOW. If flag is set to RTLD_LAZY, dlopen returns without attempting to resolve any symbols; function symbols are resolved later, as they are first used. If flag is set to RTLD_NOW, dlopen attempts to resolve all undefined symbols in the file before it returns. Failure to resolve a symbol causes the call to fail, returning NULL. dlerror can be used to obtain an error message explaining the failure. The dlsym function is used to obtain a pointer to the functions (or other symbols) provided by the library. handle is the value returned by dlopen, which identifies the opened library, and symbol is the actual string name of the item referenced, as it is stored in the file.
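
As a rough illustration of the call sequence, the sketch below looks up the C library's atoi by name and calls it through the returned pointer. It assumes a POSIX system with dlfcn.h and a dynamically linked C library; passing a NULL filename to dlopen asks for a handle to the running program rather than a separate file. The helpers find_symbol and parse_with_dlsym are hypothetical names. On older glibc versions the program must also be linked with -ldl.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>
#include <string>

// Type of the function we expect to find: int atoi(const char *).
typedef int (*atoi_fn)(const char *);

// Hypothetical helper: looks up `symbol` among the symbols already
// loaded into the running program. Returns its address, or NULL on
// failure, in which case `error` receives the dlerror() message.
void *find_symbol(const char *symbol, std::string &error) {
   assert(symbol != NULL);
   // A NULL filename asks for a handle to the main program itself.
   void *handle = dlopen(NULL, RTLD_NOW);
   if (handle == NULL) {
      const char *msg = dlerror();
      error = msg ? msg : "unknown dlopen error";
      return NULL;
   }
   dlerror();                         // clear any stale error state
   void *addr = dlsym(handle, symbol);
   const char *msg = dlerror();       // non-NULL only if dlsym failed
   if (msg != NULL) {
      error = msg;
      addr = NULL;
   }
   dlclose(handle);   // drops our reference; the program stays loaded
   return addr;
}

// Calls the C library's atoi through the pointer returned by dlsym.
// Returns -1 if the symbol could not be found.
int parse_with_dlsym(const char *text) {
   std::string error;
   void *addr = find_symbol("atoi", error);
   if (addr == NULL) return -1;
   // dlsym returns void *, so we must cast to the real function type.
   atoi_fn fn = reinterpret_cast<atoi_fn>(addr);
   return fn(text);
}
```

Checking dlerror after dlsym, rather than only testing for a NULL return, is the pattern the man page recommends, since a symbol's value can legitimately be NULL.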

Given that we can use these functions to access functions in a C library, how do we use them to access classes in a C++ library? There are several problems to overcome. One is that we must be able to locate the symbols we need in the library. This is trickier than it might seem because of the difference between the way symbols are stored in C and C++ files. Secondly, how can we create objects belonging to the classes we load? Finally, how can we access those objects in a useful manner? I will answer these three questions in reverse.

Since we do not have the prototypes for the classes we load dynamically, how can we access them in our code? The answer to this lies in the description of polymorphism in the preceding section. We access the functionality of the new classes through the common interface provided by their base class. Following the examples above, any new shape classes would provide a draw function that would allow an object of that class to render itself.

Fine; we can use pointers to the base class to access objects from the derived classes. How do we create the objects in the first place? We don't know anything about the classes that might be loaded, other than the fact that they conform to the shape interface. For instance, suppose we dynamically load a library that provides a class called hexapod. We can't write

shape *my_shape = new hexapod;

if we don't know the class name ahead of time.

The solution is that our main program doesn't create the objects, at least not directly. The same library that provides the class derived from shape must provide a way to create objects of the new class. This could be done using a factory class, as in the factory design pattern (see Resources) or more directly using a single function. To keep things simple, we will use a single function here. The prototype for this function is the same for all shape types:

shape *maker();

maker takes no arguments and returns a pointer to the constructed object. For our hexapod class, maker might look like this:

shape *maker(){
   return new hexapod;
}

It is perfectly legal for us to use new to create the object, since maker is defined in the same file as hexapod.

Now, when we use dlopen to load a library, we can use dlsym to obtain a pointer to the maker function for that class. We can then use this pointer to construct objects of the class. For example, suppose we want to dynamically link a library called libnewshapes.so which provides the hexapod class. We proceed as follows:

void *hndl = dlopen("libnewshapes.so", RTLD_NOW);
if(hndl == NULL){
   cerr << dlerror() << endl;
   exit(-1);
}
void *mkr = dlsym(hndl, "maker");

The pointer to maker must be of type void *, since that is the type returned by dlsym. Now we can create objects of the hexapod class by invoking mkr:

shape *my_shape = reinterpret_cast<shape *(*)()>(mkr)();

We are required to cast mkr to a pointer to a function returning shape * when we invoke it.

Some readers may see a problem with the code as written thus far: the dlsym call is likely to fail because it cannot resolve "maker". The problem is that C++ function names are mangled to support function overloading, so the maker function may have a different name in the library. We could figure out the mangling scheme and search for the mangled symbol instead, but fortunately there is a much simpler solution. We need only tell the compiler to use C-style linkage using the extern "C" qualifier, as shown in Listing 2.
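
A minimal sketch of what the library side might look like with C linkage follows. It is an illustration under stated assumptions rather than the article's own listing: draw returns a string so the result can be checked, and make_and_draw is a hypothetical stand-in for the main program, which here receives the function pointer directly instead of obtaining it from dlsym.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class shape {
public:
   virtual ~shape() {}
   virtual std::string draw() const = 0;  // assumption: returns a name
};

// The new class lives entirely inside the library.
class hexapod : public shape {
public:
   std::string draw() const { return "hexapod"; }
};

// extern "C" turns off name mangling, so the symbol is stored as
// plain "maker" and dlsym(hndl, "maker") can find it.
extern "C" shape *maker() {
   return new hexapod;
}

// The type a caller would cast dlsym's void * result to.
typedef shape *(*maker_fn)();

// Hypothetical stand-in for the main program: it knows only the
// shape interface and the maker signature, never the name hexapod.
std::string make_and_draw(maker_fn mk) {
   assert(mk != NULL);
   shape *s = mk();
   std::string result = s->draw();
   delete s;
   return result;
}
```

Because C linkage does not encode argument types in the symbol name, a library can export only one function called maker; that restriction is harmless here, since each library provides exactly one.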

Listing 2

Autoregistration

Loading the maker functions into an array associates a position in the array with each maker. While this may be useful in some cases, we can obtain more flexibility using an associative array to hold the makers. The Standard Template Library (STL) map class works well for this, as we can then assign key values to the makers and access them via these values. For example, we may desire to assign string names to each class and use these names to invoke the appropriate maker. In this case, we can create a map such as this:

typedef shape *(*maker_ptr)();
map <string, maker_ptr> factory;

Now when we want to create a particular shape, we can invoke the proper maker using the shape name:

shape *my_shape = factory["circle"]();

We can extend this technique to make it even more flexible. Rather than loading the class makers in and explicitly assigning a key value to them, why not let the class designers do the work for us? Using a little bit of ingenuity, we can have the makers register themselves with the factory automatically, using whatever key value the class designer chooses. (There are a couple of warnings here. The key must be of the same type as all the other keys, and the key value must be unique.)

One way to accomplish this would be to include a function in each shape library that registers the maker for us, and then call this function every time we open a shape library. (According to the dlopen man page, if your library exports a function called _init, this function will be executed when the library is opened. This may seem to be the ideal place to register our maker, but currently the mechanism is broken on Linux systems. The problem is a conflict with a standard linker object file, crt.o, which exports a function called _init.) As long as we are consistent with the name of this function, the mechanism works well. I prefer to forego that approach in favor of one that will register the maker simply by opening the library. This approach is known as “self-registering objects” and was introduced by Jim Beveridge (see Resources).

We can create a proxy class used solely to register our maker. The registration occurs in the constructor for the class, so we need to create only one instance of the proxy class to register the maker. The prototype for the class is as follows:

class proxy {
public:
   proxy(){
      factory["shape name"] = maker;
   }
};

Here, we assume factory is a global map exported by the main program. Using gcc/egcs, we must link with the -rdynamic option to force the main program to export its symbols to the libraries loaded with dlopen.

Next, we declare one instance of the proxy:

proxy p;

Now when we open the library with dlopen (passing the RTLD_NOW flag), the global object p is constructed, and its constructor registers our maker. If we want to create a circle, we invoke the circle maker like so:

shape *my_circle = factory["circle"]();

The autoregistration process is powerful because it allows us to design the main program without having explicit knowledge of the classes we will support. For instance, after the main program dynamically loads any shape libraries, it could create a shape selection menu using all the keys registered in the factory. Now the user can select “circle” from a menu list, and the program will associate that selection with the proper maker. The main program does not need any information about the circle class as long as the class supports the shape API and its maker is properly defined.
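
The registration mechanism can be tried out in a single file, without dlopen. This is a sketch under stated assumptions: both "libraries" and the "main program" share one translation unit, so the makers need distinct names (make_circle, make_square) instead of each being called maker, and draw returns a string. The helpers registered_names and create_and_draw are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class shape {
public:
   virtual ~shape() {}
   virtual std::string draw() const = 0;  // assumption: returns a name
};

typedef shape *(*maker_fn)();

// The global factory. In the article's design it lives in the main
// program and is visible to the libraries thanks to -rdynamic.
std::map<std::string, maker_fn> factory;

// ---- what a circle library would contain ----
class circle : public shape {
public:
   std::string draw() const { return "circle"; }
};
extern "C" shape *make_circle() { return new circle; }
class circle_proxy {
public:
   circle_proxy() { factory["circle"] = make_circle; }
};
circle_proxy circle_registration;  // constructing it registers the maker

// ---- what a square library would contain ----
class square : public shape {
public:
   std::string draw() const { return "square"; }
};
extern "C" shape *make_square() { return new square; }
class square_proxy {
public:
   square_proxy() { factory["square"] = make_square; }
};
square_proxy square_registration;

// ---- main program side ----
// Lists every registered key, e.g. to build a selection menu.
std::vector<std::string> registered_names() {
   std::vector<std::string> names;
   std::map<std::string, maker_fn>::const_iterator it;
   for (it = factory.begin(); it != factory.end(); ++it)
      names.push_back(it->first);
   return names;
}

// Creates a shape by name and draws it; returns "" for unknown names.
std::string create_and_draw(const std::string &name) {
   std::map<std::string, maker_fn>::const_iterator it = factory.find(name);
   if (it == factory.end()) return "";
   assert(it->second != NULL);
   shape *s = (it->second)();
   std::string result = s->draw();
   delete s;
   return result;
}
```

The lookup uses find rather than operator[] because operator[] would silently insert a null maker for an unknown name, which a later call would then dereference.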

Listing 3

Listings 1 through 5 pull together the concepts presented thus far. The shape class defined in Listing 1 is the base class for all shapes. Listings 2 and 3 are the source code for dynamically loadable libraries that provide circle and square shapes, respectively.

Listing 4

Listing 4 is the main program that is extensible through dynamically loaded libraries. It scans the current directory for any .so files (libraries) and opens them. The libraries then register their makers with the global factory provided by the main program. The program then dynamically constructs a menu for the user with the shape names registered by the libraries. Using the menu, the user can construct shapes, draw the shapes constructed, or exit the program. Listing 5 is the makefile used to build the project.

Listing 5

Real-World Examples

Recently, I have had two occasions to use this technique. In the first case, I was developing a moving object simulation. I wanted to provide users with the ability to add new types of moving objects without having access to the main source. In order to accomplish this, I defined a base class called entity, which provides the interface definition for any moving object in the simulation. A simplified version of the entity prototype is shown below.

class entity {
private:
   float xyz[3];  // position of the object
public:
   virtual void activate(float)=0; // tell the object to move
   virtual void render()=0;  // tell the object to draw itself
};

Thus, all entities have at least a position in three-space, and all entities can draw themselves. Most entities will have many other state variables besides position and may have many other member functions besides activate and render, but these are not accessible through the entity interface.

New entity types can be defined, incorporating whatever motion dynamics the user desires. At runtime, the program loads all the libraries in a subdirectory called Entity and makes them available to the simulation.

The second example comes from a recent project in which we wanted to create a library that could load and save images of various formats. We wanted the library to be extensible, so we created a base image_handler class for loading and saving images.

class image_handler{
public:
   virtual Image loadImage(char *)=0;
   virtual int saveImage(char *, Image &)=0;
};

The image_handler has two public functions, used to load and save images, respectively. The Image class is the library's basic object type for images. It provides access to the image data and some basic image-manipulation functions.

In this case, we were not interested in creating multiple objects of each type of image_handler. Instead, we wanted one instance of each image_handler that would handle loading and saving images of that type. Rather than registering a maker for each handler, we created a single instance of the handler in its library and registered a pointer to it with a global map. The map is no longer a factory, since it does not produce objects per se; it is more of a generic image loader/saver. The keys used here were strings representing file extensions (tif, jpg, etc.). Because a file format can have one of several different extensions (e.g., tiff, TIFF), each handler registers itself multiple times with the global map, once for each extension.

Using the library, a main program can load or save an image simply by invoking the correct handler via the file extension:

map <string, image_handler *> handler_map;
char filename[] = "flower.tiff";
char ext[MAX_EXT_LEN];
howEverYouWantToParseTheExtensions(filename, ext);
// after parsing "flower.tiff", ext = "tiff"
Image img1 = handler_map[ext]->loadImage(filename);
// process data here
handler_map[ext]->saveImage(filename, img1);
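
One way the pieces could fit together is sketched below. Everything beyond the image_handler interface is an assumption made for illustration: Image is reduced to a stand-in struct, the member functions take const char * rather than char *, tiff_handler does no real decoding, and file_extension and handler_for are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the library's Image class.
struct Image {
   std::string source;   // name of the file the image came from
};

class image_handler {
public:
   virtual ~image_handler() {}
   virtual Image loadImage(const char *filename) = 0;
   virtual int saveImage(const char *filename, Image &img) = 0;
};

// Global map from file extension to the single handler instance.
std::map<std::string, image_handler *> handler_map;

// Returns the text after the last '.', or "" if there is none.
std::string file_extension(const std::string &filename) {
   std::string::size_type dot = filename.rfind('.');
   if (dot == std::string::npos) return "";
   return filename.substr(dot + 1);
}

// A handler that registers itself once per extension it supports.
class tiff_handler : public image_handler {
public:
   tiff_handler() {
      handler_map["tif"] = this;
      handler_map["tiff"] = this;
      handler_map["TIFF"] = this;
   }
   Image loadImage(const char *filename) {
      assert(filename != NULL);
      Image img;
      img.source = filename;   // a real handler would decode the file
      return img;
   }
   int saveImage(const char *, Image &) { return 0; }
};
tiff_handler the_tiff_handler;   // the one instance for this format

// Finds the handler for a file name; NULL if the format is unknown.
image_handler *handler_for(const std::string &filename) {
   std::map<std::string, image_handler *>::const_iterator it =
      handler_map.find(file_extension(filename));
   if (it == handler_map.end()) return NULL;
   return it->second;
}
```

Returning NULL for an unregistered extension lets the caller report an unsupported format instead of calling through a missing handler.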

Conclusion

Dynamic class loading allows us to create more extensible and more robust code. By using dynamic class loading combined with well-thought-out base class designs, we provide users with a practical means of extending our code.

Resources

James Norton spent most of his adult life avoiding real life by hiding out in school, first at Florida State University and then at Tulane University. The good life ended when he was awarded a Ph.D. in Electrical Engineering through what could only have been some sort of clerical error. He currently does research and systems development for Newsreal, Inc. He can be reached by e-mail at jnorton4@home.com.
