First in a series of articles detailing the creation of a Java interface to Postgres95.
Java's native methods are functions written in C (or another compiled language) and dynamically loaded by the Java interpreter at run time. They provide the means to access libraries that have not been ported to Java, and also allow fast compiled code to be inserted at critical points in your system.
In this article, we will walk through the complete process of writing native code. We will create a Java interface to Postgres95 by writing wrapper classes around the libpq library. Postgres95 is a free database system (distributed under a Berkeley-style license) that runs on most varieties of Unix, including Linux.
While written (and tested) solely on Linux, the principles of this article should apply to any version of Unix and (with the exception of how to build the shared library) the code should be easily ported. To get the most out of this article, you should have some Java experience, or be very familiar with C++ and OO principles.
Recently, Java has received a great deal of attention (and quite a bit of hype) as a fantastic WWW tool. “Java Powered” pages with animations and interactive interfaces have popped up all over the Web, and everyone, including Microsoft (gasp!), is clamoring to add Java capabilities to their browsers. What many people don't realize is that Java is much more than that: it is a complete programming language suitable for use in standalone and, in particular, client-server applications.
Java offers several features that make it ideal as an application language. First among these is obviously portability. With Java there is no need to write Windows95, Mac, and several Unix versions of your application. Since the code is run by the Java Virtual Machine (VM), all that is necessary is that the VM (and any native libraries you want to use) be ported to that platform.
Another compelling reason to write in Java is the depth of its libraries (“packages” in Java-speak): networking, I/O, containers, and a complete windowing system are all integrated. Many of these capabilities are “crippled” when running a Java applet, but applications are free to make complete use of all of them. Java is a multi-threaded environment, allowing safe use of threads on platforms that don't currently support them natively. Java has a garbage collection system that eliminates the need for explicit freeing of memory. Exception handling is built in (and its use is actually required by many of the libraries, including the one we will write), and its true OO nature eases inheritance and re-use.
Even with all these things going for it, using Java for an application still has one major drawback: many systems don't yet have a Java interface, and writing them from scratch is often difficult, or even impossible.
This is the problem I faced when I wanted to access a Postgres95 database from Java. There was an excellent (and simple) C library (libpq) that shipped with Postgres95, but no support whatsoever for Java. Since the source (in this case) was available, I considered recreating libpq under Java, but this proved to be a substantial chore, and required intimate knowledge of Postgres internals. (In fact, as of this writing, John Kelly of the Blackdown Organization is writing just such a beast. It's called the Java-Postgres95 project, and you can find an alpha version at ftp://java.blackdown.org/pub/Java.)
Then I decided to simply write wrapper classes for libpq. There are several drawbacks to this approach: First, it cannot be used in an applet. Browsers explicitly disallow any access to native code (except those provided with the browser), so these classes simply will not work. Second (and more importantly), this solution is not as portable as one written in straight Java. While libpq is portable to all major flavors of Unix, and the code we'll write will be as well, there is currently no libpq for Windows95/NT or the Mac.
Apart from being simpler, there is one other advantage to writing this in native code: When the Postgres95 project releases bug fixes or changes their communication protocol, little or no change will be required to our code.
We will proceed in three steps, providing examples of how to use each part along the way.
First, we'll create wrappers for libpq's PGconn and PGresult. This will allow us to connect to the database, issue queries, and process the results.
Then, we'll write a new interface to Postgres95's Large Objects (or blobs in other databases), using Java's Stream classes.
Finally, we'll use Java's threads to provide an easy, behind the scenes interface to Postgres95's asynchronous notification system.
Java methods (class functions) that have been declared “native” allow programmers to access code in a shared library. Theoretically, this code can be written in any language that will “link with C” (but in general, you'll probably want to stick to C, or perhaps C++).
When a Java class is loaded, it can explicitly tell the Java system to load any shared library (a .so file on Linux) into the system. Java uses the environment variable LD_LIBRARY_PATH (and ldconfig) to search for the library, and will then use that library to resolve any methods that have been declared “native”.
The general procedure for writing native code is as follows:
Write the .java file, declaring all native methods as “native” (The .java file must compile cleanly at this point, so insert dummy methods if you need to)
Add the loadLibrary() command to your .java files to tell Java to load the shared library
Compile the class:
javac [-g] classname.java
Generate the headers and stubs:
javah classname (no extension)
javah -stubs classname
Use the declarations in the classname.h file to write your C code (I use the file classnameNative.c, as it seems popular, and the stubs file uses classname.c)
Compile the .c files using the -fPIC (position independent) flag:
gcc -c -fPIC -I/usr/local/java/include filename.c
Generate the shared lib (these flags are for gcc 2.7.0):
gcc -shared -Wl,-soname,libFOO.so.1 -o libFOO.so.1.0 *.o -lotherlib
Put the .so file somewhere in your LD_LIBRARY_PATH (or add it to /etc/ld.so.conf).
The PGConnection class is a wrapper for libpq's PGconn. A PGconn represents a connection to the backend Postgres95 process, and all operations on the database go through that connection. Each PGConnection will create a PGconn and keep a pointer to it for future use.
Let's walk through the steps above:
First, we write our PGConnection.java file (Listing 1). Remember that it must compile cleanly in order to generate our header and stubs, so if you refer to any Java methods that you haven't written, create dummy methods for them. We will need a constructor, a finalizer, and all of the operations that libpq allows on a PGconn. We declare most of these operations as native methods (see Listing 1—exec() and getline() are special cases that we'll consider later).
To get a PGconn, libpq provides the function:
PGconn *PQsetdb(char *host, char *port, char *options, char *tty, char *dbName)
Since this in effect “constructs” the connection to the database, we'll use this as a model for our constructor (see Listing 1, line 18). The constructor simply calls connectDB() (Listing 1, line 21; a native method that calls PQsetdb(), which we'll define in a moment), and throws an exception if the connection is not made. Doing the error checking in the constructor guarantees that no connection will be returned if the call to PQsetdb() fails.
Now let's look at our first native method, connectDB(). We declare it as native at line 70 in Listing 1. Note that no Java code is provided.
There are several important things to notice about this declaration. The “private” keyword makes this method accessible only from the PGConnection class itself (we want only our constructor calling it). The “native” keyword tells Java that code from a shared library should be loaded for this method at runtime. Since libpq is not “thread-safe”, we want to make it impossible for two threads to be making calls to libpq at the same time. Making all of our native methods “synchronized” goes a long way towards this goal (we will return to this when we tackle the asynchronous notification system). Finally (Listing 1, lines 70-73), the declaration states that connectDB() takes five Java strings as arguments and doesn't return anything.
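As a rough illustration of the declaration described above, here is a minimal sketch. The class name NativeDeclSketch and the helper connectDbModifiers() are hypothetical and are not part of Listing 1; only the shape of the connectDB() declaration follows the text. A class containing a native declaration compiles without any shared library being present, so the sketch simply asks the runtime which modifiers were recorded for the method.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative sketch: NativeDeclSketch is a hypothetical stand-in for the
// article's PGConnection; only the connectDB() declaration follows the text.
final class NativeDeclSketch {

    // No body: the implementation is resolved from a shared library at run time.
    // "synchronized" keeps two threads from entering this object's native
    // methods at the same time.
    private synchronized native void connectDB(String host, String port,
                                               String options, String tty,
                                               String dbName);

    // Hypothetical helper: reports the modifiers recorded for connectDB().
    static String connectDbModifiers() {
        try {
            Method m = NativeDeclSketch.class.getDeclaredMethod("connectDB",
                    String.class, String.class, String.class,
                    String.class, String.class);
            return Modifier.toString(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(connectDbModifiers());
    }
}
```

Running the sketch should report the three keywords discussed above (private, synchronized, native). Actually calling connectDB() without a loaded library would fail with an UnsatisfiedLinkError.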
The remainder of the native calls follow this same pattern, with the exception of exec() and getline(). Again, we'll put these off a little longer.
Before we continue, let's add the loadLibrary call. We place it at the end of the class, in a block marked “static” (Listing 1, line 92) with no method name. Any blocks such as this are executed when the class is loaded (exactly once) and libraries that have already been loaded will not be duplicated. In our example, we'll name the library libJgres.so.1.0, so we need to use loadLibrary("Jgres") (see Listing 1, line 94).
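The static block might look something like the following sketch. The class name LibraryLoadSketch and the LOADED and LOAD_ERROR fields are hypothetical additions for illustration; the block in Listing 1 may well just call loadLibrary directly. Catching UnsatisfiedLinkError here is an assumption on our part, made so that the sketch runs (and reports the problem) even on a machine where libJgres has not been built yet.

```java
// Sketch only: "Jgres" is the library name used in this article; the class
// name and the two status fields are hypothetical.
final class LibraryLoadSketch {
    // Records whether the shared library was found, so callers can check.
    static final boolean LOADED;
    static final String LOAD_ERROR;

    // Static initializer: runs exactly once, when the class is first loaded.
    static {
        boolean ok;
        String err = null;
        try {
            // On Linux, "Jgres" is mapped to a file named libJgres.so.
            System.loadLibrary("Jgres");
            ok = true;
        } catch (UnsatisfiedLinkError e) {
            // The library is not on the search path (or failed to link).
            ok = false;
            err = e.getMessage();
        }
        LOADED = ok;
        LOAD_ERROR = err;
    }

    public static void main(String[] args) {
        System.out.println(LOADED ? "libJgres loaded"
                                  : "libJgres not loaded: " + LOAD_ERROR);
    }
}
```

Whether to swallow the error like this or let it propagate is a design choice: failing loudly at class-load time is often the better behavior for a real wrapper class.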
With our .java file complete, we are ready to write the C code. First, we compile the .java file with:
javac PGConnection.java
Then, we create the “stubs” file and the .h file with:
javah PGConnection
javah -stubs PGConnection
At this point you should have PGConnection.h and PGConnection.c in your current directory. PGConnection.c is the “stubs” file, and should not be modified. For our purposes, the only thing you must do to the stubs file is to compile it and link it into your shared library.
PGConnection.h is a header file that must be included in any C file that accesses PGConnection objects. At line 14 (see Listing 2) you will find the declaration of a struct corresponding to the data for our object. Below that you will find prototypes for all of the native methods we declared. When writing the C code for native methods, you must match these signatures exactly.
Listing 2. PGConnectionNative.c (includes PGConnection.h)
Now, let's (finally) write the C code.
The code for connectDB is very straightforward, and demonstrates the majority of the issues involved in writing native code. Notice that the first argument to connectDB is not listed in the PGConnection.java file. Java automatically passes a “handle” (a fancy pointer) to the object you are dealing with as the first parameter of every native method. In our case, this is a pointer to a struct HPGConnection (defined in PGConnection.h), which we name “this” (Listing 2, line 14. If you're working in C++, you may want to use “self” since “this” is a keyword). Any access to the object's data must go through this handle.
The remainder of the parameters are the Strings we passed in (see PGConnection.java). These are also passed as handles, or pointers to the struct Hjava_lang_String (defined in java_lang_String.h, included by native.h). We could access these structures like any other handles (see below), but Java provides several convenient functions that make it much easier to work with strings.
The most useful of these functions are makeCString and makeJavaString, which convert Java's Strings to char *s and vice versa. Both rely on Java's garbage collector to handle memory allocation and recovery automatically. (Note that you must store the value returned by makeCString in a variable. If you pass the return value directly to a function, the garbage collector may free it at any time. The same is not true of makeJavaString.)
Lines 30-34 in Listing 2 show the use of makeCString, and we first use makeJavaString at line 51. Lines 41-42 in Listing 2 show our call into the libpq library. It is called exactly as normal, and the resulting pointer is stored in the variable tmpConn. You may notice that we don't do any error-checking here: we do that in the Java code for our constructor, where it is easier to throw exceptions.
As I mentioned above, PGConnection needs to keep the PGconn pointer around, so that it can use it in later calls—all later calls, in fact. In order to do this, we will store the 32 bit pointer in a data member with Java type int after casting it to a C long to avoid warnings (see Table 1 for a list of type conversions).
To access this member, we must use Java's “handles”. Handles are used to access data in a Java object. When you want to access a data member, you simply use unhand(ptr)->member rather than ptr->member (where ptr is the handle). We do this on line 42 of PGConnectionNative.c (Listing 2) to save the pointer returned by PQsetdb() in a Java int (note: if you forget the unhand() macro, you will get a warning about incompatible pointer types).
This function has covered almost all you need to know to call C functions from Java (calling Java methods from C is possible, but the interface is clumsy at best at this point, and where possible, I'd avoid it). Most of the rest of the methods (host, options, port, etc.) simply convert the data and make the C call. We'll just take a look at one of these, PGConnection.db().
The only significant portion of the C function PGConnection_db() is its first line (Listing 2, line 46). It needs a PGconn to pass to PQdb(), so it must get it out of the PGConnection member, PGconnRep. It uses unhand() to get the pointer as a long, then casts that to a (PGconn *). Since this line is so messy (and is starting to look like lisp!) I created a macro, thisPGconn, to clean up the code a little. It is used in the remainder of the file, and its definition is at the top of the file (don't put it in PGConnection.h, since that is machine-generated).
All of the native methods in the Java class PGResult follow the same basic structure, and there is no reason to go over them.
There are some places where Java and C just don't get along. The rest of this section will touch on the few I found, and how I avoided them.
The exec() method (see, I told you I'd get to it) needs to return a PGResult object. This is in keeping with libpq's structure, and the OO nature of Java. However, returning an object from a native method can get pretty hairy. The “official” way to do it is to call the function:
HObject *execute_java_constructor(ExecEnv *, char *classname, ClassClass *cb, char *signature, ...);
and return the HObject * it returns. Personally, I find this interface extremely clumsy, and have managed to avoid it. However, for completeness, the actual call in our case would be:
return execute_java_constructor(EE(), "classPGResult", 0, "(I)LclassPGResult;", (long)tmpResult);
I found it far easier to create a buffer between the call to exec() and the call to PQexec() that could call the constructor from Java. This is where the nativeExec() method comes from. exec() simply passes the string to nativeExec(), which returns an int (the PGresult pointer that PQexec() returned). Then it calls PGResult's constructor with that int.
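The layering just described can be sketched in plain Java. Everything here is an illustrative stand-in rather than the code from Listing 1: nativeExec() is an ordinary method that simulates the native call (returning 0 to represent a NULL pointer from PQexec()), the PGResult and PostgresException classes are minimal placeholders, and throwing an exception on a zero handle is our assumption about how the error might be reported.

```java
// Sketch of the exec()/nativeExec() layering; all bodies are stand-ins.
final class ExecLayerSketch {

    // Placeholder for the article's exception type.
    static final class PostgresException extends Exception {
        PostgresException(String message) { super(message); }
    }

    // Placeholder for PGResult: it only keeps the C pointer, stored as an int.
    static final class PGResult {
        final int handle;
        PGResult(int handle) { this.handle = handle; }
    }

    // Stand-in for: private synchronized native int nativeExec(String query);
    // Simulates PQexec() by returning 0 (a NULL pointer) for an empty query
    // and an arbitrary nonzero "pointer" otherwise.
    private synchronized int nativeExec(String query) {
        return query.trim().isEmpty() ? 0 : 42;
    }

    // The public method stays in Java, so constructing the result object
    // needs no call back from C into the Java runtime.
    PGResult exec(String query) throws PostgresException {
        int handle = nativeExec(query);
        if (handle == 0) {
            throw new PostgresException("query failed: no result returned");
        }
        return new PGResult(handle);
    }

    public static void main(String[] args) throws PostgresException {
        PGResult r = new ExecLayerSketch().exec("select 1");
        System.out.println("result handle: " + r.handle);
    }
}
```

Storing the pointer in an int works only where pointers are 32 bits wide, as they are on the systems discussed here; a long would be needed on a 64-bit platform.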
The extra layer will also come in handy when we add the asynchronous notification system.
PQgetline() expects the user to continually call it while it fills in a static buffer. This is simply not needed in Java. A much nicer interface is to just have getline() return a String. However, building the String (appending each return value from PQgetline()) required calling Java methods from C, which, as we saw with exec() in Hoop #1, is very messy. By using a StringBuffer (a String that can grow) and doing the work in the Java code, it's much easier to understand, if a little slower.
The flip side of this is that the return value is now the String, so there must be another way to tell if an error has occurred or an EOF has been reached. One solution (I'm looking for a better one), and the one we use, is to set a data member flag. If the flag has been set to EOF, we simply return a Java null String. So once again, an extra layer saves us from a lot of truly gross code!
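Here is a minimal sketch of that arrangement. The native call is simulated by an ordinary method that hands back pre-canned pieces, and the names nativeGetline(), lastStatus, and the three status constants are hypothetical. The status values mirror PQgetline()'s documented return codes (EOF at end of input, 0 when a whole line has been read, 1 when the buffer filled before the newline), but how Listing 1 actually stores them is an assumption.

```java
// Sketch: assembling a full line in Java from buffer-sized native pieces.
final class GetlineSketch {
    static final int LINE_DONE = 0;   // a complete line has been read
    static final int MORE = 1;        // buffer filled before the newline
    static final int EOF = -1;        // end of input

    private final String[] pieces;    // simulated buffer contents
    private final int[] statuses;     // simulated return codes, one per piece
    private int next = 0;
    private int lastStatus = LINE_DONE;   // the "data member flag"

    GetlineSketch(String[] pieces, int[] statuses) {
        this.pieces = pieces;
        this.statuses = statuses;
    }

    // Stand-in for the native method: returns one piece and sets the flag.
    private synchronized String nativeGetline() {
        if (next >= pieces.length) {
            lastStatus = EOF;
            return "";
        }
        lastStatus = statuses[next];
        return pieces[next++];
    }

    // Returns the next complete line, or null once EOF has been flagged.
    String getline() {
        StringBuffer line = new StringBuffer();
        do {
            String piece = nativeGetline();
            if (lastStatus == EOF) {
                return null;
            }
            line.append(piece);
        } while (lastStatus == MORE);
        return line.toString();
    }

    public static void main(String[] args) {
        GetlineSketch g = new GetlineSketch(
                new String[] {"hello ", "world", "bye"},
                new int[] {MORE, LINE_DONE, LINE_DONE});
        for (String s = g.getline(); s != null; s = g.getline()) {
            System.out.println(s);
        }
    }
}
```

Note that this sketch discards a partial line if EOF arrives in the middle of it; a real wrapper would have to decide whether that case is an error.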
This is one hoop I think the JavaSoft team should've solved for us. There is simply no way to get a FILE * (or a file descriptor) from a FileStream. PQtrace() expects a FILE *, so we simply open one, based on a filename passed in by the user. We check to see if it's “stdout” or “stderr”, and act accordingly.
We see the problem again when we try to implement Postgres95's printTuples (or displayTuples for 1.1). It also expects a FILE*, but this time the solution is a little messier. Here, we want the output in a String, so we open a temporary file, send it to the libpq function, rewind it, read it, and close it. This is pretty messy, but it does work, and is actually pretty quick about it. If we wanted to write a cleaner version, we could certainly rewrite displayTuples() completely in Java code, using PGResult's native methods fname() and getValue() that we have already defined.
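To give a feel for the cleaner pure-Java alternative mentioned above, here is a rough sketch. The ResultView interface and the ArrayResult class are hypothetical: they stand in for PGResult so that the example runs without a database. The text only names fname() and getValue(); the ntuples() and nfields() accessors, and the tab-separated layout, are assumptions made for illustration.

```java
// Sketch: formatting query results in Java instead of via a temporary file.
final class FormatTuplesSketch {

    // Hypothetical view of the accessors a PGResult-like class might offer.
    interface ResultView {
        int ntuples();
        int nfields();
        String fname(int field);
        String getValue(int tuple, int field);
    }

    // In-memory stand-in for a real query result.
    static final class ArrayResult implements ResultView {
        private final String[] names;
        private final String[][] rows;

        ArrayResult(String[] names, String[][] rows) {
            this.names = names;
            this.rows = rows;
        }
        public int ntuples() { return rows.length; }
        public int nfields() { return names.length; }
        public String fname(int field) { return names[field]; }
        public String getValue(int tuple, int field) { return rows[tuple][field]; }
    }

    // Builds a header line of field names, then one tab-separated line per tuple.
    static String format(ResultView r) {
        StringBuilder out = new StringBuilder();
        for (int f = 0; f < r.nfields(); f++) {
            if (f > 0) out.append('\t');
            out.append(r.fname(f));
        }
        out.append('\n');
        for (int t = 0; t < r.ntuples(); t++) {
            for (int f = 0; f < r.nfields(); f++) {
                if (f > 0) out.append('\t');
                out.append(r.getValue(t, f));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ResultView r = new ArrayResult(
                new String[] {"id", "name"},
                new String[][] {{"1", "ann"}, {"2", "bob"}});
        System.out.print(format(r));
    }
}
```

The trade-off is one native call per value instead of a single call into libpq, in exchange for avoiding the temporary file altogether.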
After writing all the C code, we are ready to generate our shared library.
First, we have to compile the .c files:
gcc -O2 -fPIC -I/usr/local/java/include/ \
    -I/usr/local/java/include/solaris \
    -c PGConnectionNative.c
gcc ... (repeat for each .c file)
Then we link them:
gcc -shared -Wl,-soname,libJgres.so.1 -o libJgres.so.1.0 *.o -lpq
The -lpq tells the dynamic loader to load libpq.so when Java loads this library.
And finally, put them somewhere the dynamic loader can find them: either in your LD_LIBRARY_PATH, or in a standard location (e.g. /usr/local/lib), after which you should rerun /sbin/ldconfig -v.
That's all there is to it. Now we can use PGConnection and PGResult just like any other Java classes.
To finish up this section, let's use our new classes to implement a simple SQL client. The client will connect to a database “foo” and accept query strings from standard input. PGConnection.exec() will process the queries, and print the results to the terminal using formatTuples(). The connection to the database is made on line 17 in Listing 3 (QueryTest.java).
We use the libpq convention of sending NULL (the empty Java string "" translates into a NULL char *) for any parameters we don't know. Notice that the call to PGConnection's constructor is surrounded by a “try” block. If an exception is thrown within this block, we have a problem with the connection and exit nicely (lines 54-58, Listing 3).
At line 24 of Listing 3, we test some of the simple functions to print out information about what we're connected to. We then read a query string and quit if it is “q” or “Q”.
We process the query on line 33 of Listing 3, by calling exec(). Note that we nest another “try” block here, because if we get a PostgresException on an exec(), we want to simply print the error and continue (we handle the exception on lines 43-46). If we reach line 34, we know that the PGResult is valid. We check to see if it returned any tuples, and use formatTuples() to print them if it did. If not, we simply print the current status and continue.
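The control flow of the client's main loop can be sketched without a database at all. This is not Listing 3: the Executor interface is a hypothetical stand-in for PGConnection.exec() plus formatTuples(), the PostgresException class is a placeholder, and run() takes a Reader and Writer (rather than using standard input and output directly) purely so the sketch is easy to exercise. What it does show is the inner “try” block, which lets a failed query print an error and continue instead of ending the session.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Sketch of the query loop: read a line, quit on "q"/"Q", otherwise execute.
final class QueryLoopSketch {

    // Placeholder for the article's exception type.
    static class PostgresException extends Exception {
        PostgresException(String message) { super(message); }
    }

    // Hypothetical stand-in for "exec() the query and format the result".
    interface Executor {
        String exec(String query) throws PostgresException;
    }

    // Returns the number of queries that executed successfully.
    static int run(Reader in, Writer out, Executor db) {
        int executed = 0;
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equalsIgnoreCase("q")) {
                    break;
                }
                try {
                    out.write(db.exec(line) + "\n");
                    executed++;
                } catch (PostgresException e) {
                    // A bad query is reported, and the loop carries on.
                    out.write("error: " + e.getMessage() + "\n");
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return executed;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        // Simulated session: one good query, one bad one, then quit.
        run(new StringReader("select 1\nbogus\nq\n"), out, query -> {
            if (query.startsWith("select")) {
                return "ok: " + query;
            }
            throw new PostgresException("parse error near \"" + query + "\"");
        });
        System.out.print(out);
    }
}
```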