
9.2. JAXP 1.0

It all begins (and began) with JAXP 1.0. This first version of Sun's API provided, basically, a thin layer over existing APIs that allowed for vendor-neutral parsing of XML. For SAX, this isn't a huge deal; now that you are a SAX expert, you are smart enough to use the XMLReaderFactory class instead of directly instantiating a vendor's parser class. Of course, as you're also a DOM expert, you know that it's a pain to deal with DOM in a vendor-neutral way, so JAXP helps out quite a bit in this regard. Additionally, JAXP provided some methods for working with validation and namespaces, another vendor-specific task that can now be handled (in most cases) in a much better way.

9.2.1. Starting with SAX

Before getting into how JAXP works with SAX, I will fill you in on some SAX 1.0 details. Remember the org.xml.sax.helpers.DefaultHandler class I showed you in Chapter 4, "Advanced SAX", that implemented all the core SAX 2.0 handlers? There was a similar class in SAX 1.0 called org.xml.sax.HandlerBase; this class implemented the SAX 1.0 handlers (which were slightly different in that version). As long as you understand this, you'll be all set to deal with JAXP 1.0.

To use JAXP with a SAX-compliant parser, your only task is to extend the HandlerBase class and implement the callbacks desired for your application. That's it, no different than doing the same for DefaultHandler in SAX 2.0. An instance of your extension class then becomes the core argument for most of the JAXP methods that deal with SAX.
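To make that concrete, here is a minimal sketch of a HandlerBase extension. The ElementCounter class and its countElements( ) helper are my own illustrative names, not part of JAXP or SAX; the sketch assumes a JAXP implementation is on the classpath and simply counts the start tags that the parser reports.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

// Hypothetical handler for illustration: counts every start tag reported.
@SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1.0 types
class ElementCounter extends HandlerBase {

    private int count;

    // SAX 1.0 callback: a raw element name plus an AttributeList
    @Override
    public void startElement(String name, AttributeList attributes) {
        count++;
    }

    int getCount() {
        return count;
    }

    // Convenience wrapper (an assumption of this sketch): parse XML held in a String
    static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
            return handler.getCount();
        } catch (Exception e) {
            // Wrap checked parsing problems so callers need no throws clause
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

Only the callback you care about needs to be overridden; HandlerBase supplies empty implementations for the rest.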

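Here's the typical SAX rundown:

1. Create a SAXParser instance using a specific vendor's parser implementation.
2. Register callback implementations (by using a class that extends HandlerBase).
3. Start parsing and relax as your callback implementations are fired off.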

The SAX component of JAXP provides a simple means to do all of this. Without JAXP, a SAX parser instance either must be instantiated directly from a vendor class (such as org.apache.xerces.parsers.SAXParser), or it must use a SAX helper class called ParserFactory (the SAX 1.0 version of SAX 2.0's XMLReaderFactory).

JAXP provides a better alternative. It allows you to use the vendor class as a parser through a Java system property. Of course, when you download a distribution from Sun, you get a JAXP implementation that uses Sun's parser by default. The same JAXP interfaces, but with an implementation built on Apache Xerces, can be downloaded from the Apache XML web site at http://xml.apache.org, and they use Apache Xerces by default. Therefore (in either case), changing the parser you are using requires that you change a classpath setting or system property value, but it does not require code recompilation. And this is the magic, the abstraction, that JAXP is all about.

WARNING: Where you download the JAXP classes from is important. Even though you can still set system properties to change the parser class, the default parser (when no system properties are present) depends on the implementation -- which depends on the location that JAXP comes from. The version from Apache XML uses Xerces by default, while Sun's version uses Crimson by default. If you get these mixed up, you may end up with the wrong parser in your classpath and get a ClassNotFoundException.
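If you are ever unsure which implementation your classpath is handing you, a quick check like the following can help. This is only a sketch; the WhichParser class name is hypothetical, and the class names it prints depend entirely on the JAXP implementation you have installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class: reports the concrete factory classes in use.
class WhichParser {

    // Name of the SAXParserFactory implementation JAXP located
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Name of the DocumentBuilderFactory implementation JAXP located
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX factory: " + saxFactoryClass());
        System.out.println("DOM factory: " + domFactoryClass());
    }
}
```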

9.2.1.1. A look at the SAXParserFactory class

The JAXP SAXParserFactory class (in the javax.xml.parsers package, like all the JAXP classes) is the key to changing parser implementations easily. You must create a new instance of this class (which I will describe how to do in a moment). After the factory is created, it provides a method to obtain a SAX-capable parser. Behind the scenes, the JAXP implementation takes care of the vendor-dependent code, keeping your code unpolluted. This factory provides some other nice features, as well.

In addition to the basic job of creating instances of SAX parsers, the factory allows configuration options to be set. These options affect all parser instances obtained through the factory. The two options available in JAXP 1.0 are setting namespace awareness (setNamespaceAware (boolean awareness)), and turning on validation (setValidating (boolean validating)). Remember that after these options are set, they affect all instances obtained from the factory after the method invocation.
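The timing matters here, so a small sketch may help. It obtains one parser before validation is switched on and one after, then asks each parser how it was configured. FactoryOptionsDemo and validatingFlags( ) are hypothetical names used only for this illustration.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: shows that factory options apply only to
// parsers obtained after the option is set.
class FactoryOptionsDemo {

    // Returns { first.isValidating(), second.isValidating() }
    static boolean[] validatingFlags() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();

            // Obtained while validation is off
            factory.setValidating(false);
            SAXParser first = factory.newSAXParser();

            // Obtained after validation is turned on
            factory.setValidating(true);
            SAXParser second = factory.newSAXParser();

            return new boolean[] { first.isValidating(), second.isValidating() };
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not create parsers", e);
        }
    }
}
```

With a conforming implementation, I would expect the first parser to report that it is not validating and the second to report that it is; the first instance is not changed retroactively.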

Once you have set up the factory, invoking the newSAXParser( ) method returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an implementation of the SAX interface org.xml.sax.Parser). It also protects you from using any vendor-specific additions to the parser class. (Remember our earlier discussion about the xmlDocument class?) This class allows actual parsing behavior to be kicked off. Example 9-1 shows how a SAX factory can be created, configured, and used.

Example 9-1. Using the SAXParserFactory class

package javaxml2;

import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

// SAX
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

public class TestSAXParsing {

    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println(
                    "Usage: java TestSAXParsing [XML Document filename]");
                System.exit(1);
            }

            // Get SAX Parser Factory
            SAXParserFactory factory = SAXParserFactory.newInstance( );

            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);

            SAXParser parser = factory.newSAXParser( );
            parser.parse(new File(args[0]), new MyHandler( ));

        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not " +
                               "support the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println(
                "Error occurred obtaining SAX Parser Factory.");
        } catch (Exception e) {
            e.printStackTrace( );
        }
    }
}

class MyHandler extends HandlerBase {
    // SAX callback implementations from DocumentHandler, ErrorHandler, 
    //   DTDHandler, and EntityResolver
}

Notice in this code that two JAXP-specific problems can occur in using the factory: the inability to obtain or configure a SAX factory, and the inability to configure a SAX parser. The first of these problems, which is represented by a FactoryConfigurationError, usually occurs when the parser specified in a JAXP implementation or system property cannot be loaded. The second problem, ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to deal with and shouldn't pose any difficulty.
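To see the first of these problems on demand, you can point JAXP at a factory class that does not exist. The sketch below does exactly that; BadFactoryDemo is a hypothetical class, and com.example.NoSuchFactory is a deliberately bogus name chosen for illustration. The previous property value is restored afterward so that the rest of your program is unaffected.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class: provokes a FactoryConfigurationError on purpose.
class BadFactoryDemo {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Returns true if naming a nonexistent factory class causes
    // SAXParserFactory.newInstance() to fail with FactoryConfigurationError.
    static boolean failsWithBogusFactory() {
        String previous = System.getProperty(PROPERTY);
        // Bogus class name, assumed not to be on the classpath
        System.setProperty(PROPERTY, "com.example.NoSuchFactory");
        try {
            SAXParserFactory.newInstance();
            return false;
        } catch (FactoryConfigurationError e) {
            return true;
        } finally {
            // Put the system property back the way we found it
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }
}
```

Note that FactoryConfigurationError is an Error, not an Exception, so a catch (Exception e) block alone will not intercept it.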

A SAXParser is obtained once you get the factory, turn off namespaces, and turn on validation; then parsing begins. Notice that the parse( ) method of the SAX parser takes an instance of the SAX HandlerBase class that I mentioned earlier (I left the implementation of this class out of the code listing, but you can download the complete source file for TestSAXParsing.java at the book's web site). You also pass in the file (as a Java File) to parse, obviously. However, the SAXParser class contains much more than just this single method.

9.2.1.2. Working with the SAXParser class

Once you have an instance of the SAXParser class, you can do more with it than just passing it a File to parse. Because of the way components in large applications communicate these days, it is not always safe to assume that the creator of an object instance is its user. In other words, one component may create the SAXParser instance, while another component (perhaps coded by another developer) may need to use that same instance. For this reason, methods are provided to determine the settings of a parser instance. The two methods that provide this functionality are isValidating( ), which informs the caller if the parser will perform validation, and isNamespaceAware( ), which returns an indication if the parser can process namespaces in an XML document. While these methods can give you information about what the parser can do, you do not have the means to change these features. You must do this at the parser factory level.

Additionally, there is a variety of ways to request parsing of a document. Instead of just accepting a File and a SAX HandlerBase instance, the SAXParser's parse( ) method can also accept a SAX InputSource, a Java InputStream, or a URL in String form, all with a HandlerBase instance as the second argument. This lets you pick whichever form best matches where your document actually comes from.

Finally, the underlying SAX parser (an instance of org.xml.sax.Parser) can be obtained and used directly through the SAXParser's getParser( ) method. Once this underlying instance is obtained, the usual SAX methods are available. Example 9-2 shows examples of the various uses of the SAXParser class, the core class in JAXP for SAX parsing.

Example 9-2. Using the JAXP SAXParser class

    // Get a SAX Parser instance
    SAXParser saxParser = saxFactory.newSAXParser( );

    // Find out if the parser is configured to validate
    boolean isValidating = saxParser.isValidating( );

    // Find out if the parser is configured to be namespace-aware
    boolean isNamespaceAware = saxParser.isNamespaceAware( );

    // ------- Parse, in a variety of ways ----------------- //

    // Use a file and a SAX HandlerBase instance
    saxParser.parse(new File(args[0]), myHandlerBaseInstance);

    // Use a SAX InputSource and a SAX HandlerBase instance
    saxParser.parse(mySaxInputSource, myHandlerBaseInstance);

    // Use an InputStream and a SAX HandlerBase instance
    saxParser.parse(myInputStream, myHandlerBaseInstance);

    // Use a URI and a SAX HandlerBase instance
    saxParser.parse("http://www.newInstance.com/xml/doc.xml", 
                    myHandlerBaseInstance);

    // Get the underlying (wrapped) SAX parser
    org.xml.sax.Parser parser = saxParser.getParser( );    

    // Use the underlying parser (a SAX 1.0 Parser registers a
    //   DocumentHandler, not a SAX 2.0 ContentHandler)
    parser.setDocumentHandler(myDocumentHandlerInstance);
    parser.setErrorHandler(myErrorHandlerInstance);
    parser.parse(new org.xml.sax.InputSource(args[0]));

Up to now, I've talked a lot about SAX, but I haven't unveiled anything remarkable or even that surprising. The fact is, the functionality of JAXP is fairly minor, particularly when SAX is involved. This is fine with me (and should be with you), because minimal functionality means your code is more portable and can be used by other developers, either freely (through open source) or commercially, with any SAX-compliant XML parser. That's it. There's nothing more to using SAX with JAXP. If you already know SAX, you're 98 percent of the way there. You just need to learn two new classes and a couple of Java exceptions, and you're ready to roll. If you've never used SAX, it's easy enough to start now.

9.2.2. Dealing with DOM

The process of using JAXP with DOM is nearly identical to using JAXP with SAX; all you do is change two class names and a method's return type, and you are pretty much there. If you understand how SAX works and understand what DOM is, you won't have any problem. Of course, you've got Chapter 5, "DOM" and Chapter 6, "Advanced DOM" to refer back to, so you're all set. Since JAXP doesn't have to fire SAX callbacks when working with DOM, it is responsible only for returning a DOM Document object from a parse.

9.2.2.1. A look at the DOM parser factory

With a basic understanding of DOM and the differences between DOM and SAX, there is little else to say. The code in Example 9-3 will look remarkably similar to the SAX code in Example 9-1. First, an instance of DocumentBuilderFactory is obtained (in the same way that a SAXParserFactory instance was in SAX). Then the factory is configured to handle validation and namespaces (in the same way that it was in SAX). Next, a DocumentBuilder, the DOM analog to SAXParser, is retrieved from the factory. Parsing can then occur, and the resulting DOM Document object is handed off to an instance of the DOMSerializer class (from Chapter 5, "DOM").

Example 9-3. Using the DocumentBuilderFactory class

package javaxml2;

import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestDOMParsing {

    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println (
                    "Usage: java TestDOMParsing [filename]");
                System.exit(1);
            }

            // Get Document Builder Factory
            DocumentBuilderFactory factory = 
                DocumentBuilderFactory.newInstance( );

            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);

            DocumentBuilder builder = factory.newDocumentBuilder( );
            Document doc = builder.parse(new File(args[0]));

            // Serialize the DOM tree
            DOMSerializer serializer = new DOMSerializer( );
            serializer.serialize(doc, System.out);

        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not " +
                "support the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining Document " +
                "Builder Factory.");
        } catch (Exception e) {
            e.printStackTrace( );
        }
    }
}

Two problems can arise from this code: a FactoryConfigurationError and a ParserConfigurationException. The cause of each is the same as it was in SAX. Either there's a problem present in the implementation classes (FactoryConfigurationError), or the parser provided doesn't support the requested features (ParserConfigurationException). The only difference between DOM and SAX is that with DOM, you substitute DocumentBuilderFactory for SAXParserFactory, and DocumentBuilder for SAXParser.

9.2.2.2. Working with the DOM parser

Once you have a DOM factory, you can obtain a DocumentBuilder instance from it. The methods available to a DocumentBuilder instance are very similar to those available to its SAX counterpart. The major difference is that variations of the parse( ) method do not take an instance of the SAX HandlerBase class. Instead they return a DOM Document instance representing the XML document that was parsed. The only other difference is that two methods are provided for SAX-like functionality: setErrorHandler( ), which takes a SAX ErrorHandler implementation to handle problems that may arise in parsing, and setEntityResolver( ), which takes a SAX EntityResolver implementation to handle entity resolution. Example 9-4 shows examples of these methods in action.

Example 9-4. Using the JAXP DocumentBuilder

    // Get a DocumentBuilder instance
    DocumentBuilder builder = builderFactory.newDocumentBuilder( );

    // Find out if the builder is configured to validate
    boolean isValidating = builder.isValidating( );

    // Find out if the builder is configured to be namespace-aware
    boolean isNamespaceAware = builder.isNamespaceAware( );

    // Set a SAX ErrorHandler
    builder.setErrorHandler(myErrorHandlerImpl);

    // Set a SAX EntityResolver
    builder.setEntityResolver(myEntityResolverImpl);

    // ------------ Parse, in a variety of ways ------------------- //

    // Use a file
    Document doc = builder.parse(new File(args[0]));

    // Use a SAX InputSource
    doc = builder.parse(mySaxInputSource);

    // Use an InputStream (no HandlerBase argument in the DOM case)
    doc = builder.parse(myInputStream);

    // Use a URI 
    doc = builder.parse("http://www.newInstance.com/xml/doc.xml");
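For a version you can compile and run as is, here is a minimal sketch that parses XML from a String and returns the name of the root element. DomParseDemo, StrictErrorHandler, and rootElementName( ) are hypothetical names of my own; the ErrorHandler shown simply rethrows errors, which is one reasonable policy among several.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: DOM parsing through JAXP with a SAX ErrorHandler.
class DomParseDemo {

    // Reports warnings, and stops the parse on errors and fatal errors
    static class StrictErrorHandler implements ErrorHandler {
        @Override
        public void warning(SAXParseException e) {
            System.err.println("Warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    // Parses the supplied XML text and returns the root element's name
    static String rootElementName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new StrictErrorHandler());

            // parse( ) returns the Document directly; no handler argument
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```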

It really is that straightforward to take what you've learned about SAX and apply it to DOM. So make your bar bets with friends and coworkers on how using JAXP is a piece of cake; you'll win every time.

9.2.3. Changing the Parser

The last topic I need to address in dealing with JAXP is the ability to easily change out the parser used by the factory classes. Changing the parser used by JAXP actually means changing the parser factory, because all SAXParser and DocumentBuilder instances come from these factories. Since the factories determine which parser is loaded, it's the factories that must change. The implementation of SAXParserFactory to be used can be changed by setting the Java system property javax.xml.parsers.SAXParserFactory. If this property isn't defined, then the default implementation (whatever parser your vendor specified) is returned. The same principle applies for the DocumentBuilderFactory implementation you use. In this case, the javax.xml.parsers.DocumentBuilderFactory system property is queried. And as simple as that, we have gone through it all! This is the whole scope of JAXP 1.0: provide hooks into SAX, provide hooks into DOM, and allow the parser to easily be changed out.
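Here is a small sketch of the mechanism. Because I can't know which parsers are on your classpath, it sets the system property to whatever factory class is already the default and confirms that JAXP honors the setting; in practice you would supply the class name of the implementation you want. FactorySwitch and its methods are hypothetical names used for illustration.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class: selects a SAXParserFactory via a system property.
class FactorySwitch {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Class name of the factory JAXP currently hands out
    static String defaultFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Sets the property, asks JAXP for a factory, and reports its class name.
    // The previous property value is restored before returning.
    static String factoryClassFor(String implementationClass) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, implementationClass);
        try {
            return SAXParserFactory.newInstance().getClass().getName();
        } finally {
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }
}
```

The same property can be supplied on the command line without touching code, for example java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl javaxml2.TestSAXParsing doc.xml, assuming Xerces is on your classpath (doc.xml is a placeholder filename).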




Copyright © 2002 O'Reilly & Associates. All rights reserved.