Configuring XMLReader Behavior (SAX2)

3.3.1. XMLReader Properties

SAX2 defines two XMLReader calls for accessing named property objects. One of the most common uses for such objects is to install non-core event handlers. Accessing properties is like accessing feature flags, except that the values associated with these names are objects rather than Booleans:

XMLReader       producer ...;
String          uri = ...;
Object          value = ...;

// Try getting and setting the property
try {
    System.out.println ("Initial property setting: "
	+ producer.getProperty (uri);
    // if we get here, the property is supported

    producer.setProperty (uri, value);
    // if we get here, the parser set the property 

} catch (SAXNotSupportedException e) {
    // bad value for property ... maybe wrong type, or parser state
    System.out.println ("Can't set property: "
	+ e.getMessage ());
    System.exit (1);

} catch (SAXNotRecognizedException e) {
    // property not supported by this parser
    System.out.println ("Doesn't understand property: "
	+ e.getMessage ());
    System.exit (1);
}

You'll notice the URIs for these standard properties happen to have a common prefix. This means that you can declare the prefix (http://xml.org/sax/properties/) as a constant string and construct the identifiers by string catenation.

Here are the standard properties:

http://xml.org/sax/properties/declaration-handler

Applications may find it useful to define their own types of handler interfaces, assembling sequences of SAX event "atoms" into higher-level event "molecules" that incorporate essential application-level semantics (and probably some procedural validation). This is the same kind of process model used by W3C's XML schema processing model: the Post-Schema-Validation Infoset (PSVI) additions incorporate semantics suited to processing with that kind of schema. Most applications need to associate even more semantics with data than are easily captured by such simple rules (including DTDs and all types of schema). Those semantics would likely not be understood by any common XMLReader, but other kinds of SAX processing components can help manage such application-level handlers. You can see an example of this technique in Example 6-3.

3.3.2. XMLReader Feature Flags

The previous chapter showed how to access feature flags from SAX parsers and used the standard validation flag as the primary example. Accessing feature flags follows the same model as accessing properties, except the values are boolean not Object. There are a handful of standard SAX2 feature flags, which are all you normally need. The namespace for features is different from the namespace for properties. You can't set a property to a java.lang.Boolean value and expect to have the same effect as setting the feature flag that happens to use the same identifier.

As with properties, the URIs for these standard feature flags happen to have a common prefix: http://xml.org/sax/features/. It's good programming practice to declare the prefix as a constant and construct these feature identifiers by string catenation, helping reduce errors. Also, remember that flags aren't necessarily either settable (read/write)[17] or readable (supported); some parsers won't recognize all these flags, and in some cases these flags expose parser behaviors that don't change.

[17]SAX could support write-only flags too, but these are rarely a good idea.

The standard flags are as follows:

http://xml.org/sax/features/external-general-entities

A few additional standard extension features will likely be defined, providing even more complete Infoset support from SAX2 XML parsers.Ælfred also includes a nonvalidating parser, which supports only false for this flag.

Of the widely available parsers, only Xerces has nonstandard feature flags. (The Xerces distribution includes full documentation for those flags.) As a rule, avoid most of these, because they are parser-specific and even version-specific. Some are used to disable warnings about extra definitions that aren't errors. (Most parsers don't bother reporting such nonerrors; Xerces reports them by default.) Others promote noncompliant XML validation semantics. Here are a few flags that you may want to use.

http://apache.org/xml/features/validation/schema

This tells the parser to validate with W3C-style schemas. The document needs to identify a schema, and the parser must have namespaces and validation enabled. (Defaults to false.)

W3C XML schema validation does not need to be built into XML parsers. In fact, most currently available schema validators are layered.

http://apache.org/xml/features/validation/schema-full-checking

This flag controls whether W3C schema validation involves all the specified tests. By default, some of the more expensive checks are not performed; Xerces is not "fully conforming" by default.

http://apache.org/xml/features/allow-java-encodings

This flag defaults to false, limiting the encodings that the parser accepts to a handful. When the flag is set to true, more encoding names are supported. Most other SAX2 parsers effectively have true as their default. A few of those additional encoding names are Java-specific (such as "UTF8"); most of them are standard encoding names, either the preferred version or recognized alternatives.

http://apache.org/xml/features/continue-after-fatal-error

When set, this flag permits Xerces to continue parsing after it invokes ErrorHandler.fatalError() to report a nonrecoverable error. If the error handler doesn't abort parsing by throwing an exception, Xerces will continue. The XML specification requires that no more event data be reported after fatal errors, but it allows additional errors to be reported. (Of course, depending on the initial error, many of the subsequent reports might be nonsense.)


3.2. Bootstrapping an XMLReader		3.4. The EntityResolver Interface

3.3. Configuring XMLReader Behavior

3.3.1. XMLReader Properties

3.3.2. XMLReader Feature Flags