XML in a Nutshell

20.4. Constraints

In addition to defining the basic structures used in documents and DTDs, XML 1.0 defines a list of rules regarding their usage. These constraints put limits on various aspects of XML usage, and documents cannot in fact be considered to be "XML" unless they meet all of the well-formedness constraints. Parsers are required to report violations of these constraints, though only well-formedness constraint violations require that processing of the document halt completely. Namespace constraints are defined in Namespaces in XML, not XML 1.0.

20.4.1. Well-Formedness Constraints

Well-formedness refers to an XML document's physical organization. Certain lexical rules must be obeyed before an XML parser can consider a document well-formed. These rules should not be confused with validity constraints, which determine whether a particular document is valid when parsed using the document structure rules contained in its DTD. The Backus-Naur Form (BNF) grammar rules must also be satisfied. The following sections contain all well-formedness constraints recognized by XML Version 1.0 parsers, including actual text from the 1.0 specification.

PEs in Internal Subset

Text from specification

In the internal DTD subset, parameter entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

Explanation

It is only legal to use parameter entity references to build markup declarations within the external DTD subset. In other words, within the internal subset, parameter entities may only be used to include complete markup declarations.
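As a rough illustration, the following internal subset uses hypothetical names (doc, para, para.decl, common.atts). The first reference is legal because it stands where a complete declaration could appear. The commented-out ATTLIST shows the kind of reference inside a declaration that this constraint forbids in the internal subset.

```xml
<!DOCTYPE doc [
  <!ELEMENT doc (para*)>
  <!ENTITY % common.atts "id ID #IMPLIED">
  <!ENTITY % para.decl "<!ELEMENT para (#PCDATA)>">

  <!-- Legal: the reference stands in for a complete declaration -->
  %para.decl;

  <!-- Not well-formed if uncommented, because the reference
       occurs inside a markup declaration:
       <!ATTLIST para %common.atts;>
  -->
]>
<doc><para>text</para></doc>
```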

External Subset

Text from specification

The external subset, if any, must match the production for extSubset.

Explanation

The extSubset production constrains what type of declaration may be contained in the external subset. This constraint generally means that the external subset of the DTD must only include whole declarations or parameter entity references. See the extSubset production in the EBNF grammar at the end of this chapter for specific limitations.

PE Between Declarations

Text from specification

The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl.

Explanation

The replacement text of parameter entities may contain declarations that might not be allowed if the replacement text appeared directly. Parameter entity references in the internal subset cannot appear within declarations, but this rule does not apply to declarations that have been included via parameter entities.

Element Type Match

Text from specification

The Name in an element's end-tag must match the element type in the start-tag.

Explanation

Proper element nesting is strictly enforced, and every start-tag must be matched by a corresponding end-tag. An element written with the empty-element tag syntax (for example, <br/>) neither requires nor permits a separate end-tag.
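The two fragments below sketch the difference, using hypothetical element names. Each is meant to be read on its own, not as a single document.

```xml
<!-- Fragment 1, well-formed: each end-tag names the element opened most recently -->
<p><em>nested</em> text<br/></p>

<!-- Fragment 2, not well-formed: </p> arrives while em is still open -->
<p><em>overlapping</p></em>
```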

Unique Att Spec

Text from specification

No attribute name may appear more than once in the same start-tag or empty-element tag.

Explanation

Attribute names must be unique within a given element.
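A short sketch with a hypothetical class attribute; the two fragments are independent of each other.

```xml
<!-- Fragment 1, not well-formed: the attribute name class appears twice -->
<p class="note" class="warning">text</p>

<!-- Fragment 2, well-formed: one attribute carrying both tokens -->
<p class="note warning">text</p>
```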

No External Entity References

Text from specification

Attribute values cannot contain direct or indirect entity references to external entities.

Explanation

XML parsers report an error when asked to replace references to external parsed entities within attribute values.

No < in Attribute Values

Text from specification

The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.

Explanation

This restriction is meant to simplify the task of parsing XML data. Since attribute values can't even appear to contain element data, simple parsers need not track literal strings. Just by recognizing < and > characters, simple parsers can check for proper markup formation and nesting.
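The following independent fragments sketch the rule. The element name test, the attribute name expr, and the entity name tag are all hypothetical.

```xml
<!-- Fragment 1, not well-formed: a literal < in the attribute value -->
<test expr="a < b"/>

<!-- Fragment 2, well-formed: the predefined lt entity is the one exception -->
<test expr="a &lt; b"/>

<!-- Fragment 3, not well-formed: the replacement text of the
     entity tag contains a <, and it is referenced in an attribute value -->
<!DOCTYPE test [
  <!ENTITY tag "<b>bold</b>">
]>
<test expr="&tag;"/>
```

In the third fragment, the entity declaration itself is acceptable; the violation occurs only when the entity is referenced inside the attribute value.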

Legal Character

Text from specification

Characters referred to using character references must match the production for Char.

Explanation

Any characters that the XML parser generates must be real characters. A few character values in Unicode are not valid standalone characters.
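For example, assuming an XML 1.0 parser and hypothetical element names, the first fragment below uses only character references that match Char, while the second does not.

```xml
<!-- Fragment 1, well-formed: tab, copyright sign, and U+10000 all match Char -->
<ok>&#x9;&#xA9;&#x10000;</ok>

<!-- Fragment 2, not well-formed: U+0000, U+0008, and U+FFFE do not match Char -->
<bad>&#x0;&#x8;&#xFFFE;</bad>
```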

Entity Declared

Text from specification

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with standalone='yes', for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration. Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

Explanation

This long constraint lists the only situations in which an entity reference may appear without a corresponding entity declaration. Since a nonvalidating parser is not obliged to read and parse the external subset, the parser must give the document the benefit of the doubt, if an entity could possibly have been declared.
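The three independent fragments below sketch how the cases differ. The entity name copyright and the system identifier doc.dtd are hypothetical.

```xml
<!-- Fragment 1, not well-formed: there is no DTD, so the entity
     cannot have been declared anywhere -->
<doc>&copyright;</doc>

<!-- Fragment 2, well-formed: the entity is declared in the internal
     subset, and amp and lt never need a declaration -->
<!DOCTYPE doc [
  <!ENTITY copyright "Copyright 2002">
]>
<doc>&copyright; &amp; &lt;</doc>

<!-- Fragment 3, not a well-formedness error: the entity might be
     declared in the external subset, which a nonvalidating parser
     is not obliged to read -->
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>&copyright;</doc>
```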

Parsed Entity

Text from specification

An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

Explanation

Since unparsed entities can't be parsed, don't try to force the parser to parse them.
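A minimal sketch, assuming a hypothetical unparsed entity named logo and a hypothetical notation named gif. The entity is named in an attribute of type ENTITY; the commented-out line shows the kind of reference this constraint forbids.

```xml
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT doc (image)>
  <!ELEMENT image EMPTY>
  <!ATTLIST image src ENTITY #REQUIRED>
]>
<doc>
  <!-- Legal: the unparsed entity is named in an ENTITY attribute -->
  <image src="logo"/>
  <!-- Not well-formed if used in content or in an ordinary
       attribute value: &logo; -->
</doc>
```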

No Recursion

Text from specification

A parsed entity must not contain a recursive reference to itself, either directly or indirectly.

Explanation

Be careful how you structure your entities; make sure you don't inadvertently create a circular reference:

<!ENTITY a "&b;">
<!ENTITY b "&c;">
<!ENTITY c "&a;"> <!--wrong!-->

In DTD

Text from specification

Parameter entity references may only appear in the DTD.

Explanation

This constraint is self-evident because the % character has no significance outside of the DTD. Therefore, it is perfectly legal to have an element like this in your document:

<ok>%noproblem;</ok>

The text %noproblem; is passed on by the parser without generating an error.

20.4.2. Validity Constraints

The following sections contain all validity constraints that are enforced by a validating parser. Each includes actual text from the XML 1.0 specification and a short explanation of what the constraint actually means.

Root Element Type

Text from specification

The Name in the document type declaration must match the element type of the root element.

Explanation

The name provided in the !DOCTYPE declaration identifies the root element's name and must match the name of the root element in the document.
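For instance, the following document, built from hypothetical element names, is well-formed but invalid because the document type declaration names book while the root element is chapter.

```xml
<!DOCTYPE book [
  <!ELEMENT book (chapter+)>
  <!ELEMENT chapter (#PCDATA)>
]>
<chapter>Well-formed, but invalid: the root element is not book</chapter>
```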

Proper Declaration/PE Nesting

Text from specification

Parameter entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration is contained in the replacement text for a parameter entity reference, both must be contained in the same replacement text.

Explanation

This constraint means you can't create a parameter entity that completes one DTD declaration and begins another; the following XML fragment would violate this constraint:

<!ENTITY % finish_it ">">
<!ENTITY % bad "won't work" %finish_it; <!--wrong!-->

Standalone Document Declaration

Text from specification

The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

element types with element content, if whitespace occurs directly within any instance of those types.

Explanation

This laundry list of potential standalone flag violations can be read to mean, "If you have an external subset in your DTD, ensure that your document doesn't depend on anything in it if you say standalone='yes' in your XML declaration." A more succinct interpretation would be, "If your document has an external DTD subset, just set standalone to no."
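As a sketch of the first case in the list, assume a hypothetical external subset memo.dtd that supplies a default value for a status attribute. The document relies on that default, so its standalone declaration must be "no".

```xml
<?xml version="1.0" standalone="no"?>
<!-- memo.dtd is a hypothetical external subset assumed to contain:
       <!ELEMENT memo (#PCDATA)>
       <!ATTLIST memo status CDATA "draft">
     Because this memo relies on the defaulted status attribute,
     standalone="yes" would violate the constraint. -->
<!DOCTYPE memo SYSTEM "memo.dtd">
<memo>Budget review</memo>
```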

Element Valid

Text from specification

An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

The declaration matches EMPTY and the element has no content.

The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional whitespace (characters matching the nonterminal S) between the start-tag and the first child element, between child elements, or between the last child element and the end-tag. Note that a CDATA section containing only whitespace does not match the nonterminal S, and hence cannot appear in these positions.

The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

The declaration matches ANY, and the types of any child elements have been declared.

Explanation

If a document includes a DTD with element declarations, make sure the actual elements in the document match the rules set down in the DTD.
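A minimal sketch with hypothetical element names: the document below is valid against its content model, and the comment notes a variation that would not be.

```xml
<!DOCTYPE person [
  <!ELEMENT person (name, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<!-- Valid: one name followed by zero or more email elements.
     Placing email before name, or adding an undeclared child
     element, would make the document invalid. -->
<person>
  <name>Pat Example</name>
  <email>pat@example.org</email>
</person>
```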

Attribute Value Type

Text from specification

The attribute must have been declared; the value must be of the type declared for it.

Explanation

All attributes used on elements in valid XML documents must have been declared in the DTD, including the xml:space and xml:lang attributes. If you declare an attribute for an element, make sure that every instance of that attribute has a value conforming to the type specified. (For attribute types, see Attribute List Declaration.)
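The sketch below uses a hypothetical note element. It declares xml:lang explicitly so that the attribute can appear in a valid document, and the comment names two variations that would be invalid.

```xml
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note
    xml:lang CDATA   #IMPLIED
    length   NMTOKEN #IMPLIED>
]>
<!-- Valid: both attributes are declared and both values fit their types.
     Adding priority="high" (undeclared), or writing length="two pages"
     (not a single name token), would make the document invalid. -->
<note xml:lang="en" length="short">Remember the meeting.</note>
```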

Unique Element Type Declaration

Text from specification

No element type may be declared more than once.

Explanation

Unlike entity and attribute declarations, only one declaration may exist for a particular element type.

Proper Group/PE Nesting

Text from specification

Parameter entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

For interoperability, if a parameter entity reference appears in a choice, seq, or Mixed construct, its replacement text should contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

Explanation

This constraint restricts the way parameter entities can be used to construct element declarations. It is similar to the Proper Declaration/PE Nesting constraint in that parameter entities may not be used to complete or open new parenthesized expressions. It prevents the XML author from hiding significant syntax elements inside parameter entities.
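The fragment below sketches both cases as the contents of a hypothetical external DTD subset, since parameter entity references inside declarations are not permitted in the internal subset. All entity and element names are illustrative.

```xml
<!-- Valid: the replacement text contains no parentheses, so the
     group that uses it opens and closes in the declaration itself -->
<!ENTITY % inline "em | strong">
<!ELEMENT para (#PCDATA | %inline;)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>

<!-- Invalid: the opening parenthesis comes from the entity,
     but the closing parenthesis does not -->
<!ENTITY % open "(em | strong">
<!ELEMENT emphasis %open;)>
```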

No Duplicate Types

Text from specification

The same name must not appear more than once in a single mixed-content declaration.

Explanation

Don't list the same element type name more than once in the same mixed-content declaration.

ID

Text from specification

Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

Explanation

No two attributes declared as type ID can have the same value. This constraint is not scoped by element type; it applies globally across the entire document.

One ID per Element Type

Text from specification

No element type may have more than one ID attribute specified.

Explanation

Each element type can have at most one attribute declared as type ID.

ID Attribute Default

Text from specification

An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

Explanation

To avoid potential duplication, you can't declare an ID attribute to be #FIXED or provide a default value for it.

IDREF

Text from specification

Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e., IDREF values must match the value of some ID attribute.

Explanation

ID references must refer to actual ID attributes that exist within the document.
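The ID and IDREF constraints above can be sketched with a hypothetical staff document. The comment lists variations that would each break one of them.

```xml
<!DOCTYPE staff [
  <!ELEMENT staff (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee
    id      ID    #REQUIRED
    manager IDREF #IMPLIED>
]>
<!-- Valid as written. Each of these changes would be invalid:
     giving both employees id="e1"    (ID values must be unique)
     writing id="1"                   (not a legal Name)
     writing manager="e9"             (no element bears that ID)
     declaring a second ID attribute on employee
     declaring id with a default value or with #FIXED -->
<staff>
  <employee id="e1">Director</employee>
  <employee id="e2" manager="e1">Engineer</employee>
</staff>
```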

Entity Name

Text from specification

Values of type ENTITY must match the Name production, and values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

Explanation

Attributes declared as type ENTITY or ENTITIES must contain the names of unparsed entities declared in the DTD.

Name Token

Text from specification

Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

Explanation

If an attribute is declared to contain a name token or a list of name tokens, the values must be legal XML name tokens.

Notation Attributes

Text from specification

Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

Explanation

Attributes that must contain notation names must contain names that reference notations declared in the DTD.

One Notation per Element Type

Text from specification

No element type may have more than one NOTATION attribute specified.

Explanation

A given element can have only one attribute declared with the NOTATION attribute type. This constraint is provided for backward compatibility with SGML.

No Notation on Empty Element

Text from specification

For compatibility, an attribute of type NOTATION must not be declared on an element declared EMPTY.

Explanation

Empty elements cannot have NOTATION attributes in order to maintain compatibility with SGML.
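A minimal sketch of the three notation constraints above, using hypothetical notation and element names. The picture element is deliberately not declared EMPTY, and it carries only one NOTATION attribute.

```xml
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!NOTATION png SYSTEM "image/png">
  <!ELEMENT doc (picture)>
  <!ELEMENT picture (#PCDATA)>
  <!ATTLIST picture format NOTATION (gif | png) #REQUIRED>
]>
<!-- Valid: png is listed in the attribute declaration and is a
     declared notation. format="jpeg" would be invalid. -->
<doc><picture format="png">encoded image data</picture></doc>
```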

Enumeration

Text from specification

Values of this type must match one of the Nmtoken tokens in the declaration.

Explanation

An attribute with an enumerated type may take only one of the values listed in its declaration in the DTD; any other value makes the document invalid.

Required Attribute

Text from specification

If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

Explanation

An attribute declared as #REQUIRED in the DTD must be given an explicit value on every instance of that element type in the document.

Attribute Default Legal

Text from specification

The declared default value must meet the lexical constraints of the declared attribute type.

Explanation

If you provide a default attribute value, it must obey the same rules that apply to a normal attribute value within the document.

Fixed Attribute Default

Text from specification

If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

Explanation

If you choose to provide an explicit value for a #FIXED attribute in your document, it must match the default value given in the attribute declaration.
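The enumeration, required, and fixed attribute constraints above can be sketched with a hypothetical task element. The comment lists variations that would each be invalid.

```xml
<!DOCTYPE task [
  <!ELEMENT task (#PCDATA)>
  <!ATTLIST task
    status  (open | closed) "open"
    owner   CDATA           #REQUIRED
    version CDATA           #FIXED "1.0">
]>
<!-- Valid as written. Each of these changes would be invalid:
     status="pending"     (not in the enumeration)
     omitting owner       (declared #REQUIRED)
     version="2.0"        (does not match the #FIXED default)
     A declared default of "pending" for status would also be
     invalid, because it does not fit the enumerated type. -->
<task owner="sam" status="closed" version="1.0">Write report</task>
```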

Proper Conditional Section/PE Nesting

Text from specification

If any of the "<![", "[", or "]]>" of a conditional section is contained in the replacement text for a parameter entity reference, all of them must be contained in the same replacement text.

Explanation

If you use a parameter entity to contain the beginning of a conditional section, the parameter entity must also contain the end of the section.
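The fragment below sketches both cases as the contents of a hypothetical external DTD subset, since conditional sections are allowed only there. The entity names draft and start are illustrative.

```xml
<!-- Valid: only the keyword comes from the entity; none of the
     section delimiters is inside its replacement text -->
<!ENTITY % draft "INCLUDE">
<![%draft;[
  <!ELEMENT comment (#PCDATA)>
]]>

<!-- Invalid: the entity supplies the opening delimiters,
     but the closing delimiter appears outside it -->
<!ENTITY % start "<![INCLUDE[">
%start;
  <!ELEMENT remark (#PCDATA)>
]]>
```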

Entity Declared

Text from specification

In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in 4.6 Predefined Entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any attribute-list declaration containing a default value with a direct or indirect reference to that general entity.

Explanation

Parameter and general entity declarations must precede any references to these entities. All entity references must refer to previously declared entities. The specification also states that declaring the five predefined general entities (amp, lt, gt, apos, and quot) is a good idea. In reality, declaring the predefined general entities adds unnecessary complexity to most applications.
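For documents that do follow the interoperability advice, Section 4.6 of the XML 1.0 specification gives the declarations in the following form. The lt and amp characters are doubly escaped so that their replacement text remains a character reference.

```xml
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
```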

Notation Declared

Text from specification

The Name must match the declared name of a notation.

Explanation

External unparsed entities must use a notation that is declared in the document.

Unique Notation Name

Text from specification

Only one notation declaration can declare a given Name.

Explanation

Declaring two notations with the same name is illegal.

20.4.3. Namespace Constraints

The following list contains all constraints defined by the namespaces specification. Each includes actual text from the Namespaces in XML specification and a short explanation of what the constraint actually means.

Leading "XML"

Text from specification

Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved for use by XML and XML-related specifications.

Explanation

Just like most other names in XML, namespace prefix names can't begin with xml, in any combination of upper- and lowercase, unless they've been defined by the W3C.

Prefix Declared

Text from specification

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e., an element in whose content the prefixed markup occurs). The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. The prefix xmlns is used only for namespace bindings and is not itself bound to any namespace name.

Explanation

You have to declare all namespaces before you can use them. The prefixes have no meaning without the declarations, so using a prefix without a declaration context is an error. The namespace with the prefix xml is permanently defined, so there is no need to redeclare it. The xmlns prefix used by namespace declarations is not considered a namespace prefix itself, and no declaration is needed for it.
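A short sketch, assuming a hypothetical namespace name http://example.org/catalog and hypothetical prefixes cat and inv. The xml prefix is used without a declaration, while the commented-out element shows a prefix with no declaration in scope.

```xml
<catalog xmlns:cat="http://example.org/catalog">
  <!-- cat is declared on an ancestor; xml needs no declaration -->
  <cat:title xml:lang="en">XML in a Nutshell</cat:title>
  <!-- Error if uncommented, because no xmlns:inv declaration
       is in scope:
       <inv:item/>
  -->
</catalog>
```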




Copyright © 2002 O'Reilly & Associates. All rights reserved.