
5.2. Derivation By List

Derivation by list is the mechanism by which a list datatype can be derived from an atomic datatype. All the items in the list need to have the same datatype.

5.2.1. List Datatypes

List datatypes are special cases in which a structure is defined within the content of a single attribute or element. This practice is usually discouraged because applications cannot access the individual atomic values through the current XML APIs, through XPath expressions, or in the Infoset. This situation might change in the future, since these datatypes are expected to be adopted by XPath 2.0, which will likely provide some mechanism for accessing the items within these lists.

This feature appears to have been introduced to maintain compatibility with the IDREFS attribute type of SGML and XML DTDs. However, W3C XML Schema has been cautious: it doesn't allow you to define the list separator, nor does it allow complex lists whose members have complex types or differ in type. Among the constructs that appear in some XML vocabularies but cannot be described by W3C XML Schema (except partially, by using regular expressions as a workaround) are comma-separated lists of values and lists with heterogeneous members, such as values with units:

<commaSeparated>
  1, 2, 25
</commaSeparated>
  
<valueWithUnit>
  10 em
</valueWithUnit>
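To give an idea of the regular-expression workaround, here is a minimal sketch for the comma-separated example, assuming the items are integers; the type name commaSeparatedIntegers is hypothetical. The value remains a single xs:token for the schema processor, so the items are not available as integers, and a facet such as xs:maxLength would count characters rather than items.

<xs:simpleType name="commaSeparatedIntegers">
  <!-- hypothetical type: xs:token collapses whitespace, so the
       leading and trailing whitespace of the element content is
       removed before the pattern is checked -->
  <xs:restriction base="xs:token">
    <!-- one integer, then any number of ", integer" groups -->
    <xs:pattern value="-?[0-9]+(, ?-?[0-9]+)*"/>
  </xs:restriction>
</xs:simpleType>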

Whitespace-separated lists, and values split into separate XML elements or attributes, are preferred:

<commaSeparated>
  1 2 25
</commaSeparated>
  
<valueWithUnit unit="em">
  10
</valueWithUnit>
              
<valueWithUnit>
  10em
</valueWithUnit>
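As a sketch, the first two preferred forms could be declared as follows, assuming integer items for commaSeparated, decimal numbers for valueWithUnit, and any token as a unit (all three are assumptions for illustration). The third form, 10em, can only be checked as a single token, typically with an xs:pattern facet.

<!-- whitespace-separated list of integers, e.g. "1 2 25" -->
<xs:element name="commaSeparated">
  <xs:simpleType>
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
</xs:element>

<!-- numeric content, with the unit split into an attribute -->
<xs:element name="valueWithUnit">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="unit" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>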

IDREFS, ENTITIES, and NMTOKENS are predefined list datatypes that are derived, using this method, from the atomic types IDREF, ENTITY, and NMTOKEN, respectively.

As we have seen with these three datatypes, all the list datatypes that can be defined must be whitespace-separated. No other separator is accepted.

With this restriction, defining a list is very simple, and W3C XML Schema provides two syntaxes. Both use an xs:list element, which either references an existing type or embeds a type definition (the two syntaxes cannot be mixed).

A list datatype is defined by reference to an existing type through an itemType attribute:

<xs:simpleType name="integerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

This datatype can be used to define attributes or elements that accept a whitespace-separated list of integers, such as "1 -25000 1000".
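For example, assuming a schema without a target namespace (so that the unprefixed name integerList resolves), the type could be used for both an element and an attribute; the names values, sample, and sizes are hypothetical:

<xs:element name="values" type="integerList"/>

<xs:element name="sample">
  <xs:complexType>
    <!-- the same list type, used for an attribute -->
    <xs:attribute name="sizes" type="integerList"/>
  </xs:complexType>
</xs:element>

Both of the following instance fragments would then be valid:

<values>1 -25000 1000</values>

<sample sizes="1 -25000 1000"/>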

A list datatype can also be defined by embedding an anonymous xs:simpleType element inside xs:list:

<xs:simpleType name="myIntegerList">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>

This datatype can be used to define attributes or elements that accept a whitespace-separated list of integers less than or equal to 100, such as "1 -25000 100".
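As an illustration, assume a hypothetical element declared as <xs:element name="smallValues" type="myIntegerList"/>. Each item of the following instance values is checked against the restricted item type:

<!-- valid: every item is an integer less than or equal to 100 -->
<smallValues>1 -25000 100</smallValues>

<!-- invalid: 101 is greater than 100 -->
<smallValues>1 101</smallValues>

<!-- valid: an empty list has no item that can violate the constraint -->
<smallValues/>

Rejecting empty lists requires an xs:minLength facet on the list type, applied in the two steps described below.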

List datatypes have their own value space that can be constrained using a set of specific facets that is common to all of them.

These facets are xs:length, xs:maxLength, xs:minLength, xs:enumeration, and xs:whiteSpace. The length of a list type is always measured as the number of items in the list, not as a number of characters.
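For instance, the following sketch restricts the integerList type defined earlier to lists of exactly three integers; the names point3D and point are hypothetical. Since xs:length counts items, "100 200 300" has a length of 3 even though it is 11 characters long.

<xs:simpleType name="point3D">
  <xs:restriction base="integerList">
    <!-- exactly three items, whatever their number of characters -->
    <xs:length value="3"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="point" type="point3D"/>

With this declaration, <point>100 200 300</point> would be valid, while <point>10 -5</point> would not.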

To apply these facets to a user-defined list type, we need two steps: we first define the list datatype, and then define a second datatype that restricts it. The reason is that each xs:simpleType element accepts only one derivation method, chosen among the three existing ones (restriction, list, and union).
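The two steps do not require two named types. As an alternative sketch (the name shortIntegerList is hypothetical), the list derivation can be nested as an anonymous type inside the restriction; each xs:simpleType element still uses exactly one derivation method:

<xs:simpleType name="shortIntegerList">
  <!-- step 2: derivation by restriction, using a list facet -->
  <xs:restriction>
    <!-- step 1: derivation by list, as an anonymous type -->
    <xs:simpleType>
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>
    <xs:maxLength value="5"/>
  </xs:restriction>
</xs:simpleType>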

In this process, any restriction of the atomic item type has to be done first, because a list datatype loses the facets of its item type and has only the five facets just described, whose meaning is specific to list types.
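To illustrate the order, here is a sketch with hypothetical type names. Restricting integerList with xs:maxInclusive should be rejected by a schema processor, since that facet does not apply to list types. The item constraint belongs in the item type, as in myIntegerList, and list facets are applied afterward:

<!-- rejected: xs:maxInclusive is not a list facet -->
<xs:simpleType name="badSmallIntegerList">
  <xs:restriction base="integerList">
    <xs:maxInclusive value="100"/>
  </xs:restriction>
</xs:simpleType>

<!-- accepted: items are already limited to 100 by myIntegerList;
     the list facet then limits the list to at most five items -->
<xs:simpleType name="fewSmallIntegers">
  <xs:restriction base="myIntegerList">
    <xs:maxLength value="5"/>
  </xs:restriction>
</xs:simpleType>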

Lists of Atomic Datatypes That Allow Whitespace

It is possible to define lists whose item type allows whitespace, such as xs:string. In this case, whitespace is always treated as a separator, so no individual item can contain whitespace.

The impact of this statement can be seen when we apply a facet constraining the length of such a datatype:

<xs:simpleType name="myStringList">
  <xs:list itemType="xs:string"/>
</xs:simpleType>
  
<xs:simpleType name="myRestrictedStringList">
  <xs:restriction base="myStringList">
    <xs:maxLength value="10"/>
  </xs:restriction>
</xs:simpleType>

The datatype myRestrictedStringList is a list of at most 10 items. Since these items are separated by whitespace, myRestrictedStringList accepts at most 10 substrings that contain no whitespace (i.e., 10 "words").

This datatype, therefore, validates a value such as:

<myRestrictedStringList>
  This value has fewer than ten words.
</myRestrictedStringList>

But not this one:

<myRestrictedStringList>
  This value has more than ten words... even if they could be
  read as fewer than ten "strings."
</myRestrictedStringList>
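If the goal is to limit the number of strings that may themselves contain spaces, a list type cannot express it. One alternative, in line with the preference for splitting values into separate elements, is a repeated element. This is a sketch, and the element names phrases and phrase are hypothetical:

<xs:element name="phrases">
  <xs:complexType>
    <xs:sequence>
      <!-- each phrase may contain spaces; 1 to 10 phrases
           (minOccurs defaults to 1) -->
      <xs:element name="phrase" type="xs:string" maxOccurs="10"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>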

Defining lists of lists is forbidden per the W3C XML Schema Recommendation.
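For example, a declaration like the following, which uses the integerList type defined earlier as an item type, should be reported as an error by a conforming schema processor (the name integerListList is hypothetical). Because whitespace is the only separator, there would in any case be no way to tell where one inner list ends and the next begins:

<!-- error: the item type of a list cannot itself be a list type -->
<xs:simpleType name="integerListList">
  <xs:list itemType="integerList"/>
</xs:simpleType>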




Copyright © 2002 O'Reilly & Associates. All rights reserved.