11.2. Defining Element Types

The next hint we can give to a schema processor helps it to determine the simple or complex type of an element. A schema validator usually guesses the type of an element from its name and the description of the content model of its parent. This guess can be overridden by the author of an instance document through the xsi:type attribute, as long as this new type is a derivation by restriction or extension of the type defined in the schema document. Since this type is defined using an attribute in the instance document, the definition is possible only for elements. (Attributes can't have attributes!)

At this point, the question is: why would we want to define a type in the instance document? The answer differs for simple and complex types, and also depends on whether we are interested in a schema for validation purposes only or for data binding.

11.2.1. Defining Simple Types

An element (or attribute) can belong to several different simple types, and a derivation by union is generally a good way to let a schema validator pick the right type without having to use xsi:type. We can go quite far in this direction. To illustrate this, now that we've seen the principles of both derivation by union and patterns, let's define a union type that can accept ISO 8601 dates, common English ("April 2nd, 1998"), and French ("2 avril 1998") formats. We can start by defining an ISO date without a time zone as discussed in Chapter 4, "Using Predefined Simple Datatypes":

<xs:simpleType name="dateISO">
  <xs:restriction base="xs:date">
    <xs:pattern value="[^:Z]*"/>
  </xs:restriction>
</xs:simpleType>

The English format can be described using different patterns for the months that have 31, 30, and 28 days (we do not cover leap years in this example). The following definition should give a fairly good approximation for AD years with a maximum of four digits (the lines are split for readability, but each pattern must be on a single line):

<xs:simpleType name="EnglishDate">
  <xs:restriction base="xs:token"> 
    <xs:pattern
      value="(January|March|May|July|August|October|December)
      ([1-3]?1st|[12]?2nd|[12]?3rd|(30|[12]?[4-9])th),[0-9]{0,4}"/> 
    <xs:pattern value="February
      ([1-2]?1st|[12]?2nd|[12]?3rd|[12]?[4-9]th),[0-9]{0,4}"/> 
    <xs:pattern value="(April|June|September|November)
      ([1-2]?1st|[12]?2nd|[12]?3rd|(30|[12]?[4-9])th),[0-9]{0,4}"/>
  </xs:restriction>
</xs:simpleType>

After the English format, the French one looks simple! The same principle can be applied (line breaks have been added to the patterns for readability):

<xs:simpleType name="dateFrançaise">
  <xs:restriction base="xs:token">
    <xs:pattern value="(1er|[1-3][01]|[12]?[2-9])
        (janvier|mars|mai|juillet|aout|octobre|décembre) \d{0,4}"/>
    <xs:pattern value="(1er|[12][01]|[12]?[2-9]) février \d{0,4}"/>
    <xs:pattern value="(1er|[12][01]|[12]?[2-9]|30)
        (avril|juin|septembre|novembre) \d{0,4}"/>
  </xs:restriction>
</xs:simpleType>

The last step is to derive our type by union as follows:

<xs:simpleType name="anydate">
  <xs:union memberTypes="dateISO EnglishDate dateFrançaise"/>
</xs:simpleType>
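
As a minimal sketch of how this union might be used, assume a hypothetical element named published (the name is an assumption, not part of the original schema) declared with the anydate type. An ISO value is then accepted without any hint in the instance document:

<!-- Hypothetical declaration: the element name "published" is an assumption -->
<xs:element name="published" type="anydate"/>

<!-- Instance document: accepted through the dateISO member type -->
<published>1998-04-02</published>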

We now have a simple type that will accept three different date formats. A schema processor should not only validate these three formats, but it should also mention which type it has recognized in the PSVI. We've achieved this without adding anything in the instance document. Why do we want to give the information in the instance document, then? There are a couple of reasons for this. The first is that we want to convey the information to an application that is not able to get it from the PSVI. This is often the case with current tools, since there is no specification of the interface that will allow an application to read the PSVI. This reason does not apply if we are interested only in validation, but may be important if we want to avoid making applications that manipulate our instance documents check which format they get.
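
For such an application, the hint might look like the following sketch. The published element is a hypothetical name, assumed to be declared with the anydate type. Under the W3C XML Schema derivation rules, a member type of a union should be acceptable as the xsi:type of an element declared with that union, so a consumer can read the format from the attribute instead of from the PSVI:

<!-- Hypothetical instance: xsi:type names the member type explicitly -->
<published xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="dateISO">1998-04-02</published>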

The second reason isn't shown in the previous example. Because the lexical spaces of the different member types have no overlap, there is no confusion possible. This is not always the case. We may want to override the choice made by the schema validator, or even to use a generic "universal" type in the schema and rely on the instance documents to define which type is used. One type of application that is a good prospect for this scenario is protocol or binding applications for which XML is only a transient serialization format. These applications often need to define generic elements that can be used to hold parameters of any type.
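
A minimal sketch of such an overlap, using hypothetical names (intOrString and code are assumptions, not part of the original schema): the value 41 belongs to the lexical space of both member types, so the validator keeps the first member type that accepts it, and xsi:type lets the instance document choose the other one.

<xs:simpleType name="intOrString">
  <xs:union memberTypes="xs:integer xs:string"/>
</xs:simpleType>

<xs:element name="code" type="intOrString"/>

<!-- Instance: validated as xs:integer, the first member type that matches -->
<code>41</code>

<!-- Instance: the author overrides that choice and asks for xs:string -->
<code xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xsi:type="xs:string">41</code>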

XML-RPC illustrates this need. Take the first example of the XML-RPC specification:

<methodCall>
  <methodName>
    examples.getStateName
  </methodName>
  <params>
    <param>
      <value>
        <i4>
          41
        </i4>
      </value>
    </param>
  </params>
</methodCall>

In an imaginary W3C XML Schema-aware version of XML-RPC, this could be replaced by:

<methodCall xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <methodName>
    examples.getStateName
  </methodName>
  <params>
    <param>
      <value xsi:type="xs:int">
        41
      </value>
    </param>
  </params>
</methodCall>

TIP: Without imposing such usage, SOAP allows this practice in the case of a "Polymorphic Accessor." The W3C Working Draft of 2 October 2001 mentions this:

Many languages allow accessors that can polymorphically access values of several types, each type being available at run time. A polymorphic accessor instance MUST contain an xsi:type attribute that describes the type of the actual value. For example, a polymorphic accessor named "cost" with a value of type xs:float would be encoded as follows:

<cost xsi:type="xs:float"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  29.95
</cost>
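
Returning to the imaginary schema-aware XML-RPC, one way to sketch the generic value element is to declare it with xs:anySimpleType, from which every simple type derives, and let each instance narrow it with xsi:type. This declaration is an assumption made for illustration; it is not taken from the XML-RPC or SOAP specifications:

<!-- Hypothetical declaration: any simple value is accepted unless xsi:type narrows it -->
<xs:element name="value" type="xs:anySimpleType"/>

With such a declaration, a value element carrying xsi:type="xs:int" would be checked as an xs:int, while a value element without the attribute would be accepted with no further constraint on its text.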

11.2.2. Defining Complex Types

Although the mechanism to forcibly identify a complex type in an instance document is similar to the one we saw for simple types, the motivations for using it can be completely different. It is still possible to use this feature for a polymorphic accessor, to use the terminology of the W3C Protocols WG, but this is probably a relatively marginal use case for complex types, which have no derivation by union. The equivalent of the simple type derivation by union for complex types would be the ability to define several content models for the same element and to let the schema processor try all these content models and keep the first one that matches the fragment of the instance document. This would indeed be a nice feature, but this is exactly what the Consistent Declaration and Unique Particle Attribution rules explicitly forbid. Therefore, xsi:type for complex types has no competition in the schema itself and is often used as a workaround against these rules.
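
To see what these rules exclude, consider the following sketch of a hypothetical type (the name bookAmbiguous is an assumption). Both branches of the choice start with isbn, so a processor reading an isbn element cannot tell which particle it belongs to without looking ahead, and the type violates the Unique Particle Attribution rule:

<!-- Not allowed: both branches of the choice begin with the same element -->
<xs:complexType name="bookAmbiguous">
  <xs:choice>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
    </xs:sequence>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="author"/>
    </xs:sequence>
  </xs:choice>
</xs:complexType>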

Another way to understand this is to consider this feature a hint given to the schema processor that will allow it to disambiguate the choice it could have and avoid violating one of these rules. A typical use is to work around the Unique Particle Attribution rule to allow two different content models for the same element. We have seen in Chapter 9, "Defining Uniqueness, Keys, and Key References" that xs:key might be used to allow our title to be expressed either as an attribute or as an element, but this workaround doesn't help if we want to allow more complex combinations, such as either a title expressed as an attribute or one or more titles expressed as elements:

<book id="b0836217462" available="true" title="Being a Dog Is a
  Full-Time Job">
  .../...
</book>

or:

<book id="b0836217462" available="true">
  <isbn>
    0836217462
  </isbn>
  <title lang="en">
    Being a Dog Is a Full-Time Job
  </title>
  <title lang="fr">
    Etre un chien est un travail à plein temps.
  </title>
  .../...
</book>

To do so, we will define a base type that is a superset of both content models:

<xs:complexType name="bookBase">
  <xs:sequence>
    <xs:element ref="isbn"/>
    <xs:element ref="title" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
  <xs:attribute ref="title"/>
  <xs:attribute ref="available"/>
</xs:complexType>

This base type accepts book elements with optional titles defined as attributes or elements. We can derive by restriction a first type which will accept only title attributes:

<xs:complexType name="bookTitleAttribute">
  <xs:complexContent>
    <xs:restriction base="bookBase">
      <xs:sequence>
        <xs:element ref="isbn"/> 
        <xs:element ref="author" minOccurs="0"
          maxOccurs="unbounded"/> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

We can derive a second type that accepts only titles defined as one or more title elements:

<xs:complexType name="bookTitleElements">
  <xs:complexContent>
    <xs:restriction base="bookBase">
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title" minOccurs="1" maxOccurs="unbounded"/> 
        <xs:element ref="author" minOccurs="0"
          maxOccurs="unbounded"/> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="title" use="prohibited"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

Now that we have all our building blocks, we can use them in the schema to define the book element as having a type bookBase:

<xs:element name="book" type="bookBase"/>

Then we can use them in the instance documents to declare which derived type we are using:

<book id="b0836217462" available="true" xsi:type="bookTitleElements">
  <isbn>
    0836217462
  </isbn>
  <title lang="en">
    Being a Dog Is a Full-Time Job
  </title>
  <title lang="fr">
    Etre un chien est un travail à plein temps.
  </title>
  .../...
</book>

or:

<book id="b0836217462" available="true" title="Being a Dog Is a
  Full-Time Job" xsi:type="bookTitleAttribute">
  .../...
</book>

However, this allows instance documents to use the base type itself, which may not be what we want, since a book could then have either no title at all or both a title attribute and one or more title elements (something we want to avoid). We can forbid the use of the base type by defining it as "abstract." Setting this attribute of the complex type definition blocks instance documents from using it. They will have to specify one of its derived types through an xsi:type attribute.

The abstract attribute mirrors the block attribute we have already seen. While block prevents derived types from being used in place of a type, abstract requires that a derived type be used. The final definition of our base complex type is then:

<xs:complexType name="bookBase" abstract="true">
  <xs:sequence>
    <xs:element ref="isbn"/>
    <xs:element ref="title" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
  <xs:attribute ref="title"/>
  <xs:attribute ref="available"/>
</xs:complexType>
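
As a sketch of the effect on instance documents, a book element that relies on the declared type alone should no longer be valid, while one that names a concrete derived type is. The xsi prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema-instance on an enclosing element, and the isbn and title values are assumed to satisfy their own declarations:

<!-- Rejected: no xsi:type, so the abstract type bookBase would apply -->
<book id="b0836217462" available="true">
  <isbn>0836217462</isbn>
</book>

<!-- Accepted: xsi:type selects the concrete type bookTitleElements -->
<book id="b0836217462" available="true" xsi:type="bookTitleElements">
  <isbn>0836217462</isbn>
  <title lang="en">Being a Dog Is a Full-Time Job</title>
</book>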



Copyright © 2002 O'Reilly & Associates. All rights reserved.