Chapter 5. Creating Links and Cross-References

Contents:

Generating Links with the id() Function
Generating Links with the key() Function
Generating Links in Unstructured Documents
Summary

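If you're creating a web site, publishing a book, or creating an XML transaction, chances are that many pieces of information will refer to other things. This chapter discusses several ways to link XML elements. It reviews three techniques: generating links with the id() function, generating links with the key() function, and generating links in unstructured documents.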

5.1. Generating Links with the id() Function

Our first attempt at linking will be with the XPath id() function.

5.1.1. The ID, IDREF, and IDREFS Datatypes

Three of the basic datatypes supported by XML Document Type Definitions (DTDs) are ID, IDREF, and IDREFS. Here's a simple DTD that illustrates these datatypes:

<!--glossary.dtd-->
<!--The containing tag for the entire glossary-->
<!ELEMENT glossary  (glentry+) >

<!--A glossary entry-->
<!ELEMENT glentry  (term,defn+) >

<!--The word being defined-->
<!ELEMENT term  (#PCDATA) >

<!--The id is used for cross-referencing, and the 
    xreftext is the text used by cross-references.-->
<!ATTLIST term
               id  ID    #REQUIRED 
               xreftext  CDATA    #IMPLIED  >

<!--The definition of the term-->
<!ELEMENT defn  (#PCDATA | xref | seealso)* >

<!--A cross-reference to another term-->
<!ELEMENT xref   EMPTY  >

<!--refid is the ID of the referenced term-->
<!ATTLIST xref
               refid  IDREF    #REQUIRED >

<!--seealso refers to one or more other definitions-->
<!ELEMENT seealso EMPTY>
<!ATTLIST seealso
                  refids   IDREFS  #REQUIRED >

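In this DTD, each <term> element is required to have an id attribute, and each <xref> element must have a refid attribute. The ID and IDREF datatypes work according to two rules: each value of an ID attribute must be unique within the document, and each value of an IDREF attribute must match the value of an ID attribute somewhere in the document.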

To round out our example, the <seealso> element contains an attribute of type IDREFS. This datatype contains one or more values, each of which must match a value of an ID elsewhere in the document. Multiple values, if present, are separated by whitespace.
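As a quick illustration of how these datatypes constrain a document, here is a deliberately invalid glossary. It is a minimal sketch written against glossary.dtd, and its contents are made up for this example. A validating parser should reject it for the two reasons marked in the comments.

```xml
<?xml version="1.0"?>
<!DOCTYPE glossary SYSTEM "glossary.dtd">
<!-- Deliberately invalid: illustrates the ID and IDREF constraints. -->
<glossary>
  <glentry>
    <term id="applet">applet</term>
    <!-- Error: no element in this document has the ID "servlet" -->
    <defn>Contrast with <xref refid="servlet"/>.</defn>
  </glentry>
  <glentry>
    <!-- Error: the ID value "applet" is already used by the first term -->
    <term id="applet">Java applet</term>
    <defn>See <xref refid="applet"/>.</defn>
  </glentry>
</glossary>
```

The document is still well-formed, so a parser that does not validate will accept it without complaint.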

There are some complications of ID and related datatypes, but we'll discuss them later. For now, we'll focus on how the id() function works.

5.1.2. An XML Document in Need of Links

To illustrate the value of linking, we'll use a small glossary written in XML. The glossary contains some <glentry> elements, each of which contains a single <term> and one or more <defn> elements. In addition, a definition is allowed to contain a cross-reference (<xref>) to another <term>. Here's a short sample document:

<?xml version="1.0" ?>
<!DOCTYPE glossary SYSTEM "glossary.dtd">
<glossary>
  <glentry>
    <term id="applet">applet</term>
    <defn>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <xref refid="servlet"/>.
      <seealso refids="wildcard-char DMZlong pattern-matching"/>
    </defn>
  </glentry>

  <glentry>
    <term id="DMZlong" xreftext="demilitarized zone">demilitarized 
      zone (DMZ)</term>
    <defn>
      In network security, a network that is isolated from, and 
      serves as a neutral zone between, a trusted network (for example, 
      a private intranet) and an untrusted network (for example, the
      Internet). One or more secure gateways usually control access 
      to the DMZ from the trusted or the untrusted network.
    </defn>
  </glentry>

  <glentry>
    <term id="DMZ">DMZ</term>
    <defn>
      See <xref refid="DMZlong"/>.
    </defn>
  </glentry>

  <glentry>
    <term id="pattern-matching">pattern-matching character</term>
    <defn>
      A special character such as an asterisk (*) or a question mark 
      (?) that can be used to represent zero or more characters. 
      Any character or set of characters can replace a pattern-matching 
      character.
    </defn>
  </glentry>

  <glentry>
    <term id="servlet">servlet</term>
    <defn>
      An application program, written in the Java programming language, 
      that is executed on a web server. A reference to a servlet 
      appears in the markup for a web page, in the same way that a 
      reference to a graphics file appears. The web server executes
      the servlet and sends the results of the execution (if there are
      any) to the web browser. Contrast with <xref refid="applet" />.
    </defn>
  </glentry>

  <glentry>
    <term id="wildcard-char">wildcard character</term>
    <defn>
      See <xref refid="pattern-matching"/>.
    </defn>
  </glentry>
</glossary>

In this XML listing, each <term> element has an id attribute that identifies it uniquely. Many <xref> elements also refer to other terms in the listing. Notice that each time we refer to another term, we don't use the actual text of the referenced term. When we write our stylesheet, we'll use the XPath id() function to retrieve the text of the referenced term; if the name of a term changes (as buzzwords go in and out of fashion, some marketing genius might want to rename the "pattern-matching character," for example), we can rerun our stylesheet and be confident that all references to the renamed term contain the correct text.

Finally, some <term> elements have an xreftext attribute because some of the actual terms are longer than we'd like to use in a cross-reference. When we have an <xref> to the term ASCII (American Standard Code for Information Interchange), it would get pretty tedious if the entire text of the term appeared throughout our document. For this term, we'll use the xreftext attribute's value, ensuring that the cross-reference contains the less-intimidating text ASCII.

5.1.3. A Stylesheet That Uses the id() Function

Let's look at our desired output. What we want is an HTML document, such as that shown in Figure 5-1, that displays the various definitions in an easy-to-read format, with the cross-references formatted as hyperlinks.

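To produce this HTML document, our stylesheet needs to address several things: generating the HTML <title> and <h1>, generating a paragraph with a named anchor for each <glentry>, turning each <xref> into a hyperlink, and turning each <seealso> into a series of hyperlinks.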

Figure 5-1. HTML document with generated cross-references

Here's the template that takes care of our first task, generating the HTML <title> and the <h1>:

<xsl:template match="glossary">
  <html>
    <head>
      <title>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="glentry[1]/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="glentry[last()]/term"/>
      </title>
    </head>
    <body>
      <h1>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="glentry[1]/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="glentry[last()]/term"/>
      </h1>
      <xsl:apply-templates select="glentry"/>
    </body>
  </html>
</xsl:template>

We generate the <title> and <h1> using the XPath expressions glentry[1]/term for the first <term> in the document, and using glentry[last()]/term for the last term.
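The <title> and the <h1> are built by identical instructions. As a minimal sketch, assuming you would rather state the heading text once, it can be stored in a variable and written out twice. The variable name heading is chosen here for illustration and is not part of the original stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="glossary">
    <!-- Build the heading text once from the first and last terms -->
    <xsl:variable name="heading">
      <xsl:text>Glossary Listing: </xsl:text>
      <xsl:value-of select="glentry[1]/term"/>
      <xsl:text> - </xsl:text>
      <xsl:value-of select="glentry[last()]/term"/>
    </xsl:variable>
    <html>
      <head>
        <title><xsl:value-of select="$heading"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="$heading"/></h1>
        <xsl:apply-templates select="glentry"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

For the sample glossary, both the title and the heading read "Glossary Listing: applet - wildcard character".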

Our next step is to process all the <glentry> elements. We'll generate an HTML paragraph for each one, and then we'll generate a named anchor point, using the id attribute of the entry's <term> as the name of the anchor. Here's the template:

<xsl:template match="glentry">
  <p>
    <b>
      <a name="{term/@id}"/>
      <xsl:value-of select="term"/>
      <xsl:text>: </xsl:text>
    </b>
    <xsl:apply-templates select="defn"/>
  </p>
</xsl:template>

In this template, we're using an attribute value template to generate the name attribute of the HTML <a> element. The XPath expression term/@id retrieves the id attribute of the <term> child of the <glentry> element we're currently processing. (In our DTD, the id attribute is declared on <term>, not on <glentry>, so a plain @id would select nothing here.) We use this attribute to generate a named anchor. We then write the term itself in bold and apply the template for the <defn> element. In our output document, each glossary entry contains a paragraph with the highlighted term and its definition.

The name attribute of this HTML <a> element is generated with an attribute value template. See Section 3.3, "Attribute Value Templates" for more information.
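For comparison, the same anchor can be written without an attribute value template by using the <xsl:attribute> element. This is a sketch of an equivalent form, not a replacement for the template above; it reads the ID from the <term> child, where the DTD declares it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="glentry">
    <p>
      <b>
        <a>
          <!-- Long form of name="{term/@id}" -->
          <xsl:attribute name="name">
            <xsl:value-of select="term/@id"/>
          </xsl:attribute>
        </a>
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The attribute value template is shorter. The long form is mostly useful when the attribute's value needs conditional logic that won't fit in a single XPath expression.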

Our next step is to process the cross-reference. Here's the template for the <xref> element:

<xsl:template match="xref">
  <a href="#{@refid}">
    <xsl:choose>
      <xsl:when test="id(@refid)/@xreftext">
        <xsl:value-of select="id(@refid)/@xreftext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="id(@refid)"/>
      </xsl:otherwise>
    </xsl:choose>
  </a>
</xsl:template>

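We create the <a> element in two steps: first we create the href attribute, which must refer to the correct named anchor, and then we create the text of the link.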

For the first step, we know that the href attribute must contain a hash mark (#) followed by the name of the anchor point. Because we generated all the named anchors from the id attributes of the various <term> elements, we know the name of the anchor point is the same as the ID that the refid attribute references.

Now all that's left is for us to retrieve the text. This retrieval is the most complicated part of the process (relatively speaking, anyway). Remember that we want to use the xreftext attribute of the <term> element if there is one, and otherwise use the text of the <term> element. To implement an if-then-else statement, we use the <xsl:choose> element. In the template above, we used a test expression of id(@refid)/@xreftext to see if the xreftext attribute exists. (Remember, an empty node-set is considered false. If the attribute doesn't exist, the node-set will be empty and the <xsl:otherwise> element will be evaluated.) If the test is true, we use id(@refid)/@xreftext to retrieve the cross-reference text. The first part of the XPath expression (id(@refid)) returns the node that has an ID that matches the value of @refid; the second part (@xreftext) retrieves the xreftext attribute of that node. We insert the text of the xreftext attribute inside the <a> element. If the test is false, id(@refid) by itself selects the <term> element, and <xsl:value-of> inserts the text of the term instead.
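The id() function can only find an element if the parser has read a DTD that declares the attribute as type ID. If the DTD isn't read, or an unvalidated document contains a dangling reference, id(@refid) returns an empty node-set and the template produces a link with no text. The following sketch guards against that case; the bracketed marker text is a hypothetical choice for illustration, not something the original stylesheet produces.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="xref">
    <xsl:choose>
      <!-- id() found nothing: write a visible marker, not an empty link -->
      <xsl:when test="not(id(@refid))">
        <xsl:text>[unresolved reference: </xsl:text>
        <xsl:value-of select="@refid"/>
        <xsl:text>]</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <a href="#{@refid}">
          <xsl:choose>
            <!-- Prefer the short cross-reference text if it exists -->
            <xsl:when test="id(@refid)/@xreftext">
              <xsl:value-of select="id(@refid)/@xreftext"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="id(@refid)"/>
            </xsl:otherwise>
          </xsl:choose>
        </a>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With a valid document and a parser that reads the DTD, the first branch never fires and the output matches the original template.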

Finally, we handle any <seealso> elements. The difference here is that the refids attribute can reference any number of glossary terms, so we'll use the id() function differently. Here's the template for <seealso>:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:for-each select="id(@refids)">
    <a href="#{@id}">
      <xsl:choose>
        <xsl:when test="@xreftext">
          <xsl:value-of select="@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
    <xsl:if test="not(position()=last())">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:for-each>
  <xsl:text>. </xsl:text>
</xsl:template>

There are a couple of important differences here. First, we call the id() function in an <xsl:for-each> element. Calling the id() function with an attribute of type IDREFS returns a node-set; each node in the node-set is the match for one of the IDs in the attribute.

The second difference is that referencing the correctly named anchor is more difficult. When we processed the <xref> element, we knew that the correct anchor name was the value of the refid attribute. When processing <seealso>, the refids attribute doesn't do us any good because it may contain any number of IDs. All is not lost, however. What we did previously was use the id attribute of each node returned by the id() function -- a minor inconvenience, but another difference in processing an attribute of type IDREFS instead of IDREF.

The final difference is that we want to add commas after all items except the last. The <xsl:if> element shown previously does just this. If the position() of the current item is the last, we don't output the comma and space (defined here with the <xsl:text> element). We formatted all references here as a sentence; as an exercise, feel free to process the items in a more sophisticated way. For example, you could generate an HTML list from the IDREFS, or maybe format things differently if the refids attribute only contains a single ID.
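As one possible take on the exercise, here is a sketch that writes the references as an HTML list instead of a sentence. HTML doesn't allow a list inside a paragraph, so this version assumes the template for <glentry> is changed to wrap each entry in something other than <p> (a <div>, for instance). That change is an assumption of this sketch and is not shown.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="seealso">
    <b>
      <xsl:text>See also:</xsl:text>
    </b>
    <ul>
      <!-- One list item per term referenced by the refids attribute -->
      <xsl:for-each select="id(@refids)">
        <li>
          <a href="#{@id}">
            <xsl:choose>
              <xsl:when test="@xreftext">
                <xsl:value-of select="@xreftext"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Because each reference gets its own list item, the comma logic disappears. Note that <xsl:for-each> visits the nodes in document order, not in the order the IDs appear in the refids attribute; add an <xsl:sort> if you need a different order.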

We've done several useful things with the id() function. We've been able to use attributes of type ID to discover the links between related pieces of information, and we've converted the XML into HTML links, renderable in an ordinary household browser. If this is the only kind of linking and referencing you need to do, that's great. Unfortunately, there are times when we need to do more, and on those occasions, the id() function doesn't quite cut it. We'll mention the limitations of the id() function briefly, then we'll discuss XSLT functions that let us overcome them.

5.1.4. Limitations of IDs

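To this point, we've been able to generate cross-references easily. There are some limitations of the ID datatype and the id() function, though: the id() function works only if a DTD declares the attributes as type ID and the XML parser reads those declarations; an element can have at most one attribute of type ID; every ID value must be unique within the document; an ID value must be a valid XML name, so it can't contain spaces or begin with a digit; and an ID must be stored in an attribute, never in element content.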

To get around all of these limitations, XSLT defines the key() function. We'll discuss that function in the next section.



Copyright © 2002 O'Reilly & Associates. All rights reserved.