XML in a Nutshell

Chapter 22. XPath Reference

Contents:

The XPath Data Model
Data Types
Location Paths
Predicates
XPath Functions

XPath is a non-XML syntax for expressions that identify particular nodes and groups of nodes in an XML document. It is used by both XPointer and XSLT, as well as by some native XML databases and query languages.

22.1. The XPath Data Model

XPath views each XML document as a tree of nodes. Each node has one of seven types:

Root
Each document has exactly one root node, which is the root of the tree. This node contains one comment node child for each comment outside the document element, one processing-instruction node child for each processing instruction outside the document element, and exactly one element node child for the document element. It does not contain any representation of the XML declaration, the document type declaration, or any whitespace that occurs before or after the document element. The root node has no parent node. The root node's value is the value of the document element.

Element
An element node represents an element. It has a name, a namespace URI, a parent node, and a list of child nodes, which may include other element nodes, comment nodes, processing-instruction nodes, and text nodes. An element node also has a list of attributes and a list of in-scope namespaces, none of which are considered to be children of the element. The value of an element node is the complete, parsed text between the element's start- and end-tags that remains after all tags, comments, and processing instructions are removed and all entity and character references are resolved.

Attribute
An attribute node represents an attribute. It has a name, a namespace URI, a value, and a parent element. Although elements are the parents of their attributes, attributes are not children of those elements; the biological metaphor breaks down here. xmlns and xmlns:prefix attributes are not represented as attribute nodes. An attribute node's value is the normalized attribute value.

Text
Each text node represents the maximum possible contiguous run of text between tags, processing instructions, and comments. A text node has a parent node but does not have children. A text node's value is the text of the node.

Namespace
A namespace node represents a namespace in scope on an element. In general, each namespace declaration by an xmlns or xmlns:prefix attribute produces multiple namespace nodes in the document tree: one for each element on which the declaration is in scope. Like attribute nodes, each namespace node has a parent element but is not the child of that parent. The name of a namespace node is the prefix. The value of a namespace node is the namespace URI.

Processing instruction
A processing-instruction node represents a processing instruction. It has a target, data, a parent node, and no children. The name of a processing-instruction node is its target. The value of a processing-instruction node is the data of the processing instruction, not including any initial whitespace.

Comment
A comment node represents a comment. It has a parent node and no children. The value of a comment node is the string content of the comment, not including the <!-- and -->.
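
The Namespace entry above notes that a single declaration generally yields several namespace nodes. Python's standard library has no XPath namespace axis, so the sketch below only approximates the idea: it walks a document with xml.etree.ElementTree.iterparse and records which prefixes are in scope on each element. The function name in_scope_namespaces and the sample document are illustrative assumptions, not part of XPath itself.

```python
import io
import xml.etree.ElementTree as ET

# Every element implicitly has the reserved "xml" prefix in scope.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: uri}) for each element, in document order.

    Each dictionary entry stands in for one XPath namespace node.
    """
    scopes = [{"xml": XML_NS}]  # stack of scopes, outermost first
    pending = {}                # declarations reported for the next start tag
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}
            # xmlns="" undeclares the default namespace, so it yields no node.
            scope = {p: u for p, u in scope.items() if u}
            pending = {}
            scopes.append(scope)
            result.append((payload.tag, scope))
        else:  # "end": leave this element's scope
            scopes.pop()
    return result


sample = '<a xmlns:p="urn:example:p"><b/><p:c/></a>'
for tag, scope in in_scope_namespaces(sample):
    print(tag, sorted(scope))
```

In this sample the one xmlns:p declaration is in scope on three elements, so an XPath processor would see three namespace nodes for the prefix p, in addition to the implicit xml namespace node on each element.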

The XML declaration and the document type declaration are not included in XPath's view of an XML document. All entity references, character references, and CDATA sections are resolved before the XPath tree is built. The references themselves are not included as a separate part of the tree.
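
To see what resolving references and CDATA sections means for node values, the following sketch parses a small document with Python's standard xml.etree.ElementTree module. ElementTree does not implement the XPath data model, but its default parser likewise discards comments and processing instructions and resolves references, so joining an element's text approximates the value of an element node. The sample document and variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The "\t" below is a literal tab character inside the attribute value.
doc = (
    '<para xmlns:p="urn:example:p" id="a&#65;\tb">'
    'Hello, <b>XPath</b><!-- a comment --> &amp; '
    '<?target data?><![CDATA[<friends>]]>!'
    '</para>'
)
para = ET.fromstring(doc)

# Approximate the element node's value: all descendant text, with
# references resolved and tags, comments, and processing instructions gone.
element_value = "".join(para.itertext())
print(element_value)

# The xmlns:p declaration does not appear among the attributes, and the
# literal tab in id has been normalized to a space.
print(para.attrib)
```

The first line printed is Hello, XPath & <friends>!, with the entity reference and the CDATA section both reduced to plain text. The attribute dictionary contains only id, whose value is aA b after the character reference is resolved and the tab is normalized.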




Copyright © 2002 O'Reilly & Associates. All rights reserved.