How XPath Works
The XPath specification is the foundation for a variety of specifications, including XSLT and linking/addressing specifications such as XPointer. So an understanding of XPath is fundamental to a lot of advanced XML usage. This section provides a thorough introduction to XPath in the context of XSLT so that you can refer to it as needed.
Note: In this tutorial, you won't actually use XPath until later, in the section, Transforming XML Data with XSLT. So, if you like, you can skip this section and go on ahead to the next section, Writing Out a DOM as an XML File. (When you get to the end of that section, there will be a note that refers you back here so that you don't forget!)
XPath Expressions
In general, an XPath expression specifies a pattern that selects a set of XML nodes. XSLT templates then use those patterns when applying transformations. (XPointer, on the other hand, adds mechanisms for defining a point or a range so that XPath expressions can be used for addressing.)
The nodes in an XPath expression refer to more than just elements. They also refer to text and attributes, among other things. In fact, the XPath specification defines an abstract document model with seven kinds of nodes: root, element, text, attribute, comment, processing instruction, and namespace.
Note: The root element of the XML data is modeled by an element node. The XPath root node contains the document's root element as well as other information relating to the document.
The XSLT/XPath Data Model
Like the Document Object Model, the XSLT/XPath data model consists of a tree containing a variety of nodes. Under any given element node, there are text nodes, attribute nodes, element nodes, comment nodes, and processing instruction nodes.
In this abstract model, syntactic distinctions disappear, and you are left with a normalized view of the data. In a text node, for example, it makes no difference whether the text was defined in a CDATA section or whether it included entity references. The text node will consist of normalized data, as it exists after all parsing is complete. So the text will contain a < character, whether or not an entity reference such as &lt; or a CDATA section was used to include it. (Similarly, the text will contain an & character, whether it was delivered using &amp; or it was in a CDATA section.)
In this section, we'll deal mostly with element nodes and text nodes. For the other addressing mechanisms, see the XPath specification.
Templates and Contexts
An XSLT template is a set of formatting instructions that apply to the nodes selected by an XPath expression. In a stylesheet, an XSLT template would look something like this:
<xsl:template match="//LIST">
  ...
</xsl:template>
The expression //LIST selects the set of LIST nodes from the input stream. Additional instructions within the template tell the system what to do with them.
The set of nodes selected by such an expression defines the context in which other expressions in the template are evaluated. That context can be considered as the whole set (for example, when determining the number of nodes it contains).
The context can also be considered as a single member of the set, as each member is processed one by one. For example, inside the LIST-processing template, the expression @type refers to the type attribute of the current LIST node. (Similarly, the expression @* refers to all the attributes for the current LIST element.)
Basic XPath Addressing
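An XML document is a tree-structured (hierarchical) collection of nodes. As with a hierarchical directory structure, it is useful to specify a path that points to a particular node in the hierarchy (hence the name of the specification: XPath). In fact, much of the notation of directory paths is carried over intact:
The forward slash (/) is used as a path separator.
An absolute path from the root of the document starts with a /.
A relative path from a given location starts with anything else.
A double period (..) indicates the parent of the current node.
A single period (.) indicates the current node.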
For example, in an Extensible HTML (XHTML) document (an XML document that looks like HTML but is well formed according to XML rules), the path /h1/h2 would indicate an h2 element under an h1. (Recall that in XML, element names are case-sensitive, so this kind of specification works much better in XHTML than it would in plain HTML, because HTML is case-insensitive.)
In a pattern-matching specification such as XPath, the specification /h1/h2 selects all h2 elements that lie under an h1 element. To select a specific h2 element, you use square brackets [] for indexing (like those used for arrays, except that XPath indexes start at 1). The path /h1[4]/h2[5] would therefore select the fifth h2 element under the fourth h1 element.
Note: In XHTML, all element names are in lowercase. That is a fairly common convention for XML documents. However, uppercase names are easier to read in a tutorial like this one. So for the remainder of the XSLT tutorial, all XML element names will be in uppercase. (Attribute names, on the other hand, will remain in lowercase.)
A name specified in an XPath expression refers to an element. For example, h1 in /h1/h2 refers to an h1 element. To refer to an attribute, you prefix the attribute name with an @ sign. For example, @type refers to the type attribute of an element. Assuming that you have an XML document with LIST elements, for example, the expression LIST/@type selects the type attribute of the LIST element.
Note: Because the expression does not begin with /, the reference specifies a LIST node relative to the current context, whatever position in the document that happens to be.
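The addressing rules above can be tried out with the javax.xml.xpath API that ships with the JDK. The following is only a minimal sketch: the class name, the helper methods, and the sample document (a DOC root holding two LIST elements) are invented for illustration and are not part of the tutorial's own examples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BasicAddressingSketch {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<DOC>"
        + "<LIST type=\"ordered\"><ITEM>a</ITEM><ITEM>b</ITEM></LIST>"
        + "<LIST type=\"unordered\"><ITEM>c</ITEM></LIST>"
        + "</DOC>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes that expr selects, starting from the given context node.
    static int count(String expr, Node context) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String-value of the first node that expr selects (empty string if none).
    static String first(String expr, Node context) {
        try {
            return XPATH.evaluate(expr, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node root = doc.getDocumentElement();
        System.out.println(count("/DOC/LIST", doc));            // absolute path: all LIST elements under DOC
        System.out.println(first("/DOC/LIST[2]/ITEM[1]", doc)); // first ITEM of the second LIST
        System.out.println(first("/DOC/LIST[1]/@type", doc));   // attribute of the first LIST
        System.out.println(count("LIST/@type", root));          // relative path, context is the DOC element
        System.out.println(count("LIST/@type", doc));           // same path, context is the document root
    }
}
```

The last two calls use the same relative expression with different context nodes, so they select different node-sets. That is the behavior the note above describes.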
Basic XPath Expressions
The full range of XPath expressions takes advantage of the wildcards, operators, and functions that XPath defines. You'll learn more about those shortly. Here, we look at a couple of the most common XPath expressions simply to introduce them.
The expression @type="unordered" specifies an attribute named type whose value is unordered. As you know, an expression such as LIST/@type specifies the type attribute of a LIST element.
You can combine those two notations to get something interesting! In XPath, the square-bracket notation ([]) normally associated with indexing is extended to specify selection criteria. So the expression LIST[@type="unordered"] selects all LIST elements whose type value is unordered.
Similar expressions exist for elements. Each element has an associated string-value, which is formed by concatenating all the text segments that lie under the element. (A more detailed explanation of how that process works is coming up in String-Value of an Element.)
Suppose you model what's going on in your organization using an XML structure that consists of PROJECT elements and ACTIVITY elements that have a text string with the project name, multiple PERSON elements to list the people involved and, optionally, a STATUS element that records the project status. Here are other examples that use the extended square-bracket notation:
Combining Index Addresses
The XPath specification defines quite a few addressing mechanisms, and they can be combined in many different ways. As a result, XPath delivers a lot of expressive power for a relatively simple specification. This section illustrates other interesting combinations:
Note: Many more combinations of address operators are listed in section 2.5 of the XPath specification. This is arguably the most useful section of the spec for defining an XSLT transform.
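To see how selection criteria and indexes interact, the following sketch evaluates a few bracketed expressions with the JDK's javax.xml.xpath API. It rests on stated assumptions: the class name, the helper methods, and both sample documents are hypothetical, and the PROJECT elements are wrapped in an invented ORG root because a well-formed document has only one root element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PredicateSketch {
    // Hypothetical sample data, invented for this sketch.
    static final String LISTS =
        "<DOC>"
        + "<LIST type=\"ordered\">a</LIST>"
        + "<LIST type=\"unordered\">b</LIST>"
        + "<LIST type=\"ordered\">c</LIST>"
        + "<LIST type=\"ordered\">d</LIST>"
        + "</DOC>";

    static final String PROJECTS =
        "<ORG>"
        + "<PROJECT>Alpha<PERSON>Fred</PERSON><STATUS>Critical</STATUS></PROJECT>"
        + "<PROJECT>Beta<PERSON>Wilma</PERSON><PERSON>Fred</PERSON></PROJECT>"
        + "<PROJECT>Gamma<PERSON>Barney</PERSON></PROJECT>"
        + "</ORG>";

    // Number of nodes that expr selects in the given XML text.
    static int count(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String-value of the first node that expr selects (empty string if none).
    static String first(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Filter first, then index: the third of the ordered lists.
        System.out.println(first(LISTS, "/DOC/LIST[@type='ordered'][3]"));
        // Index first, then filter: the third list, kept only if it is ordered.
        System.out.println(first(LISTS, "/DOC/LIST[3][@type='ordered']"));
        // The second list is unordered, so nothing is selected.
        System.out.println(count(LISTS, "/DOC/LIST[2][@type='ordered']"));

        System.out.println(count(PROJECTS, "/ORG/PROJECT[STATUS]"));                   // has a STATUS child
        System.out.println(count(PROJECTS, "/ORG/PROJECT[PERSON='Fred']"));            // any PERSON is Fred
        System.out.println(first(PROJECTS, "/ORG/PROJECT[STATUS='Critical']/text()")); // name of that project
    }
}
```

The first two expressions differ only in the order of the brackets, yet they select different elements (d and c in this sample), which is why the order of combined criteria matters.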
Wildcards
By definition, an unqualified XPath expression selects a set of XML nodes that match the specified pattern. For example, /HEAD matches all top-level HEAD entries, whereas /HEAD[1] matches only the first. Table 7-1 lists the wildcards that can be used in XPath expressions to broaden the scope of the pattern matching.
In the project database example, /*/PERSON[.="Fred"] matches the PERSON element of any PROJECT or ACTIVITY element that names Fred.
Extended-Path Addressing
So far, all the patterns you've seen have specified an exact number of levels in the hierarchy. For example, /HEAD specifies any HEAD element at the first level in the hierarchy, whereas /*/* specifies any element at the second level in the hierarchy. To specify an indeterminate level in the hierarchy, use a double forward slash (//). For example, the XPath expression //PARA selects all PARA elements in a document, wherever they may be found.
The // pattern can also be used within a path. So the expression /HEAD/LIST//PARA indicates all PARA elements in a subtree that begins from /HEAD/LIST.
XPath Data Types and Operators
XPath expressions yield either a set of nodes, a string, a Boolean (a true/false value), or a number. Table 7-2 lists the operators that can be used in an XPath expression.
Expressions can be grouped in parentheses, so you don't have to worry about operator precedence.
Note: Operator precedence is a term that answers the question, "If you specify a + b * c, does that mean (a+b) * c or a + (b*c)?" (The operator precedence is roughly the same as that shown in the table.)
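The four result types, and the effect of parentheses on precedence, can be observed by asking the JDK's javax.xml.xpath API for a specific return type. This is a sketch under stated assumptions: the class, its helper methods, and the HEAD/LIST/PARA sample document are hypothetical, and the operators used (|, and, >, =, +, *, div, mod) are taken from the XPath 1.0 specification.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DataTypesSketch {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<HEAD>"
        + "<LIST><PARA>one</PARA><ITEM><PARA>two</PARA></ITEM></LIST>"
        + "<PARA>three</PARA>"
        + "</HEAD>";

    // Evaluates expr against the sample document and converts the result to the requested type.
    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int nodes(String expr)      { return ((NodeList) eval(expr, XPathConstants.NODESET)).getLength(); }
    static String string(String expr)  { return (String) eval(expr, XPathConstants.STRING); }
    static boolean bool(String expr)   { return (Boolean) eval(expr, XPathConstants.BOOLEAN); }
    static double number(String expr)  { return (Double) eval(expr, XPathConstants.NUMBER); }

    public static void main(String[] args) {
        // Node-sets
        System.out.println(nodes("//PARA"));            // PARA elements at any level
        System.out.println(nodes("/HEAD/LIST//PARA"));  // PARA elements anywhere under /HEAD/LIST
        System.out.println(nodes("/*/*"));              // any element at the second level
        System.out.println(nodes("//PARA | //ITEM"));   // union of two node-sets

        // String, Boolean, and number results
        System.out.println(string("/HEAD/PARA"));
        System.out.println(bool("count(//PARA) > 2 and /HEAD/LIST"));
        System.out.println(bool("/HEAD/PARA = 'three'"));
        System.out.println(number("1 + 2 * 3"));        // multiplication binds tighter
        System.out.println(number("(1 + 2) * 3"));      // parentheses override precedence
        System.out.println(number("10 div 4"));
        System.out.println(number("10 mod 4"));
    }
}
```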
String-Value of an Element
The string-value of an element is the concatenation of all descendant text nodes, no matter how deep. Consider this mixed-content XML data:
<PARA>This paragraph contains a <B>bold</B> word</PARA>
The string-value of the <PARA> element is This paragraph contains a bold word. In particular, note that <B> is a child of <PARA> and that the text bold is a child of <B>. The point is that all the text in all children of a node joins in the concatenation to form the string-value.
Also, it is worth understanding that the text in the abstract data model defined by XPath is fully normalized. So whether the XML structure contains the entity reference &lt; or < in a CDATA section, the element's string-value will contain the < character. Therefore, when generating HTML or XML with an XSLT stylesheet, you must convert occurrences of < to &lt; or enclose them in a CDATA section. Similarly, occurrences of & must be converted to &amp;.
XPath Functions
This section ends with an overview of the XPath functions. You can use XPath functions to select a collection of nodes in the same way that you would use an element specification such as those you have already seen. Other functions return a string, a number, or a Boolean value. For example, the expression /PROJECT/text() gets the text that lies directly under PROJECT nodes (their child text nodes).
Many functions depend on the current context. In the preceding example, the context for each invocation of the text() function is the PROJECT node that is currently selected.
There are many XPath functions, too many to describe in detail here. This section provides a brief listing that shows the available XPath functions, along with a summary of what they do.
Note: Skim the list of functions to get an idea of what's there. For more information, see section 4 of the XPath specification.
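Before skimming the list, it may help to see how text() relates to an element's string-value on mixed content. The sketch below uses the JDK's javax.xml.xpath API; the class name and helpers are hypothetical, and the sample data is a small PARA element with an embedded B element, assumed here for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StringValueSketch {
    // Sample mixed-content data, assumed for this sketch.
    static final String MIXED = "<PARA>This paragraph contains a <B>bold</B> word</PARA>";
    // Entity references are resolved before XPath sees the text.
    static final String ESCAPED = "<PARA>a &lt; b &amp; c</PARA>";

    // Result of expr converted to a string (for a node-set, the string-value of its first node).
    static String string(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes that expr selects.
    static int count(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string(MIXED, "string(/PARA)"));    // all descendant text, concatenated
        System.out.println(count(MIXED, "/PARA/text()"));      // only the text nodes directly under PARA
        System.out.println(string(MIXED, "/PARA/text()[2]"));  // the text that follows the B element
        System.out.println(string(MIXED, "/PARA/B"));          // string-value of the nested element
        System.out.println(string(ESCAPED, "string(/PARA)"));  // contains the < and & characters
    }
}
```

In this sample, text() finds two separate text nodes under PARA, whereas string() joins them with the text inside B. For elements without nested markup the two give the same text.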
Node-Set Functions
Many XPath expressions select a set of nodes. In essence, they return a node-set. One function does that, too.
id(...): Returns the node with the specified ID.
(Elements have an ID only when the document has a DTD, which specifies which attribute has the ID type.)
Positional Functions
These functions return positionally based numeric values.
last(): Returns the index of the last element. For example, /HEAD[last()] selects the last HEAD element.
position(): Returns the index position. For example, /HEAD[position() <= 5] selects the first five HEAD elements.
count(...): Returns the count of elements. For example, /HEAD[count(HEAD)=0] selects all HEAD elements that have no subheads.
String Functions
These functions operate on or return strings.
concat(string, string, ...): Concatenates the string values.
starts-with(string1, string2): Returns true if string1 starts with string2.
contains(string1, string2): Returns true if string1 contains string2.
substring-before(string1, string2): Returns the start of string1 before string2 occurs in it.
substring-after(string1, string2): Returns the remainder of string1 after string2 occurs in it.
substring(string, idx): Returns the substring from the index position to the end, where the index of the first character is 1.
substring(string, idx, len): Returns the substring of the specified length from the index position.
string-length(): Returns the size of the context node's string-value; the context node is the currently selected node, that is, the node that was selected by an XPath expression in which a function such as string-length() is applied.
string-length(string): Returns the size of the specified string.
normalize-space(): Returns the normalized string-value of the current node (no leading or trailing whitespace, and sequences of whitespace characters converted to a single space).
normalize-space(string): Returns the normalized string-value of the specified string.
translate(string1, string2, string3): Converts string1, replacing occurrences of characters in string2 with the corresponding character from string3.
Note: XPath defines three ways to get the text of an element: text(), string(object), and the string-value implied by an element name in an expression like this: /PROJECT[PERSON="Fred"].
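The string functions listed above can be exercised directly, because most of them need nothing but literal arguments. The following sketch uses the JDK's javax.xml.xpath API; the class name, the eval helper, and the one-element NAME document that serves as context are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StringFunctionsSketch {
    // Hypothetical context document; only normalize-space(/NAME) reads it.
    static final String XML = "<NAME>  Fred   Flintstone </NAME>";

    // Evaluates expr and returns the result converted to a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("concat('XPath', ' ', '1.0')"));
        System.out.println(eval("starts-with('XPath', 'XP')"));
        System.out.println(eval("contains('XPath', 'Pat')"));
        System.out.println(eval("substring-before('1999/04/01', '/')"));
        System.out.println(eval("substring-after('1999/04/01', '/')"));
        System.out.println(eval("substring('12345', 2)"));     // index of the first character is 1
        System.out.println(eval("substring('12345', 2, 3)"));
        System.out.println(eval("string-length('XPath')"));
        System.out.println(eval("normalize-space(/NAME)"));    // trims and collapses whitespace
        System.out.println(eval("translate('bar', 'abc', 'ABC')"));
    }
}
```

Because eval converts every result to a string, Boolean results print as true or false and whole numbers print without a decimal point.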
Boolean Functions
These functions operate on or return Boolean values.
not(...): Negates the specified Boolean value.
true(): Returns true.
false(): Returns false.
lang(string): Returns true if the language of the context node (specified by xml:lang attributes) is the same as (or a sublanguage of) the specified language; for example, lang("en") is true for <PARA xml:lang="en">...</PARA>.
Numeric Functions
These functions operate on or return numeric values.
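As a hedged illustration, the sketch below exercises the numeric functions that the XPath 1.0 specification defines: sum(), floor(), ceiling(), and round(), together with number() from the conversion functions. The class name, the helper, and the ORDER/PRICE sample document are invented for this example and evaluated with the JDK's javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class NumericFunctionsSketch {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<ORDER><PRICE>1.5</PRICE><PRICE>2.25</PRICE><PRICE>3</PRICE></ORDER>";

    // Evaluates expr against the sample document and returns the result as a number.
    static double number(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("sum(/ORDER/PRICE)"));   // adds the numeric value of each PRICE
        System.out.println(number("floor(2.6)"));          // largest integer not greater than 2.6
        System.out.println(number("ceiling(2.2)"));        // smallest integer not less than 2.2
        System.out.println(number("round(2.5)"));          // ties round toward positive infinity
        System.out.println(number("round(-2.5)"));
        System.out.println(number("number('12') + 1"));    // string converted to a number first
    }
}
```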
Conversion Functions
These functions convert one data type to another.
string(...): Returns the string value of a number, Boolean, or node-set.
boolean(...): Returns a Boolean value for a number, string, or node-set (a non-zero number, a nonempty node-set, and a nonempty string are all true).
number(...): Returns the numeric value of a Boolean, string, or node-set (true is 1, false is 0, a string containing a number becomes that number, the string-value of a node-set is converted to a number).
Namespace Functions
These functions let you determine the namespace characteristics of a node.
local-name(): Returns the name of the current node, minus the namespace prefix.
local-name(...): Returns the name of the first node in the specified node-set, minus the namespace prefix.
namespace-uri(): Returns the namespace URI from the current node.
namespace-uri(...): Returns the namespace URI from the first node in the specified node-set.
name(): Returns the qualified name (namespace prefix plus local name) of the current node.
name(...): Returns the qualified name (namespace prefix plus local name) of the first node in the specified node-set.
Summary
XPath operators, functions, wildcards, and node-addressing mechanisms can be combined in a wide variety of ways. The introduction you've had so far should give you a good head start at specifying the pattern you need for any particular purpose.
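As one illustration of such combinations, the following sketch mixes wildcards, the // pattern, and the positional functions last(), position(), and count() in single expressions. It is a sketch under stated assumptions: the class name, the helper methods, and the nested HEAD sample document are hypothetical, and the expressions are evaluated with the JDK's javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CombinedSketch {
    // Hypothetical sample data: three top-level heads, the first with one subhead.
    static final String XML =
        "<DOC>"
        + "<HEAD>one<HEAD>sub</HEAD></HEAD>"
        + "<HEAD>two</HEAD>"
        + "<HEAD>three</HEAD>"
        + "</DOC>";

    // Number of nodes that expr selects in the sample document.
    static int nodes(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList list = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return list.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String-value of the first node that expr selects (empty string if none).
    static String string(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nodes("//HEAD"));                       // heads at any level
        System.out.println(string("/DOC/HEAD[last()]"));           // the last top-level head
        System.out.println(nodes("/DOC/HEAD[position() <= 2]"));   // the first two top-level heads
        System.out.println(nodes("/DOC/HEAD[count(HEAD) = 0]"));   // top-level heads with no subheads
        System.out.println(string("//HEAD[count(HEAD) > 0]/HEAD")); // subhead of a head that has one
        System.out.println(string("/*/*[last()]/text()"));         // wildcards combined with last()
    }
}
```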
All of the material in The J2EE(TM) 1.4 Tutorial is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.