XSL patterns are the SQL Select of the XML world.

To retrieve the data you want from an XML file, you need to understand how to construct the necessary pattern.

XSL (eXtensible Stylesheet Language) has three broad functions relating to XML documents: transforming, formatting and addressing. Transforming can mean changing the XML to HTML, to another XML document, or perhaps to a totally different file format, which includes the capability to add, delete, or even reorder the content of your source document. Formatting describes how that content should be presented. Addressing lets us select certain parts of the XML document and ignore the rest. To retrieve the desired elements of an XML document, a query syntax called XSL Pattern Language was introduced. As XSL developed, the core query syntax became known as XPath. This article focuses on XSL patterns, which are a simplified subset of XPath expressions.

So, what kind of situations might call for the use of XSL patterns? Suppose you are presented with a large XML document and you really need only part of the information provided. You can extract just the data you need using an XSL pattern. Or, if you have an XML document that is not in your preferred format, you can use an XSL pattern to filter, reorder, remove, and otherwise restructure the document into one that you can use.

XML documents appear as a hierarchy or tree view of nodes, similar to a file system directory. Just as you can navigate and retrieve files and directories using simple commands at the DOS prompt, you can accomplish the same tasks with XML using XSL patterns. A pattern describes a specific way to select a set of nodes in a document.

Here's a sample XML document:

<catalog>
  <book title="XSLT Programmer's Reference" isbn="1861003129" section="Computer">
    <author>Kay, Michael</author>
    <category>XML</category>
    <category>XSL</category>
    <price>45.99</price>
  </book>
  <book title="The Lecturer's Tale" isbn="0312203322" section="Fiction">
    <author>Hynes, James</author>
    <category>Horror</category>
    <category>Humor</category>
    <price>30</price>
  </book>
  <book title="The Elements of Style" isbn="020530902X" section="English">
    <author>Strunk, William</author>
    <author>White, E.B.</author>
    <category>Grammar</category>
    <price>6.95</price>
  </book>
</catalog>

Suppose you want to find all the book elements within the catalog element. The XSL pattern might look like this:

catalog/book

Pretty simple, right? This pattern will retrieve the nodes whose element is book and whose parent element is catalog. The syntax looks similar to a file path or URL, since pattern elements are separated by a slash. (Note: Unlike searching a hard disk directory, XSL patterns are case-sensitive and you must use a forward slash.) Also, the query simply identifies what is to be found, not how to find it. It is up to the XSL application to decide how to retrieve the requested nodes.
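If you'd like to try this outside an XSL processor, here is a minimal sketch in Python. It assumes the standard xml.etree.ElementTree module, which understands a small XPath-style subset rather than XSL Patterns proper, so treat it as an approximation. The names CATALOG_XML and book_titles are mine, for illustration only.

```python
import xml.etree.ElementTree as ET

# Copy of the sample catalog from this article.
CATALOG_XML = """
<catalog>
  <book title="XSLT Programmer's Reference" isbn="1861003129" section="Computer">
    <author>Kay, Michael</author>
    <category>XML</category>
    <category>XSL</category>
    <price>45.99</price>
  </book>
  <book title="The Lecturer's Tale" isbn="0312203322" section="Fiction">
    <author>Hynes, James</author>
    <category>Horror</category>
    <category>Humor</category>
    <price>30</price>
  </book>
  <book title="The Elements of Style" isbn="020530902X" section="English">
    <author>Strunk, William</author>
    <author>White, E.B.</author>
    <category>Grammar</category>
    <price>6.95</price>
  </book>
</catalog>
"""


def book_titles(xml_text):
    """Return the title attribute of every book that is a child of catalog."""
    catalog = ET.fromstring(xml_text)  # the root element, <catalog>
    # Relative to <catalog>, the pattern catalog/book reduces to "book".
    return [book.get("title") for book in catalog.findall("book")]


titles = book_titles(CATALOG_XML)
# Element names are case-sensitive, so "Book" matches nothing.
wrong_case = ET.fromstring(CATALOG_XML).findall("Book")

print(titles)
print(wrong_case)
```

Notice that the code never says how to walk the tree. As with the pattern itself, it only names what should be found.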

The XSL patterns object model is very simple: there is only one object and two methods. The selectNodes method returns a nodeList object containing every node that matches the pattern, and the selectSingleNode method returns only the first matching node. Either result can then be processed with the rich XML object model.
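The "all matches" versus "first match" split shows up in other XML libraries too. The sketch below is a rough analogue in Python's xml.etree.ElementTree, not the XML DOM itself. The wrappers select_nodes and select_single_node are hypothetical names I chose to mirror the two methods, and the catalog is shortened to authors only.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample catalog (authors only).
CATALOG_XML = """
<catalog>
  <book title="XSLT Programmer's Reference"><author>Kay, Michael</author></book>
  <book title="The Lecturer's Tale"><author>Hynes, James</author></book>
  <book title="The Elements of Style">
    <author>Strunk, William</author>
    <author>White, E.B.</author>
  </book>
</catalog>
"""


def select_nodes(node, pattern):
    """Hypothetical stand-in for selectNodes: every match, as a list."""
    return node.findall(pattern)


def select_single_node(node, pattern):
    """Hypothetical stand-in for selectSingleNode: first match, or None."""
    return node.find(pattern)


catalog = ET.fromstring(CATALOG_XML)
all_authors = select_nodes(catalog, "book/author")
first_author = select_single_node(catalog, "book/author")
no_match = select_single_node(catalog, "book/publisher")

print(len(all_authors))   # how many author nodes were found
print(first_author.text)  # text of the first match in document order
print(no_match)           # None when nothing matches
```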

XSL patterns are also crucial in the selection of nodes in an XSL stylesheet. The two most common stylesheet uses are in the elements <xsl:value-of select='xslpattern'> and <xsl:template match='xslpattern'>. Both of these elements depend on XSL patterns to select the desired XML.

Matching the XML

Now, let's look a little closer at how XSL Patterns work. For many of you, this is another case of a new technology that seems difficult at first glance, but in reality is quite simple. Once you get used to the terminology and syntax, you'll find that it's not so bad after all, and you can easily tap into the power of Patterns.

Match the root element (catalog) of this document:

/catalog

When the slash (/) path operator appears at the start of the pattern, it indicates that child nodes should be selected from the root node of the document.

Match the book elements that are children of the <catalog>:

/catalog/book

Match the author elements that are children of <book>, which is a child of <catalog>:

/catalog/book/author

Match any elements that are a child of <catalog>:

/catalog/*

Match all <author> elements anywhere within the current document (Note: this can be CPU intensive on a large document):

//author

The double-slash (//) path operator searches for the specified element at any depth in the document.

Match the current element or context node (that's one period):

.

Match the parent element (two periods):

..

Match all <category> elements that are descendants of the context node, at any depth:

.//category

Match the title attribute of each <book> child of the context node:

book/@title

Match every attribute node of each <book> child of the context node:

book/@*
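To see these path operators side by side, here is a hedged sketch using Python's xml.etree.ElementTree. It is an approximation, not an XSL Patterns engine: ElementTree rejects a leading slash when searching from an element, so absolute patterns are written relative to the root, and it has no attribute step, so attributes are read from the matched elements. All variable names are my own.

```python
import xml.etree.ElementTree as ET

# Copy of the sample catalog from this article.
CATALOG_XML = """
<catalog>
  <book title="XSLT Programmer's Reference" isbn="1861003129" section="Computer">
    <author>Kay, Michael</author>
    <category>XML</category>
    <category>XSL</category>
    <price>45.99</price>
  </book>
  <book title="The Lecturer's Tale" isbn="0312203322" section="Fiction">
    <author>Hynes, James</author>
    <category>Horror</category>
    <category>Humor</category>
    <price>30</price>
  </book>
  <book title="The Elements of Style" isbn="020530902X" section="English">
    <author>Strunk, William</author>
    <author>White, E.B.</author>
    <category>Grammar</category>
    <price>6.95</price>
  </book>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)  # context node: <catalog>
first_book = catalog.find("book")     # context node: the first <book>

# /catalog/*  -> every element child of the root element
children = [child.tag for child in catalog.findall("*")]

# //author  -> author elements at any depth (spelled .//author here)
authors = [a.text for a in catalog.findall(".//author")]

# .  -> the context node itself
itself = first_book.find(".")

# ..  -> the parent of each matched node (here, the parents of <price>)
parents = catalog.findall("book/price/..")

# .//category  -> category descendants of the context node
categories = [c.text for c in first_book.findall(".//category")]

# book/@title and book/@*  -> read attributes from each matched element
titles = [b.get("title") for b in catalog.findall("book")]
first_book_attrs = dict(first_book.attrib)

print(children, authors, categories, titles, first_book_attrs, sep="\n")
```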

Additional Filtering

The XSL patterns we've seen so far have selected nodes without specifying any predicates. A predicate, written in square brackets, is the equivalent of a WHERE clause in a SQL Select. Here's where you can further filter the results to return the XML you want:

Match the <book> node whose <category> node = “XML”:

catalog/book[category="XML"]

Match the <book> nodes whose <category> node = “XML” or whose <price> is greater than 10:

//book[category="XML" or price > 10]

Match the <category> nodes whose value is “XML” or “Humor”:

//category[.='XML' or .='Humor']

Match the first <book> node (Note: indexes in the XMLDOM, and in XSL patterns, are zero-based):

catalog/book[0]

Match the last <book> node of the current context:

./book[end()]

Match the first node in the list formed by the last <category> node under each parent in the document:

(//category[end()])[0]

Here we use parentheses to indicate the precedence of the operation: the inner pattern builds the node list first, and the index is then applied to that list. Other operators you'll find useful are listed in Table 1.
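The same filters can be approximated in Python with xml.etree.ElementTree, as sketched below. Several assumptions apply: ElementTree follows XPath conventions, so positions are 1-based and the last item is spelled last() rather than end(); its predicates have no "or" and no ">" comparison, so those filters are done in plain Python. The helper title and the variable names are mine.

```python
import xml.etree.ElementTree as ET

# Copy of the sample catalog from this article.
CATALOG_XML = """
<catalog>
  <book title="XSLT Programmer's Reference" isbn="1861003129" section="Computer">
    <author>Kay, Michael</author>
    <category>XML</category>
    <category>XSL</category>
    <price>45.99</price>
  </book>
  <book title="The Lecturer's Tale" isbn="0312203322" section="Fiction">
    <author>Hynes, James</author>
    <category>Horror</category>
    <category>Humor</category>
    <price>30</price>
  </book>
  <book title="The Elements of Style" isbn="020530902X" section="English">
    <author>Strunk, William</author>
    <author>White, E.B.</author>
    <category>Grammar</category>
    <price>6.95</price>
  </book>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)
books = catalog.findall("book")


def title(book):
    """Illustrative helper: the title attribute of a <book> element."""
    return book.get("title")


# catalog/book[category="XML"]
xml_books = [title(b) for b in catalog.findall("book[category='XML']")]

# //book[category="XML" or price > 10], filtered in Python
xml_or_pricey = [
    title(b)
    for b in books
    if any(c.text == "XML" for c in b.findall("category"))
    or float(b.findtext("price")) > 10
]

# //category[.='XML' or .='Humor'], filtered in Python
wanted = [c.text for c in catalog.iter("category") if c.text in ("XML", "Humor")]

# catalog/book[0]: zero-based in XSL Patterns, 1-based here
first_book = title(catalog.find("book[1]"))

# ./book[end()]: spelled last() here
last_book = title(catalog.find("book[last()]"))

# (//category[end()])[0]: last category under each book, then the first of those
last_categories = [c.text for c in catalog.findall(".//category[last()]")]
first_of_last = last_categories[0]

print(xml_books, xml_or_pricey, wanted, first_book, last_book, first_of_last, sep="\n")
```

The last example mirrors the parentheses in the pattern: the list of "last category under each parent" is built first, and only then is the first item taken from it.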

Within the node collection you can also specify what you want returned with the node(), attribute() and text() methods:

Match all non-attribute nodes within <book> elements:

//book/node()

Return all attribute nodes within <book> elements:

//book/attribute()

Return the text of the title attribute node of the current node:

./@title/text()
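For comparison, here is roughly how the same three kinds of information are reached in Python's xml.etree.ElementTree. This is only a loose analogue: ElementTree has no node(), attribute() or text() steps, and it stores text on each element's .text and .tail properties rather than as separate text nodes. The one-book catalog and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# First book of the sample catalog only, to keep the sketch short.
CATALOG_XML = """
<catalog>
  <book title="XSLT Programmer's Reference" isbn="1861003129" section="Computer">
    <author>Kay, Michael</author>
    <category>XML</category>
    <category>XSL</category>
    <price>45.99</price>
  </book>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)
first_book = catalog.find("book")

# //book/node(): iterating an element yields its child elements
child_tags = [child.tag for child in first_book]

# //book/attribute(): attributes live in a dictionary on the element
attribute_names = sorted(first_book.attrib)

# ./@title/text(): the attribute value is already a plain string
title_text = first_book.get("title")

# text() on an element: the text content of the <price> child
price_text = first_book.findtext("price")

print(child_tags, attribute_names, title_text, price_text, sep="\n")
```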

I've created a simple public web page where you can test the above patterns and also create your own: http://www.eps-software.com/code/xslpattern.htm (Figure 1). The best way to get familiar with patterns is to start building your own queries and analyzing the results. When you visit the web page, note that you will need to apply your XSL patterns against the root node of the XML document.

Figure 1 - Test your own XSL Patterns on our XSL Patterns Demo site!

Conclusion

You should now have a good idea of how to specify XSL patterns to return parts of an XML document. Patterns are used in both the match and select attributes of XSL stylesheets and in XPath queries (see Travis Vandersypen's article in the previous issue of Component Developer Magazine). In fact, many XML technologies use XSL patterns for document addressing. With an understanding of how to use patterns to get the exact pieces of information from XML, you will have a head start on advanced querying and filtering operations.

Table 1: Additional operators.

Operator    Equivalent    Description
=           $eq$          Equal
!=          $ne$          Not equal
<           $lt$          Less than
>           $gt$          Greater than
<=          $le$          Less than or equal to
>=          $ge$          Greater than or equal to
(none)      $ieq$         Case-insensitive equal to
(none)      $ilt$         Case-insensitive less than
(none)      $igt$         Case-insensitive greater than
(none)      $ile$         Case-insensitive less than or equal to
(none)      $ige$         Case-insensitive greater than or equal to
or          ||, $or$      Logical or
and         &&, $and$     Logical and
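To make the difference between the plain and case-insensitive comparisons concrete, here is a small Python sketch that mimics the comparison rows of Table 1 on ordinary strings. It is purely illustrative: the COMPARISONS mapping, the compare function and the lowercase-both-sides rule are my assumptions, not a description of how any XSL processor implements these operators.

```python
import operator


def _fold(func):
    """Wrap a comparison so both operands are lowercased first (assumed rule)."""
    return lambda left, right: func(left.lower(), right.lower())


# Hypothetical mapping from the Table 1 spellings to Python comparisons.
COMPARISONS = {
    "$eq$": operator.eq,
    "$ne$": operator.ne,
    "$lt$": operator.lt,
    "$gt$": operator.gt,
    "$le$": operator.le,
    "$ge$": operator.ge,
    "$ieq$": _fold(operator.eq),
    "$ilt$": _fold(operator.lt),
    "$igt$": _fold(operator.gt),
    "$ile$": _fold(operator.le),
    "$ige$": _fold(operator.ge),
}


def compare(left, name, right):
    """Apply the named comparison to two strings."""
    return COMPARISONS[name](left, right)


# Category values taken from the sample catalog.
categories = ["XML", "XSL", "Horror", "Humor", "Grammar"]

exact = [c for c in categories if compare(c, "$eq$", "xml")]
folded = [c for c in categories if compare(c, "$ieq$", "xml")]

print(exact)   # the case-sensitive comparison finds no match for "xml"
print(folded)  # the case-insensitive comparison matches "XML"
```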