Introduction
XSL, which stands for Extensible Stylesheet Language, is a family of languages used for transforming and rendering XML documents. XSL consists of two main components: XSLT (XSL Transformations) and XSL-FO (XSL Formatting Objects).
- XSLT (XSL Transformations):
  - XSLT is a language used to transform XML documents into other formats, such as HTML, plain text, CSV, SQL, or another XML structure.
  - It allows you to define rules and templates for how the input XML should be processed and transformed into the desired output format.
  - XSLT uses a declarative approach, where you specify what should happen to the input data rather than writing procedural code.
  - XSLT is often used for tasks like format conversion, generating reports, and extracting specific information from XML documents.
- XSL-FO (XSL Formatting Objects):
  - XSL-FO is a language used for specifying the layout and formatting of XML content when it needs to be presented or printed, such as generating PDF documents.
  - It allows you to define rules for pagination, fonts, page headers, footers, and other formatting aspects of a document.
  - XSL-FO is especially useful for generating well-structured, print-ready documents from XML data.
XSL is essential for several reasons:
- Data Transformation: XML is often used to represent structured data, and XSLT provides a way to transform this data into different formats, making it more accessible to various applications and systems. For example, you can convert XML data into XML for schema conversion, HTML for web display, JSON for API responses, or CSV for data export.
- Data Extraction: XSLT allows you to extract specific information from an XML document by defining templates that match the desired elements or attributes. This is particularly useful when dealing with large XML datasets.
- Document Presentation: XSL-FO enables the generation of professionally formatted documents, such as PDFs, from XML content. This is valuable for creating reports, invoices, manuals, and other documents with specific layout and styling requirements.
- Separation of Content and Presentation: XSL helps maintain a clear separation between the content (XML data) and its presentation (XSLT and XSL-FO). This separation is crucial for reusability and maintainability, as changes to the presentation can be made independently of the underlying data.
- Cross-Platform Compatibility: XML and XSLT are platform-agnostic, and XSLT processors are available for most operating systems and programming languages. This makes them suitable for interoperability in heterogeneous environments.
- Standards Compliance: XSLT, XPath, and XSL-FO are W3C (World Wide Web Consortium) Recommendations, so a style sheet written to the standard is portable across conforming processors.
In short, XSL, comprising XSLT and XSL-FO, is useful for working with XML data, enabling data transformation, extraction, and document formatting. It helps bridge the gap between structured data and its presentation in different formats, making it a fundamental technology for a wide range of applications in web development, data integration, and document generation.
This lesson explains how to build an XSL style sheet (XSLT) that transforms XML from one XML format to another.
Applying an XSL
An XSL style sheet is a text file containing the transformation rules, which are themselves expressed as XML. The style sheet is fed to a transformation engine, an XSLT processor (see below), along with the source XML, and the engine produces an output file. The flow is shown in the diagram below.
XSL in R
To transform XML documents with XSL in R, install and then load the package xslt. It builds on the xml2 package, which provides read_xml() and write_xml().
To perform the transformation, load the source XML, load the XSL, apply the transform, and then save the result.
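library(xml2)  # read_xml(), write_xml()
library(xslt)  # xml_xslt(); install once with install.packages("xslt")
xmlFn <- "XML/TeamRosters.xml"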
xslFn <- "XSL/Rosters2StatsXSL.xsl"
txnFn <- "XML/BruinsStats.xml"
# Step 1: load the source XML
xmlDoc <- read_xml(xmlFn)
# Step 2: load the XSL transformation
xslStyle <- read_xml(xslFn)
# Step 3: apply the XSL to the XML
transformedXML <- xml_xslt(xmlDoc, xslStyle)
# Step 4: write the resulting XML to a file
status <- write_xml(transformedXML, file = txnFn)
R / RStudio Programming Hint: Occasionally an error occurs during XSL processing and the XSLT processor goes into an infinite loop. To stop the XSLT, you need to restart R. In RStudio (Posit), choose Session > Restart R.
Building XSL Transformations
Example XML
The source XML used in the examples contains information about players on team rosters; such data might be imported into a team analytics system or used to play in a fantasy sports league. We want to extract each player's last name, goals, and assists, and we want to exclude goalies. The result should be an XML document that can be read into Excel (using its XML import wizard) or directly into a data frame, so it should have a simple two-level structure. A portion of the XML (TeamRosters.xml) is reproduced below:
<?xml version="1.0" encoding="UTF-8"?>
<rosters season="2021">
<team name="Boston Bruins" division="MassMutual East">
<date>2021-02-19</date>
<team-info>
<city>Boston</city>
...
</team-info>
<season-info>
<gp>15</gp>
<pts w="10" l="3" ot="2">22</pts>
</season-info>
<player num="63">
<firstname>Brad</firstname>
<lastname>Marchand</lastname>
<position>Forward</position>
<assistantcaptain/>
<points>
<goals>9</goals>
<assists>10</assists>
</points>
</player>
…
<player num="73">
<firstname>Charlie</firstname>
<lastname>McAvoy</lastname>
...
</player>
<player num="40">
<firstname>Tuuka</firstname>
<lastname>Rask</lastname>
<position>Goalie</position>
...
<stats>
<GA>23</GA>
<SV>0.906</SV>
</stats>
</player>
</team>
</rosters>
The resulting XML should look like this (sorted by player last name and excluding goalies):
<?xml version="1.0" encoding="UTF-8"?>
<roster>
<row>
<name number="37">Bergeron</name>
<goals>7</goals>
<assists>11</assists>
</row>
<row>
<name number="63">Marchand</name>
<goals>9</goals>
<assists>10</assists>
</row>
...
</roster>
Define Preamble for XSL
Since an XSL style sheet is an XML document, it begins with the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
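The next element, <xsl:transform>, declares that the XML document is an XSLT style sheet. The element <xsl:stylesheet> is synonymous with <xsl:transform>.
Its version attribute gives the XSLT version (1.0 in this lesson), and its xmlns:xsl attribute binds the xsl prefix to the XSLT namespace. The namespace URI must be exactly http://www.w3.org/1999/XSL/Transform; with any other URI, the processor does not treat the xsl: elements as XSLT instructions. This preamble is standard, so simply copy it:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
...
</xsl:stylesheet>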
Key Rule Operators
Key XSL elements (operators):
- <xsl:template>
- <xsl:value-of>
- <xsl:for-each>
- <xsl:sort>
- <xsl:if>
- <xsl:choose>
- <xsl:key>
- <xsl:message>
- <xsl:apply-templates>
- <xsl:import>
- <xsl:output>
- <xsl:text>
- <xsl:attribute>
<xsl:template>
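An XSL style sheet consists of one or more rules called templates. Each <xsl:template> element defines a section of the target file, and the value of its match attribute is an XPath pattern that selects the nodes the template applies to. Processing starts at the root of the source document (matched by match="/"), and the result document is built top to bottom from the matching templates. Any element or text in a template that is not an <xsl:…> instruction is copied as-is to the target document (the output of XSLT).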
Example XSL
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<roster>
Boston Bruins
</roster>
</xsl:template>
</xsl:stylesheet>
Result XML
<?xml version="1.0" encoding="UTF-8"?>
<roster>
Boston Bruins
</roster>
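To try the example above without the course files, a minimal sketch in R follows. It assumes the xml2 and xslt packages are installed and uses a tiny inline document as a stand-in for TeamRosters.xml; the literal <roster> element in the template is copied straight to the output.

```r
library(xml2)  # read_xml(), xml_root(), xml_name(), xml_text()
library(xslt)  # xml_xslt()

# Tiny inline stand-in for TeamRosters.xml (not the real file)
src <- read_xml('<rosters season="2021"><team name="Boston Bruins"/></rosters>')

# The example stylesheet: a single template that matches the document root "/"
style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <roster>Boston Bruins</roster>
  </xsl:template>
</xsl:stylesheet>')

out <- xml_xslt(src, style)
xml_name(xml_root(out))  # "roster"
xml_text(xml_root(out))  # "Boston Bruins"
```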
Full Syntax: <xsl:template match="<xpath-expr>">
<xsl:template
name = QName
match = Pattern
priority = number
mode = QName >
</xsl:template>
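- name: A name for the template, so that it can be invoked explicitly with <xsl:call-template name="...">.
- match: XPath pattern which matches the node(s) to which the template is applied.
- priority: Priority of the template, used to choose between conflicting templates that match the same node.
- mode: Allows the same node to be processed multiple times, producing a different result each time; <xsl:apply-templates mode="..."> selects only the templates with that mode.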
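The name and mode attributes are easiest to see together. The sketch below is hypothetical (the templates and the <short>, <long>, and <footer> elements are invented for illustration): two templates match <player> in different modes and are selected with <xsl:apply-templates mode="...">, and a named template is invoked with <xsl:call-template>. It assumes the xml2 and xslt packages are installed.

```r
library(xml2)
library(xslt)

# Inline stand-in for one player from the roster
src <- read_xml('<team><player num="63"><lastname>Marchand</lastname></player></team>')

# Hypothetical stylesheet: same node processed in two modes, plus a named template
style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/team">
    <out>
      <xsl:apply-templates select="player" mode="short"/>
      <xsl:apply-templates select="player" mode="long"/>
      <xsl:call-template name="footer"/>
    </out>
  </xsl:template>
  <xsl:template match="player" mode="short">
    <short><xsl:value-of select="@num"/></short>
  </xsl:template>
  <xsl:template match="player" mode="long">
    <long><xsl:value-of select="lastname"/></long>
  </xsl:template>
  <xsl:template name="footer">
    <footer>end</footer>
  </xsl:template>
</xsl:stylesheet>')

out <- xml_xslt(src, style)
xml_name(xml_children(xml_root(out)))    # "short" "long" "footer"
xml_text(xml_find_first(out, "//short")) # "63"
xml_text(xml_find_first(out, "//long"))  # "Marchand"
```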
<xsl:for-each>
<xsl:for-each select="<xpath-expr>"> processes a set of XML nodes that match the XPath expression. It loops (i.e., iterates) through each node in the matched set one by one: it is an iterator. The XPath expression is evaluated relative to the current context node, which is set by the enclosing template or <xsl:for-each>; inside the loop, each matched node in turn becomes the context node.
<xsl:for-each
select = XPath-Expression>
process each node
</xsl:for-each>
In the example fragment below, the XPath expression within “xsl:for-each” binds to each <player> element under <team> and the rule is applied to each of them in turn.
Example XSL
...
<xsl:template match="//team[@name='Boston Bruins']">
<roster>
<xsl:for-each select="player">
<row>
<cell>Brad Marchand</cell>
<cell>9</cell>
<cell>9</cell>
</row>
</xsl:for-each>
</roster>
</xsl:template>
...
The resulting XML would be as follows. Because the row content is literal text, every <player> produces an identical row:
<?xml version="1.0" encoding="UTF-8"?>
<roster>
<row>
<cell>Brad Marchand</cell>
<cell>9</cell>
<cell>9</cell>
</row>
<row>
<cell>Brad Marchand</cell>
<cell>9</cell>
<cell>9</cell>
</row>
...
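The repetition is easy to confirm in R. This sketch assumes xml2 and xslt are installed, uses a three-player inline stand-in for the roster, and uses an R raw string (r"(...)", available in R 4.0 and later) so the XPath can contain single quotes.

```r
library(xml2)
library(xslt)

# Inline stand-in: three players on one team (details omitted)
src <- read_xml('<rosters><team name="Boston Bruins">
  <player num="63"><lastname>Marchand</lastname></player>
  <player num="37"><lastname>Bergeron</lastname></player>
  <player num="40"><lastname>Rask</lastname></player>
</team></rosters>')

# The for-each example with its literal (constant) row content
style <- read_xml(r"(<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//team[@name='Boston Bruins']">
    <roster>
      <xsl:for-each select="player">
        <row><cell>Brad Marchand</cell></row>
      </xsl:for-each>
    </roster>
  </xsl:template>
</xsl:stylesheet>)")

out <- xml_xslt(src, style)
length(xml_find_all(out, "//row"))              # 3: one row per <player>
unique(xml_text(xml_find_all(out, "//cell")))   # "Brad Marchand"
```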
<xsl:value-of>
<xsl:value-of select="xpath-expr"> extracts the value of the node selected by the XPath expression and writes it to the output as text. The context of the XPath expression is set by the enclosing template or <xsl:for-each>. The value of an element is its text content: the text of the element and of all its descendants, concatenated, without any tags. For example, the value of “goals” in <goals>9</goals> is 9, and the value of “stats” in <stats><goals>9</goals></stats> is also 9, not “<goals>9</goals>”. If the expression selects several nodes, XSLT 1.0 uses only the first one.
The full syntax of <xsl:value-of> is:
<xsl:value-of
select = Expression
disable-output-escaping = "yes" | "no"/>
Example XSL
<xsl:template match="//team[@name='Boston Bruins']">
<roster>
<xsl:for-each select="player">
<row>
<name>
<xsl:value-of select="lastname" />
</name>
<goals>
<xsl:value-of select="points/goals" />
</goals>
<assists>
<xsl:value-of select="points/assists" />
</assists>
</row>
</xsl:for-each>
</roster>
</xsl:template>
...
Result XML
<?xml version="1.0" encoding="UTF-8"?>
<roster>
<row>
<name>Marchand</name>
<goals>9</goals>
<assists>10</assists>
</row>
<row>
<name>Bergeron</name>
<goals>7</goals>
<assists>11</assists>
</row>
...
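A short sketch shows that <xsl:value-of> returns text only. It uses an inline single-player stand-in rather than TeamRosters.xml, and the extra <all> element (hypothetical, for illustration) selects the whole <points> element, whose value is the concatenated text of <goals> and <assists>.

```r
library(xml2)
library(xslt)

# Inline stand-in for one <player>; no whitespace between child tags
src <- read_xml('<player num="63"><lastname>Marchand</lastname><points><goals>9</goals><assists>10</assists></points></player>')

style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/player">
    <row>
      <name><xsl:value-of select="lastname"/></name>
      <goals><xsl:value-of select="points/goals"/></goals>
      <all><xsl:value-of select="points"/></all>
    </row>
  </xsl:template>
</xsl:stylesheet>')

out <- xml_xslt(src, style)
xml_text(xml_find_first(out, "//name"))   # "Marchand"
xml_text(xml_find_first(out, "//goals"))  # "9"
xml_text(xml_find_first(out, "//all"))    # "910": text of <goals> and <assists>, no tags
```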
<xsl:if>
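<xsl:if test="<xpath-condition>"> allows for conditional processing based on the result of an XPath boolean expression: its content is applied only when the test evaluates to true. The context of the XPath expression is set by the enclosing template or <xsl:for-each>. XSLT has no <xsl:else>; use <xsl:choose> when you need alternatives.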
<xsl:if
test = boolean-expression >
</xsl:if>
In the example below, the content of <xsl:if> is applied only if the conditional expression, position != 'Goalie', evaluates to true, that is, if the player's <position> element in the source XML does not contain the value “Goalie”. As a result, only players whose position is not “Goalie” are emitted to the result XML.
...
<xsl:template match="//team[@name='Boston Bruins']">
<roster>
<xsl:for-each select="player">
<xsl:if test = "position != 'Goalie'">
<row>
<name>
<xsl:value-of select="lastname" />
</name>
<goals>
<xsl:value-of select="points/goals" />
</goals>
<assists>
<xsl:value-of select="points/assists" />
</assists>
</row>
</xsl:if>
</xsl:for-each>
</roster>
</xsl:template>
...
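One subtlety of the test above: in XPath 1.0, comparing an empty node-set with != is false, so a player with no <position> element at all is dropped too. The sketch below uses an inline stand-in in which McAvoy has no <position> (an assumption for illustration; the real file elides his details) and a hypothetical helper, run_filter(), to compare two forms of the test. It assumes xml2 and xslt are installed.

```r
library(xml2)
library(xslt)

# Inline stand-in; McAvoy deliberately has no <position> element
src <- read_xml('<team name="Boston Bruins">
  <player num="63"><lastname>Marchand</lastname><position>Forward</position></player>
  <player num="40"><lastname>Rask</lastname><position>Goalie</position></player>
  <player num="73"><lastname>McAvoy</lastname></player>
</team>')

# Hypothetical helper: run the xsl:if filter with a given test expression
run_filter <- function(test) {
  style <- read_xml(sprintf('<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/team">
      <roster>
        <xsl:for-each select="player">
          <xsl:if test="%s"><name><xsl:value-of select="lastname"/></name></xsl:if>
        </xsl:for-each>
      </roster>
    </xsl:template>
  </xsl:stylesheet>', test))
  xml_text(xml_find_all(xml_xslt(src, style), "//name"))
}

run_filter("position != 'Goalie'")      # "Marchand"
run_filter("not(position = 'Goalie')")  # "Marchand" "McAvoy"
```

Use the not(...) form when players without a position should be kept.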
<xsl:choose>
<xsl:choose> together with <xsl:when test="xpath-condition"> allows for multiple conditions and “if-then-else”-like logic, similar to a “switch-case” statement in some programming languages. Each condition is written as an <xsl:when> element, and only the first <xsl:when> whose test is true is applied. If none of the test conditions is true, the optional <xsl:otherwise> element is applied. The context of each XPath expression is set by the enclosing template or <xsl:for-each>.
<xsl:choose>
<xsl:when test = "xpath-condition1">
...
</xsl:when>
<xsl:when test = "xpath-condition2">
...
</xsl:when>
<xsl:otherwise>
...
</xsl:otherwise>
</xsl:choose>
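As a hypothetical illustration, the sketch below maps each player's position to a one-letter code. The roster is an inline stand-in (McAvoy's position “Defense” is assumed, since the source file elides it), the xml2 and xslt packages are assumed to be installed, and an R raw string (R 4.0 or later) lets the XPath use single quotes.

```r
library(xml2)
library(xslt)

# Inline stand-in roster; McAvoy's position is assumed for illustration
src <- read_xml('<team>
  <player><lastname>Marchand</lastname><position>Forward</position></player>
  <player><lastname>McAvoy</lastname><position>Defense</position></player>
  <player><lastname>Rask</lastname><position>Goalie</position></player>
</team>')

style <- read_xml(r"(<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/team">
    <roster>
      <xsl:for-each select="player">
        <role>
          <xsl:choose>
            <xsl:when test="position = 'Goalie'">G</xsl:when>
            <xsl:when test="position = 'Forward'">F</xsl:when>
            <xsl:otherwise>D</xsl:otherwise>
          </xsl:choose>
        </role>
      </xsl:for-each>
    </roster>
  </xsl:template>
</xsl:stylesheet>)")

roles <- xml_text(xml_find_all(xml_xslt(src, style), "//role"))
roles  # "F" "D" "G"
```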
<xsl:attribute>
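<xsl:attribute name="text"> adds an attribute with the given name to the output element that contains it; the content of <xsl:attribute> (typically an <xsl:value-of>) becomes the attribute's value. XPath expressions inside it are evaluated relative to the current context node, which is set by the enclosing template or <xsl:for-each>. Because attributes must be added before any child content, place <xsl:attribute> first inside the element.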
<someElement>
<xsl:attribute
name = "someName" >
</xsl:attribute>
</someElement>
The illustration below shows how attributes are added to output elements.
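The sketch below produces a number attribute in two ways: with <xsl:attribute>, and with an attribute value template (number="{@num}"), which is standard XSLT 1.0 shorthand when the attribute name is fixed. The one-player source is an inline stand-in, and the xml2 and xslt packages are assumed to be installed.

```r
library(xml2)
library(xslt)

# Inline stand-in for one player
src <- read_xml('<team><player num="63"><lastname>Marchand</lastname></player></team>')

style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/team">
    <roster>
      <xsl:for-each select="player">
        <!-- 1: xsl:attribute, placed before the element text -->
        <name>
          <xsl:attribute name="number"><xsl:value-of select="@num"/></xsl:attribute>
          <xsl:value-of select="lastname"/>
        </name>
        <!-- 2: attribute value template -->
        <name number="{@num}"><xsl:value-of select="lastname"/></name>
      </xsl:for-each>
    </roster>
  </xsl:template>
</xsl:stylesheet>')

nm <- xml_find_all(xml_xslt(src, style), "//name")
xml_attr(nm, "number")  # "63" "63"
xml_text(nm)            # "Marchand" "Marchand"
```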
<xsl:sort>
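<xsl:sort select="xpath-expr"> orders the nodes processed by its parent, which must be <xsl:for-each> or <xsl:apply-templates>; inside <xsl:for-each>, <xsl:sort> must come before any other content. Several <xsl:sort> elements can be listed to sort by multiple keys. The full syntax is: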
<xsl:sort
select="expression"
lang="language-code"
data-type="text|number|qname"
order="ascending|descending"
case-order="upper-first|lower-first"/>
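The fragment below demonstrates its use: it sorts the players by last name and also uses <xsl:attribute> to copy each player's num attribute into the output. Note that <xsl:sort> is an empty (self-closing) element.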
...
<xsl:template match="//team[@name='Boston Bruins']">
<roster>
<xsl:for-each select="player">
<xsl:sort select="lastname"/>
<xsl:if test = "position != 'Goalie'">
<row>
<name>
<xsl:attribute name="number">
<xsl:value-of select="./@num" />
</xsl:attribute>
<xsl:value-of select="lastname" />
</name>
<goals>
<xsl:value-of select="points/goals" />
</goals>
<assists>
<xsl:value-of select="points/assists" />
</assists>
</row>
</xsl:if>
</xsl:for-each>
</roster>
</xsl:template>
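When sorting numbers, set data-type="number"; the default, text, compares values as strings, so "10" sorts before "9". The sketch below uses a hypothetical helper, sorted_names(), and an inline stand-in roster; the third player, Smith, and his 10 goals are invented for illustration. It assumes xml2 and xslt are installed.

```r
library(xml2)
library(xslt)

# Inline stand-in roster; "Smith" is a hypothetical player
src <- read_xml('<team>
  <player><lastname>Marchand</lastname><points><goals>9</goals></points></player>
  <player><lastname>Smith</lastname><points><goals>10</goals></points></player>
  <player><lastname>Bergeron</lastname><points><goals>7</goals></points></player>
</team>')

# Hypothetical helper: sort players by goals (descending) with a given data-type
sorted_names <- function(data_type) {
  style <- read_xml(sprintf('<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/team">
      <roster>
        <xsl:for-each select="player">
          <xsl:sort select="points/goals" data-type="%s" order="descending"/>
          <name><xsl:value-of select="lastname"/></name>
        </xsl:for-each>
      </roster>
    </xsl:template>
  </xsl:stylesheet>', data_type))
  xml_text(xml_find_all(xml_xslt(src, style), "//name"))
}

sorted_names("number")  # "Smith" "Marchand" "Bergeron"  (10, 9, 7)
sorted_names("text")    # "Marchand" "Bergeron" "Smith"  ("9" > "7" > "10" as strings)
```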
<xsl:output>
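<xsl:output> defines the format of the output (target) file. The default method is xml, which means that XSLT inserts the standard XML declaration <?xml version="1.0"?> unless omit-xml-declaration="yes" is set (if the root element of the result is <html>, the default is html instead). <xsl:output> is a top-level element: it must be a direct child of the “xsl:transform” or “xsl:stylesheet” element, and by convention it is placed right after that element, before any “xsl:template”.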
The full syntax is:
<xsl:output
method = "xml|html|text|name"
version = "string"
encoding = "string"
omit-xml-declaration = "yes|no"
standalone = "yes|no"
doctype-public = "string"
doctype-system = "string"
cdata-section-elements = "namelist"
indent = "yes|no"
media-type = "string"/>
The example XSL fragment below demonstrates its use:
<?xml version="1.0"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"
encoding="UTF-8"
indent="no"
omit-xml-declaration="yes"/>
<xsl:template match="//plants">
...
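With method="text", the processor writes only the text content of the result (no tags and no XML declaration), which is what you want when producing a CSV or JSON file rather than XML. The omit-xml-declaration and indent settings matter mainly for XML output.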
<xsl:text>
<xsl:text> writes its content to the output file literally, exactly as written, including whitespace. Whitespace-only text elsewhere in a style sheet is stripped, so <xsl:text> is the way to emit a space or line break deliberately. For example, to send special characters such as a space or newline to the output file, use XML character references, e.g., &#10; for newline and &#32; for space. With disable-output-escaping="yes", it can also be used to include an embedded DTD or a link to an external DTD in the output XML.
The full syntax is:
<xsl:text disable-output-escaping = "yes" | "no">
any text here
</xsl:text>
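The difference between a bare space and <xsl:text> shows up in a small test. In this sketch (inline stand-in data; xml2 and xslt assumed installed), the space between the two <xsl:value-of> elements in <plain> is whitespace-only text, so the processor strips it, while the separator inside <xsl:text> is kept.

```r
library(xml2)
library(xslt)

# Inline stand-in for one <player>
src <- read_xml('<player><firstname>Brad</firstname><lastname>Marchand</lastname></player>')

style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/player">
    <names>
      <!-- the space between the two value-of elements is stripped -->
      <plain><xsl:value-of select="lastname"/> <xsl:value-of select="firstname"/></plain>
      <!-- xsl:text keeps the separator exactly as written -->
      <sep><xsl:value-of select="lastname"/><xsl:text>, </xsl:text><xsl:value-of select="firstname"/></sep>
    </names>
  </xsl:template>
</xsl:stylesheet>')

out <- xml_xslt(src, style)
xml_text(xml_find_first(out, "//plain"))  # "MarchandBrad"
xml_text(xml_find_first(out, "//sep"))    # "Marchand, Brad"
```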
<xsl:variable>
<xsl:variable name="var" select="xpath-expression"> creates a variable that can be used in XPath expressions; refer to a variable named varName as $varName. Once set, an XSLT variable cannot be changed: it behaves like a named constant within its scope.
The full syntax is:
<xsl:variable name = "variable"
select = "xpath-expression" />
An XSL fragment demonstrating its use is below. Note the use of an aggregation function in the XPath expression to get a single value for the value of the variable “numRecords”.
<?xml version="1.0"?>
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="numRecords"
select="count(records/record)"/>
<transactions>
<xsl:attribute name="numTxns">
<xsl:value-of select="$numRecords" />
</xsl:attribute>
...
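A runnable version of the fragment above might look like the sketch below. The two-record source document is hypothetical, and the <last> element is added to show $numRecords being reused inside another XPath expression, here as a positional predicate. It assumes xml2 and xslt are installed.

```r
library(xml2)
library(xslt)

# Hypothetical two-record source document
src <- read_xml('<records><record><product>A</product></record><record><product>B</product></record></records>')

style <- read_xml('<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:variable name="numRecords" select="count(records/record)"/>
    <transactions>
      <xsl:attribute name="numTxns"><xsl:value-of select="$numRecords"/></xsl:attribute>
      <!-- a numeric predicate selects by position: here, the last record -->
      <last><xsl:value-of select="records/record[$numRecords]/product"/></last>
    </transactions>
  </xsl:template>
</xsl:transform>')

out <- xml_xslt(src, style)
xml_attr(xml_root(out), "numTxns")        # "2"
xml_text(xml_find_first(out, "//last"))   # "B"
```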
Math Extensions to XSLT
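XPath 1.0, which XSLT 1.0 uses, already provides basic arithmetic (+, -, *, div, mod) and the functions sum(), floor(), ceiling(), and round(). For other calculations, use XSLT 2.0 or later, whose XPath adds further numeric functions, or an extension library such as EXSLT math (namespace http://exslt.org/math), provided your XSLT processor supports it. The fragment below uses math:random() from EXSLT.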
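<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://exslt.org/math"
extension-element-prefixes="math" >
...
<!-- inside a template: a random integer from 1 to 8 (math:random() returns a number between 0 and 1) -->
<xsl:value-of select="(floor(math:random()*8) mod 8) + 1" />
<!-- a random record position from 1 to $numRecords, as defined in the xsl:variable example -->
<xsl:variable name="someRecord"
select="(floor(math:random() * $numRecords)
mod $numRecords) + 1"/>
<product><xsl:value-of
select="../record[$someRecord]/product"/></product>
...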
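Many calculations need no extension at all, because XPath 1.0 includes +, -, *, div, mod, sum(), floor(), ceiling(), and round(). The sketch below computes total goals and goals per game (rounded to two decimals) from an inline stand-in built from the sample roster's <gp> of 15 and the goal totals for Marchand and Bergeron. It assumes xml2 and xslt are installed.

```r
library(xml2)
library(xslt)

# Inline stand-in: games played plus two players' goals from the sample
src <- read_xml('<team><season-info><gp>15</gp></season-info>
  <player><points><goals>9</goals></points></player>
  <player><points><goals>7</goals></points></player>
</team>')

style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/team">
    <stats>
      <total><xsl:value-of select="sum(player/points/goals)"/></total>
      <!-- round to two decimals: round(100 * x) div 100 -->
      <perGame><xsl:value-of select="round(100 * sum(player/points/goals) div season-info/gp) div 100"/></perGame>
    </stats>
  </xsl:template>
</xsl:stylesheet>')

out <- xml_xslt(src, style)
xml_text(xml_find_first(out, "//total"))    # "16"
xml_text(xml_find_first(out, "//perGame"))  # "1.07"
```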
Tutorial
The tutorial below provides an overview of the concepts in this lesson. The files and slide deck referenced in the tutorial are linked below and should be downloaded prior to watching the tutorial so that you can follow along.
Resources for Tutorial
Right-click (or open the context menu) and save the files rather than clicking on the links to open the files within your browser.
Summary
XSL is a set of rules expressed as XML elements used for transforming and formatting XML documents. XSL is essential for tasks like converting XML data into different formats, extracting specific information, creating formatted documents (e.g., PDFs), and maintaining a clear separation between content and presentation. It follows W3C standards and is widely used for working with XML data in various applications.
Resources
None yet. Let us know if you have favorites.
