I am wondering whether XSLT makes it possible to sort an XML file when I don’t know the entire XML schema.
For example, I would like to sort the following XML file:
the /CATALOG/CD elements should be sorted by /CATALOG/CD/TITLE.
<CATALOG attrib1="value1">
<DVD2>
<TITLE>The Godfather2</TITLE>
</DVD2>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
<CD attrib4="value4">
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>
<CATALOG>
<CD><TITLE>E</TITLE></CD>
<CD><TITLE>I</TITLE></CD>
<CD><TITLE>D</TITLE></CD>
</CATALOG>
</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD attrib2="value2">
<TITLE attrib3="value3">Greatest Hits</TITLE>
<ARTIST>Dolly Parton</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>RCA</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1982</YEAR>
</CD>
<DVD>
<TITLE>The Godfather1</TITLE>
</DVD>
</CATALOG>
The output should be:
<CATALOG attrib1="value1">
<CD attrib4="value4">
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>
<CATALOG>
<CD><TITLE>E</TITLE></CD>
<CD><TITLE>I</TITLE></CD>
<CD><TITLE>D</TITLE></CD>
</CATALOG>
</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD attrib2="value2">
<TITLE attrib3="value3">Greatest Hits</TITLE>
<ARTIST>Dolly Parton</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>RCA</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1982</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
<DVD2>
<TITLE>The Godfather2</TITLE>
</DVD2>
<DVD>
<TITLE>The Godfather1</TITLE>
</DVD>
</CATALOG>
The following is one of my many attempts:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!--<CATALOG>-->
    <xsl:for-each select="CATALOG/CD">
      <xsl:sort select="TITLE"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
    <!--</CATALOG>-->
  </xsl:template>
</xsl:stylesheet>
The problem is that, with this XSLT, the parts of the XML outside the CD list do not appear in the output.
I could uncomment the two commented-out lines, but that’s exactly what I want to avoid:
in that case, any attributes added to the CATALOG element would not be copied to the output XML.
I don’t want to rebuild the XML file. I just want to sort it while knowing the exact structure of only part of the XML schema.
This is easy to implement with, for example, .NET (using XmlDocument and XmlNode objects) or Python’s lxml library, but is it possible with XSLT?
Thanks!
Note: it is hard to find a sample input XML that rules out every misunderstanding of the question, so I will spell out the requirements as precisely as I can:
- only CD elements directly under the top-level CATALOG should be sorted (for example, the CD elements nested inside the Bob Dylan entry should be left untouched)
- it does not matter whether elements other than CD (for example DVD and DVD2) end up at the beginning or the end of the list
- nothing should be missing from the output XML: no elements, attributes, values, or comments
- non-CD elements (for example DVD and DVD2) should not be sorted by their TITLE subelement
Continuing along the line of just modifying the identity transformation (which might not be entirely safe), I think the following should be equivalent to @Tim’s answer.
Note: I’m not promoting this technique at all unless you understand the general behavior of the identity transformation.
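As a minimal sketch of that idea (a reconstruction under assumptions, not necessarily the exact stylesheet meant here): the identity template is kept, and an xsl:sort is added to the apply-templates call that processes child nodes. The sort key is non-empty only for CD elements whose parent is the top-level CATALOG, so nested CD elements and all other nodes have equal (empty) keys and keep their relative document order. The `xsl:output` and `xsl:strip-space` settings are assumptions added for readable output; note that `xsl:strip-space` drops whitespace-only text nodes.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: pretty-print and ignore whitespace-only text nodes. -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transformation with one change: child nodes are sorted. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- Attributes are copied first and unsorted, so they always precede children. -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()">
        <!-- Key is the TITLE of a CD directly under the top-level CATALOG;
             for any other node the key is the empty string. -->
        <xsl:sort select="self::CD[parent::CATALOG[not(parent::*)]]/TITLE"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the sort is stable, nodes with an empty key stay in document order. Where they land relative to the sorted CD elements depends on how the processor orders the empty string; it usually sorts first, which would put DVD2 and DVD before the CD elements.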
Or, if you care about the other elements DVD and DVD2, you can do:
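One way to control where the non-CD elements go, again only a sketch under the same assumptions, is to add a primary numeric sort key that is 0 for CD elements directly under the top-level CATALOG and 1 for every other node. The sorted CD elements then come first, and DVD2 and DVD follow in their original order, which matches the expected output in the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: pretty-print and ignore whitespace-only text nodes. -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- Attributes must be emitted before any child node, so keep them out of the sort. -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()">
        <!-- Primary key: 0 for top-level CD elements, 1 for everything else. -->
        <xsl:sort select="number(not(self::CD[parent::CATALOG[not(parent::*)]]))"
                  data-type="number"/>
        <!-- Secondary key: TITLE of top-level CD elements, empty for other nodes. -->
        <xsl:sort select="self::CD[parent::CATALOG[not(parent::*)]]/TITLE"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Everywhere else in the tree both keys are equal for all siblings, so the stable sort leaves those nodes in document order, including the CD elements nested under PRICE.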