I have a bunch of XML files I’m using for user interface and string translation in my project, each of which has the following structure:
<?xml version="1.0" encoding="UTF-8" ?>
<messages>
  <message id="x">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  <message id="y">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  <message id="z">
    <!-- Text node or arbitrary XHTML markup in here -->
  </message>
  ...
</messages>
As part of my build process I’d like to “minify” these files into a single XML file, in which every <message> tag and all of its children are embedded within one <messages> tag.
My current solution uses grep to strip the XML prolog and the opening and closing <messages> tags from each file, then concatenates the results into a new file between a fresh XML prolog plus opening <messages> tag and a final closing </messages> tag. This solution is… rather messy and error-prone.
So, how can I use any command-line XML tools to automate this process? Could I use something like xmlpatterns and/or XSL transforms?
Side question: how would I verify that each <message> tag has an id attribute, and that all id values in the final document are unique? I know I can do the first part with a DTD, but can a DTD also handle the second, or would I need something else?
After some research and experimentation, I came up with the following solution:
First I created an XML index file listing all of the XML files I wanted to combine.
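As an illustration only, such an index file could look like the sketch below. The element names (files, file), the href attribute, and the file names are all hypothetical placeholders, not taken from the original project:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Hypothetical index: one <file> entry per message file to combine. -->
<!-- Relative paths are assumed to be relative to this index file. -->
<files>
  <file href="messages_ui.xml" />
  <file href="messages_errors.xml" />
</files>
```

Keeping the list in its own file means new translation files can be added without touching the transform itself.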
Then I wrote an XSL transform that selected the <message> tags from each file listed in the index file.
I was using Qt in my project, and Qt happens to include a tool called xmlpatterns, which can perform XSL transformations. So I was able to add an xmlpatterns call to my build process and have my XML files automatically “minified” on each build.
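A minimal sketch of what a stylesheet for the transform step could look like, offered as an illustration rather than the original listing. It assumes a hypothetical index document whose root <files> element contains <file href="..."/> children, and it relies on the standard XSLT document() function to load each listed file. It is plain XSLT 1.0, so it should be usable with other processors as well, though how completely a given tool supports document() is worth checking:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="no" />

  <!-- Matches the root of the (hypothetical) index document. -->
  <xsl:template match="/files">
    <messages>
      <!-- Visit the index entries in order so the output order is predictable. -->
      <xsl:for-each select="file">
        <!-- Load the referenced file and deep-copy each <message>,
             including its id attribute and any XHTML children. -->
        <xsl:copy-of select="document(@href)/messages/message" />
      </xsl:for-each>
    </messages>
  </xsl:template>

</xsl:stylesheet>
```

Because document() is called with the href attribute node itself, relative paths are resolved against the location of the index file rather than the stylesheet. Using xsl:copy-of instead of per-element templates keeps arbitrary markup inside each <message> intact without having to enumerate it.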