I need to transform a huge XML document into multiple HTML documents.
The XML is as follows:
<society>
  <party_members>
    <member id="1" first_name="" last_name="O'Brien">
      <ministry_id>1</ministry_id>
      <ministry_id>3</ministry_id>
    </member>
    <member id="2" first_name="Julia" last_name="">
      <ministry_id>2</ministry_id>
    </member>
    <member id="3" first_name="Winston" last_name="Smith">
      <ministry_id>1</ministry_id>
    </member>
  </party_members>
  <ministries>
    <ministry>
      <id>1</id>
      <short_title>Minitrue</short_title>
      <long_title>Ministry of truth</long_title>
      <concerns>News, entertainment,education and arts </concerns>
    </ministry>
    <ministry>
      <id>2</id>
      <short_title>Minipax</short_title>
      <long_title>Ministry of Peace</long_title>
      <concerns>War</concerns>
    </ministry>
    <ministry>
      <id>3</id>
      <short_title>Minilove</short_title>
      <long_title>Ministry of Love</long_title>
      <concerns>Dissidents</concerns>
    </ministry>
  </ministries>
</society>
The number of party members can be very large (millions), while the number of ministries is small, around 300-400. For each party member there should be an output HTML document with the following content:
<html>
  <body>
    <h2>Party member: Winston Smith</h2>
    <h3>Works in:</h3>
    <div class="ministry">
      <h4>Ministry of truth</h4> - Minitrue
      <h5>Ministry of truth <i>concerns</i> itself with <i>News, entertainment,education and arts</i></h5>
    </div>
  </body>
</html>
The number of output documents should equal the number of party members.
I’m struggling with XSLT and can’t get it to work.
Please help me decide whether XSLT is a good tool for this job and, if it is, give me some hints on how to implement it, which XSLT constructs to use, etc.
Of course I could simply write a small transformation in a procedural language, but I’m looking for an ‘apply a transformation template’ approach rather than procedural parsing and modification, so that I can hand the template to other users for further changes (CSS, formatting, etc.).
I’m using Ruby + Nokogiri (which is a set of bindings to libxslt), but any language would do.
If XSLT is a bad fit for this task, what other tools can be used here, given that I must transform ~1M users in several minutes with low memory consumption?
An additional benefit would be the ability to parallelize the processing.
Thank you.
To produce several HTML files from a single transformation you need XSLT 2.0: its xsl:result-document instruction writes multiple output documents, which standard XSLT 1.0 has no instruction for. I suggest using Saxon for that.
Here is a sample XSL which produces what you need (it creates a single HTML file for each member, all inside an “html” folder in your system’s root, and gives back a report of what it created). You’ll probably need to tweak it a little to fit your needs.
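A stylesheet along these lines could look like the sketch below. It is not necessarily the exact listing referred to above, just a minimal XSLT 2.0 illustration of the approach: one xsl:result-document per member, plus a plain-text report as the principal output. The key name "ministry-by-id", the output format name "member-html", the relative file naming "html/member_<id>.html" and the wording of the report lines are all assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Principal output: a plain-text report of the files written. -->
  <xsl:output method="text"/>
  <!-- Named output format (assumed name) used for every per-member HTML file. -->
  <xsl:output name="member-html" method="html" indent="yes"/>

  <!-- Index ministries by id so each lookup does not rescan the ministry list. -->
  <xsl:key name="ministry-by-id" match="ministry" use="id"/>

  <xsl:template match="/">
    <xsl:apply-templates select="society/party_members/member"/>
  </xsl:template>

  <xsl:template match="member">
    <!-- Assumed naming scheme: html/member_<id>.html -->
    <xsl:variable name="file" select="concat('html/member_', @id, '.html')"/>
    <xsl:result-document href="{$file}" format="member-html">
      <html>
        <body>
          <h2>
            <xsl:text>Party member: </xsl:text>
            <!-- normalize-space drops the stray blank when a name part is empty. -->
            <xsl:value-of select="normalize-space(concat(@first_name, ' ', @last_name))"/>
          </h2>
          <h3>Works in:</h3>
          <!-- key() accepts the whole sequence of ministry_id values at once. -->
          <xsl:apply-templates select="key('ministry-by-id', ministry_id)"/>
        </body>
      </html>
    </xsl:result-document>
    <!-- One report line per generated file, written to the principal output. -->
    <xsl:value-of select="concat('Created ', $file, '&#10;')"/>
  </xsl:template>

  <xsl:template match="ministry">
    <div class="ministry">
      <h4><xsl:value-of select="long_title"/></h4>
      <xsl:text> - </xsl:text>
      <xsl:value-of select="short_title"/>
      <h5>
        <xsl:value-of select="long_title"/>
        <xsl:text> </xsl:text><i>concerns</i><xsl:text> itself with </xsl:text>
        <i><xsl:value-of select="normalize-space(concerns)"/></i>
      </h5>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

With the Saxon jar on the classpath, this could be run with something like "java net.sf.saxon.Transform -s:society.xml -xsl:members.xsl -o:out/report.txt" (the file names are placeholders); the relative href values should then resolve against the location of the principal output. Note that this sketch loads the whole input document into memory before transforming it, so memory use grows with the size of the XML.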
And here is a sample output:
Regarding performance, several million members is a lot of data. I guess XSLT will be able to handle it, but I’m afraid you’ll need to try it before knowing for sure.
I hope this helps you!