We have fairly large (~200 MB) XML files from different sources that we want to transform into a common format.
For structural transformations (element names, nesting, etc.) we decided to use XSLT 1.0. Because it has to be fast (we receive a lot of these files), we chose Apache Xalan as the engine. The structural transformations can be quite complex (not just <tag a> -> <tag b>), and they differ between XML files from different sources.
However, we also need to transform the values of elements. These transformations can be rather complex (e.g., some require access to the Google Maps API, others to our database), so we decided to use a simple Ruby-based DSL: a list of “XPath selector” => transformer entries, e.g.:
{"rss/channel/item" => {:class => 'ItemMutators', :method => :guess_location}}
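For illustration, here is a minimal sketch of how a rule table like this might be applied after the XSLT step, using Ruby's bundled REXML library. The `ItemMutators` body, the `apply_rules` helper and the sample feed are all hypothetical; a real `guess_location` would presumably call Google Maps or the database instead of a regex. Note that REXML builds a full in-memory tree, which may be too slow or memory-hungry for ~200 MB inputs; this only shows the dispatch logic.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the real ItemMutators class.
# A real guess_location would likely call an external geocoding API.
class ItemMutators
  # Receives the matched <item> element and adds a <location> child in place.
  def self.guess_location(element)
    title = element.elements['title']&.text || ''
    location = title[/\bin ([A-Z][a-z]+)/, 1] || 'unknown'
    element.add_element('location').text = location
  end
end

# The rule table from the question: XPath selector => transformer.
RULES = {
  'rss/channel/item' => { :class => 'ItemMutators', :method => :guess_location }
}.freeze

# Hypothetical helper: for each rule, resolve the transformer class by name
# and call its method on every node the XPath selector matches.
def apply_rules(doc, rules)
  rules.each do |xpath, spec|
    transformer = Object.const_get(spec[:class])
    REXML::XPath.each(doc, xpath) do |node|
      transformer.public_send(spec[:method], node)
    end
  end
  doc
end

xml = <<~XML
  <rss><channel>
    <item><title>Concert in Berlin</title></item>
    <item><title>Meetup</title></item>
  </channel></rss>
XML

doc = apply_rules(REXML::Document.new(xml), RULES)
doc.elements.each('rss/channel/item/location') { |loc| puts loc.text }
# prints "Berlin", then "unknown"
```

Resolving the class from a string keeps the DSL declarative, but it also means a typo in `:class` only fails at run time with a `NameError`.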
However, keeping element transformations separate from value transformations feels like a hack. Are there any better solutions?
For example, in Java you can write extensions for Xalan and use them to transform the values.
Is there something similar for Ruby?
Thank you, guys!
All the responses were very valuable. I'm still thinking it over 🙂
You should be able to use XSLT extensions. A quick search shows that Xalan supports extensions written in Java: http://xml.apache.org/xalan-j/extensions.html
Quote from the linked page:
Also, someone appears to have written a Ruby package (ruby-xslt) that can provide XSLT extensions: http://greg.rubyfr.net/pub/packages/ruby-xslt/classes/XML/XSLT.html