I have an XML file in a flat structure. We do not control the format of this XML file; we just have to deal with it. I've renamed the fields because they are highly domain-specific and make no real difference to the problem.
    <attribute name='Title'>Book A</attribute>
    <attribute name='Code'>1</attribute>
    <attribute name='Author'>
        <value>James Berry</value>
        <value>John Smith</value>
    </attribute>
    <attribute name='Title'>Book B</attribute>
    <attribute name='Code'>2</attribute>
    <attribute name='Title'>Book C</attribute>
    <attribute name='Code'>3</attribute>
    <attribute name='Author'>
        <value>James Berry</value>
    </attribute>
Key things to note: the file is not particularly hierarchical. Books are delimited by an occurrence of an attribute element with name='Title', but the name='Author' attribute node is optional.
Is there a simple XPath expression I can use to find the authors of book 'n'? It is easy to identify the title of book 'n', but the Author value is optional, and you can't just take the following Author, because for book 2 that would give the author of book 3.
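To make the pitfall concrete, here is a minimal sketch assuming Python and the standard-library `xml.etree.ElementTree`. The flat fragment is wrapped in a hypothetical `<root>` element so it parses as one document, and `naive_authors` is a hypothetical helper that does the naive thing: take the first Author anywhere after the matching Title, the same idea as the XPath 1.0 expression `//attribute[@name='Title'][.='Book B']/following-sibling::attribute[@name='Author'][1]`.

```python
import xml.etree.ElementTree as ET

# The question's flat fragment, wrapped in a hypothetical <root> element
# so that it is a single well-formed document.
SAMPLE = """<root>
  <attribute name='Title'>Book A</attribute>
  <attribute name='Code'>1</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name='Title'>Book B</attribute>
  <attribute name='Code'>2</attribute>
  <attribute name='Title'>Book C</attribute>
  <attribute name='Code'>3</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
  </attribute>
</root>"""


def naive_authors(root, title):
    """Deliberately naive: author names from the first Author sibling
    found anywhere after the matching Title, even past the next Title."""
    siblings = root.findall("attribute")
    for i, el in enumerate(siblings):
        if el.get("name") == "Title" and el.text == title:
            for later in siblings[i + 1:]:
                if later.get("name") == "Author":
                    return [v.text for v in later.findall("value")]
            return []  # no Author anywhere after this Title
    return []  # title not found


root = ET.fromstring(SAMPLE)
print(naive_authors(root, "Book B"))  # ['James Berry'], which is Book C's author
```

For Book B this returns Book C's author, which is exactly the wrong answer described above: nothing stops the search at the next Title.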
I have written a state machine to parse this as a series of elements, but I can't help thinking there must be a way to get the results I want directly.
We want the 'attribute' element with @name 'Author' that follows an 'attribute' element with @name 'Title' and value 'Book n', with no other 'attribute' element with @name 'Title' between them (if there is one, the author wrote some other book).
Said differently, we want an Author whose first preceding Title (the one it 'belongs to') is the one we're looking for.
On the sample file, for N = C this selects:

    <attribute name='Author'><value>James Berry</value></attribute>

For N = B, it selects nothing.
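One XPath 1.0 expression that matches the description above is `//attribute[@name='Author'][preceding-sibling::attribute[@name='Title'][1] = 'Book C']`; because preceding-sibling is a reverse axis, `[1]` picks the nearest preceding Title. That expression is reconstructed from the prose, not necessarily the one originally posted. Python's `xml.etree.ElementTree` only supports a limited XPath subset with no preceding-sibling axis, so the sketch below (assuming Python, a hypothetical `<root>` wrapper, and a hypothetical helper `authors_of`) emulates the predicate by hand.

```python
import xml.etree.ElementTree as ET

# The question's flat fragment, wrapped in a hypothetical <root> element
# so that it is a single well-formed document.
SAMPLE = """<root>
  <attribute name='Title'>Book A</attribute>
  <attribute name='Code'>1</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name='Title'>Book B</attribute>
  <attribute name='Code'>2</attribute>
  <attribute name='Title'>Book C</attribute>
  <attribute name='Code'>3</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
  </attribute>
</root>"""


def authors_of(root, title):
    """Author names whose nearest preceding Title sibling has text `title`.

    Hand-written emulation of the XPath 1.0 predicate
    preceding-sibling::attribute[@name='Title'][1] = title.
    """
    siblings = root.findall("attribute")
    names = []
    for i, el in enumerate(siblings):
        if el.get("name") != "Author":
            continue
        # Walk backwards from this Author; the first Title met is the
        # nearest one, i.e. position 1 on the reverse preceding-sibling axis.
        nearest = next(
            (p for p in reversed(siblings[:i]) if p.get("name") == "Title"),
            None,
        )
        if nearest is not None and nearest.text == title:
            names.extend(v.text for v in el.findall("value"))
    return names


root = ET.fromstring(SAMPLE)
print(authors_of(root, "Book C"))  # ['James Berry']
print(authors_of(root, "Book B"))  # []
```

Unlike the naive "first following Author" search, the Author under Book C is never attributed to Book B, because its nearest preceding Title is Book C.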
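Using xsl:key, or the grouping support added in XSLT 2.0 (xsl:for-each-group with group-starting-with), would make this easier, and much faster if the file is big, since the "first preceding Title" check otherwise has to be repeated for every Author.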
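Python has no built-in counterpart to xsl:key or xsl:for-each-group, but the same idea can be sketched with the standard library, assuming a hypothetical `<root>` wrapper and a hypothetical helper `index_authors`: one pass that starts a new group at each Title (as group-starting-with does) and records author names per title in a dict, which then serves as a key-style lookup table. The sketch assumes titles are unique.

```python
import xml.etree.ElementTree as ET

# The question's flat fragment, wrapped in a hypothetical <root> element
# so that it is a single well-formed document.
SAMPLE = """<root>
  <attribute name='Title'>Book A</attribute>
  <attribute name='Code'>1</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name='Title'>Book B</attribute>
  <attribute name='Code'>2</attribute>
  <attribute name='Title'>Book C</attribute>
  <attribute name='Code'>3</attribute>
  <attribute name='Author'>
    <value>James Berry</value>
  </attribute>
</root>"""


def index_authors(root):
    """Map each Title's text to the author names in its group.

    A group starts at every Title and runs until the next Title.
    Assumes titles are unique: a repeated title would merge its authors.
    """
    index = {}
    title = None  # Title of the group currently being read
    for el in root.findall("attribute"):
        name = el.get("name")
        if name == "Title":
            title = el.text
            index.setdefault(title, [])  # books without an Author still get []
        elif name == "Author" and title is not None:
            index[title].extend(v.text for v in el.findall("value"))
    return index


root = ET.fromstring(SAMPLE)
index = index_authors(root)
print(index["Book B"])  # []
print(index["Book C"])  # ['James Berry']
```

Building the index is a single linear pass, after which each lookup is one dict access, rather than a backward scan for every Author.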
(The SO syntax highlighter seems to treat '//' as the start of a comment, but in XPath it is the descendant-or-self shorthand, not a comment. Sigh.)