I’d like to extract all references to the document root from an XPath expression and inject a custom root after them.
I’m implementing a smallish part of validation (or rather fixing a bug) for an XML instance document that is created from a schema language. This language provides a means of specifying self-contained chunks of XML. Each chunk is defined in a separate file that specifies its XML element hierarchy. Each such hierarchy has one or more root elements belonging to the same document root, much like the invisible document root of any XML document.
These files are, however, not aware that what they specify is only a part of a larger system. This larger system is actually another XML document (with its own document root) that has a single top-level XML element, which contains all the root elements defined by any number of such schema language files.
Any node in the XML hierarchy may be constrained with an XPath expression which must evaluate to true in order for the element to be considered valid during validation. Herein lies the root of my problem. These XPath expressions may contain absolute location paths, which reference the document root of a single XML chunk and not the document root of the system. Consider the following XML instance:
<data xmlns="system:uri">
  <root-one xmlns="root-one:uri">
    <items>
      <item>
        <group>base</group>
        <class>person</class>
        <name>John Smith</name>
        <description>valid entry</description>
      </item>
      <item>
        <group>base</group>
        <class>animal</class>
        <name>Dog</name>
        <description>invalid entry</description>
      </item>
    </items>
    <item-classes>
      <item-class>
        <class>person</class>
        <group>base</group>
      </item-class>
    </item-classes>
  </root-one>
  <root-two xmlns="root-two:uri">
    <!-- obscured content -->
  </root-two>
</data>
{system:uri}data represents the system; {root-one:uri}root-one and {root-two:uri}root-two are two chunks of XML, each defined within its own schema language file. Let’s say that each root-one/items/item instance must fulfill the following XPath condition, defined within the schema language file (don’t mind the current(); it works the same as the one in XSLT, referring to one of the item instances):
context: /root-one/items/item
assert: group=/root-one/item-classes/item-class[class=current()/class]/group
which should actually be
context: /data/root-one/items/item
assert: group=/data/root-one/item-classes/item-class[class=current()/class]/group
How do I get all references to the document root (/) in any XPath expression and inject them with the correct root? I have no control over how these expressions are formed, so they may come in any shape and size, as long as they satisfy XPath 1.0 syntax, but I have to make them evaluate properly.
I’m currently thinking of writing some sort of tokenizer in Java to handle this, but I would rather not go down that road if there’s a simpler solution. The expressions are evaluated during a Schematron XSLT transformation within the context of the system document, so if I could somehow fix the paths using XSLT it would be perfect. That said, I’ll accept any pointers that could lead me to a solution.
Edit01
This is what an example file containing the XPath expressions looks like (off the top of my head). I wish to transform the content of the @test attribute. The value of the @context attribute is trivial to change since it always has a similar structure.
<?xml version="1.0" encoding="utf-8"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
            xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:sch="http://www.ascc.net/xml/schematron"
            xmlns:tl="toplevel:uri"
            xmlns:r1="root-one:uri"
            xmlns:r2="root-two:uri">
  <iso:ns prefix="tl" uri="toplevel:uri" />
  <iso:ns prefix="r1" uri="root-one:uri" />
  <iso:ns prefix="r2" uri="root-two:uri" />
  <iso:pattern>
    <iso:rule context="/r1:root-one/r1:items/r1:item">
      <iso:assert test="r1:group=/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:group">The group of an item must match one of the predefined class groups.</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
Please note that the value of the @test attribute can be any valid XPath 1.0 expression. I’d like to see a generic solution which can find any document root (‘/’) anywhere within the expression and inject a custom root element after it. The actual file may contain any number of iso:pattern elements, iso:rule elements, etc.
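For illustration only, here is a minimal Python sketch of the "walk every @test attribute" part, separate from the question of how a single expression gets fixed. The function name rewrite_tests and the rewrite callback are hypothetical; rewrite stands for whatever routine ends up fixing one XPath string. It also touches iso:report, which is an assumption on my part since only iso:assert appears in the example.

```python
import xml.etree.ElementTree as ET

# Namespace of ISO Schematron, as declared in the example file.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def rewrite_tests(schematron_xml, rewrite):
    """Apply `rewrite` (a caller-supplied function, str -> str) to every
    @test attribute of iso:assert / iso:report and return the new XML text."""
    root = ET.fromstring(schematron_xml)
    for tag in ("assert", "report"):
        # iter() visits matching elements at any depth, so the number of
        # iso:pattern / iso:rule wrappers does not matter.
        for el in root.iter(f"{{{SCH_NS}}}{tag}"):
            test = el.get("test")
            if test is not None:
                el.set("test", rewrite(test))
    return ET.tostring(root, encoding="unicode")
```

One caveat: ElementTree re-serializes with its own prefixes (ns0 and so on, unless registered) and drops namespace declarations that no element or attribute name uses. The iso:ns elements survive, and those are what Schematron uses to bind the prefixes inside the expressions, but the output should be checked before relying on it.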
Edit02
For the example above the wanted result is the following iso:assert element:
<iso:assert test="r1:group=/tl:data/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:group">The group of an item must match one of the predefined class groups.</iso:assert>
Edit03
In response to “How do you decide that /r1:root-one/ must be preceded by ‘/tl:data’? Could you please describe the rules?” – Dimitre Novatchev
/tl:data represents the root element of a document which is created by combining multiple other XML documents into a single one. The content of those documents is appended to this root element as children, and r1:root-one becomes one of those children. The XPath constraints, which are part of the schema definition describing the element structure of r1:root-one, are designed to work only in the context of this sub-XML document. When the sub-XML document gets appended to the “parent” document, the constraints lose their meaning if absolute paths are present within the expression. So if the expression contains /r1:root-one, this has no meaning in the new document (there is no root-one root element within it; tl:data is the only root). I’d like to find all such cases (/r1:root-one/) and transform them (into /tl:data/r1:root-one/) so the expressions work in the context of the new document.
It is hard to specify the exact rules. Each occurrence of “/” which appears at the beginning of a path (and therefore references the document root of a sub-XML document) should be replaced with “/tl:data/” so that it references the document root of the newly created document.
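To make the rule a bit more concrete, here is a rough tokenizer-based sketch in Python (chosen only for brevity; the same idea should port to Java). It is not a full XPath parser. It assumes that a “/” or “//” starts an absolute path exactly when the previous token does not end an operand, following the lexical disambiguation rule in section 3.7 of the XPath 1.0 spec. The names TOKEN, tokenize and inject_root are made up, and the handling of a bare “/” (rewritten to “/tl:data” without a trailing slash) is my own assumption.

```python
import re

# Lexer sketch for XPath 1.0; names are QNames or prefix:* name tests.
TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<lit>"[^"]*"|'[^']*')
  | (?P<num>\d+(?:\.\d*)?|\.\d+)
  | (?P<var>\$[^\W\d][\w.-]*(?::[^\W\d][\w.-]*)?)
  | (?P<name>[^\W\d][\w.-]*(?::(?:\*|[^\W\d][\w.-]*))?)
  | (?P<op>//|::|\.\.|!=|<=|>=|[/()\[\]@,|+\-=<>*.])
""", re.VERBOSE)

OPERATOR_NAMES = {"and", "or", "div", "mod"}
OPERAND_END = {")", "]", ".", ".."}   # tokens after which '/' is a separator
STEP_START = {"*", "@", ".", ".."}    # non-name tokens that can begin a step


def tokenize(expr):
    """Split an expression into (kind, text) pairs, whitespace included."""
    pos, tokens = 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {expr[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def inject_root(expr, root="/tl:data"):
    """Prefix every absolute location path in `expr` with `root`."""
    tokens = tokenize(expr)
    out = []
    after_operand = False  # does the previous real token end an operand?
    for i, (kind, text) in enumerate(tokens):
        if kind == "ws":
            out.append(text)
            continue
        if kind in ("lit", "num", "var"):
            after_operand = True
        elif kind == "name":
            # 'and', 'or', 'div', 'mod' are operators only after an operand.
            after_operand = not (after_operand and text in OPERATOR_NAMES)
        elif text == "*":
            # Multiply operator after an operand, name test otherwise.
            after_operand = not after_operand
        elif text in OPERAND_END:
            after_operand = True
        else:
            if text in ("/", "//") and not after_operand:
                nxt = next((t for t in tokens[i + 1:] if t[0] != "ws"), None)
                step_follows = nxt is not None and (
                    nxt[0] == "name" or nxt[1] in STEP_START)
                # Assumption: a bare '/' becomes just the new root element.
                text = root + text if (text == "//" or step_follows) else root
            after_operand = False
        out.append(text)
    return "".join(out)
```

With the default root, inject_root("r1:class=/r1:root-one/r1:x") should give "r1:class=/tl:data/r1:root-one/r1:x", while a relative path such as "a/b" is left alone. Slashes inside string literals are never touched because literals are consumed as single tokens.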
Edit04
As indicated in the text above, the solution should work for any XPath expression imaginable. Additional examples (the imaginary elements from the r1 namespace are made up – this sounded better inside my head):
<iso:assert test="r1:group=/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:group and r1:imaginary-element1=/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:imaginary-element1" />
<iso:assert test="r1:group=/r1:root-one/r1:item-classes/r1:item-class[r1:class=/r1:root-one/r1:imaginary-constants/r1:imaginary-constant]/r1:group" />
should become
<iso:assert test="r1:group=/tl:data/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:group and r1:imaginary-element1=/tl:data/r1:root-one/r1:item-classes/r1:item-class[r1:class=current()/r1:class]/r1:imaginary-element1" />
<iso:assert test="r1:group=/tl:data/r1:root-one/r1:item-classes/r1:item-class[r1:class=/tl:data/r1:root-one/r1:imaginary-constants/r1:imaginary-constant]/r1:group" />
Edit05
I now have the option to switch to an XSLT 2.0 processor. So I’ll accept XSLT 2.0 solutions.
In fact, if someone could provide me with an XSLT regular expression which would match a / sign that represents the document root within an XPath 1.0 expression, this would solve my problem (I would use the replace() function). I’ve been looking through the XPath 1.0 grammar but have not come up with anything useful yet.
After examining the XPath 1.0 grammar/spec and switching to XSLT 2.0 in order to gain regex support, I’ve come up with the following monstrosity. It implements some rules that must be fulfilled in order for ‘/’ to represent the document root and not a path separator:
A couple of XPath tokens need special treatment when processing with regex, and whitespace must also be taken into account. The characters ‘*’ and ‘-’ might not represent operators at all: ‘*’ can be a name test and ‘-’ can be part of an element name. So far this has worked for all my test cases but, since I’m relying on the grammar rather than on experience with XPath to form the expressions, I might have missed something.
These regex expressions demonstrate quite a few capabilities of the XML Schema/XPath regex flavor. Multiple replace runs are required because some advanced features, most notably lookaround, are not supported in it.
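As a side illustration of why ‘*’ is ambiguous, here is a small Python sketch of the disambiguation rule from section 3.7 of the XPath 1.0 spec: if there is a preceding token and it is not one of @, ::, (, [, a comma or an operator, then ‘*’ is the multiply operator and an NCName such as ‘div’ is an operator name. The function name classify and the role labels are made up, and the input is assumed to be already split into tokens (so a name like ‘item-class’ arrives as one token and its ‘-’ never shows up on its own).

```python
# Tokens after which an operand (not an operator) is expected.
PUNCT_BEFORE_OPERAND = {"@", "::", "(", "[", ","}
SYMBOL_OPERATORS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
OPERATOR_NAMES = {"and", "or", "div", "mod"}


def classify(tokens):
    """Label each token of a pre-split XPath 1.0 expression with its role."""
    roles = []
    after_operand = False  # no preceding token at the start
    for tok in tokens:
        if tok == "*":
            role = "multiply" if after_operand else "name-test"
        elif tok in OPERATOR_NAMES:
            role = "operator" if after_operand else "name"
        elif tok in SYMBOL_OPERATORS:
            role = "operator"
        elif tok in PUNCT_BEFORE_OPERAND:
            role = "punct"
        else:
            # Names, literals, numbers, ')', ']', '.', '..'
            role = "operand"
        after_operand = role in ("name-test", "name", "operand")
        roles.append(role)
    return roles
```

For example, classify(["*", "*", "*"]) labels the three stars as name test, multiply, name test, which is the kind of case a single regex pass has trouble with.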
If someone gives me a better solution than this spaghetti XSLT I’ll gladly accept it. Note that this might not be the best and not even the only solution to my problem.
When this transformation is applied to
it results in this (paths fixed as expected)