I have to write some code to handle reading and validating XML documents that use a version attribute in their root element to declare a version number, like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Junk xmlns="urn:com:initech:tps"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:com:initech.tps:schemas/foo/Junk.xsd"
VersionAttribute="2.0">
There are a bunch of nested schemas, so my code uses an org.w3c.dom.ls.LSResourceResolver to figure out which schema to use, implementing this method:
LSInput resolveResource(String type,
String namespaceURI,
String publicId,
String systemId,
String baseURI)
Previous versions of the schema embedded the schema version in the namespace, so I could use the namespaceURI and systemId to decide which schema to provide. Now the version number has moved to an attribute on the root element, and my resolver doesn’t have access to that. How am I supposed to figure out the version of the XML document in the LSResourceResolver?
I had never dealt with schema versions before this and had no idea what was involved. When the version was part of the namespace, I could throw all the schemas in together and let them get sorted out. With the version in the root element and the namespace shared across versions, there is no getting around reading the version information from the XML before starting the SAX parsing.
I’m going to do something very similar to what Pangea suggested (gets +1 from me), but I can’t follow the advice exactly because the document is too big to read entirely into memory, even once. By using StAX I can minimize the amount of work done to get the version from the file: the pull parser only has to advance as far as the root element's start tag. See the developerWorks article “Screen XML documents efficiently with StAX”.
The code to get the version information will look like:
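A minimal sketch of that step, assuming the attribute is named VersionAttribute as in the sample document above. The class name VersionSniffer and the method readRootAttribute are hypothetical names chosen for illustration, not part of any library. The reader advances only to the first start tag, so the rest of the document is not parsed.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls one attribute off the root element with StAX.
final class VersionSniffer {

    /**
     * Returns the value of the named attribute on the root element,
     * or null if the root element has no such attribute.
     */
    static String readRootAttribute(Reader in, String attributeName)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Defensive settings: the version check does not need DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // The first START_ELEMENT event is the root element; stop there.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // A null namespace URI means the attribute's namespace is not checked.
                    return reader.getAttributeValue(null, attributeName);
                }
            }
            return null; // no element found at all
        } finally {
            reader.close(); // closes the StAX reader, not the underlying Reader
        }
    }
}
```

A call would look like `VersionSniffer.readRootAttribute(new FileReader(file), "VersionAttribute")`. Closing the underlying Reader stays the caller's job, since `XMLStreamReader.close()` does not close the input source.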
Then I can use the version information to figure out which resolver to use and which schema documents to set on the SAXParserFactory.
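One way that wiring could look, as a sketch under assumptions: the class VersionedParserFactory, its forVersion method, and the version-to-file map are hypothetical, and each version is assumed to have a single top-level schema file, with nested schemas loaded through the resolver.

```java
import java.io.File;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical helper: builds a validating SAXParserFactory for one document version.
final class VersionedParserFactory {

    static SAXParserFactory forVersion(String version,
                                       Map<String, File> schemaByVersion,
                                       LSResourceResolver resolver) throws SAXException {
        File schemaFile = schemaByVersion.get(version);
        if (schemaFile == null) {
            throw new IllegalArgumentException("Unsupported document version: " + version);
        }

        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The resolver chosen for this version supplies any imported or included schemas.
        schemaFactory.setResourceResolver(resolver);
        Schema schema = schemaFactory.newSchema(schemaFile);

        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        // setSchema turns on schema validation; setValidating(true) is for DTDs only.
        parserFactory.setSchema(schema);
        return parserFactory;
    }
}
```

The document has to be reopened for the SAX pass, because the StAX version check has already consumed the start of the stream.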