I’m using a Java SAX parser (with a handler that extends org.xml.sax.ext.DefaultHandler2) to parse an XML document that has a DOCTYPE declaration pointing to a DTD. For some odd reasons, I need to know the case in which the “doctype” keyword itself was written in the original document, so that I can output a document using exactly the same case.
That is, I need to be able to differentiate:
<!DOCTYPE thing SYSTEMID ...>
…from:
<!doctype thing SYSTEMID ...>
Is there a way to achieve that from the parser itself? (I mean, without resorting to reading the first n bytes of the stream before executing the parser)
Thanks
In the end, it seems there is no way to make a Java XML parser report the original case of the DOCTYPE keyword: the parser assumes it is always upper-case (which is what the XML spec requires, but might not hold if you use such an XML parser to parse HTML5).
I solved this by implementing my own java.io.Reader, which records the characters of the file as they are read so that it can determine the original case of the DOCTYPE keyword, and by passing this Reader to the SAX parser. Once parsing is done, I ask the reader object what the case of the keyword was, and I get the correct answer.
It is messy and ugly, but… seems to be the only real option.
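For illustration, here is a minimal sketch of what such a recording Reader could look like. It is not the exact class I used: the name DoctypeCaseReader, the doctypeKeyword() method and the 4096-character limit are all made up for this example. It also assumes the first "<!doctype" match (in any case) near the start of the input is the real declaration, so a comment containing that text before the DOCTYPE would fool it.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: passes characters through unchanged while keeping
// a copy of the first few thousand, so the DOCTYPE keyword can be
// inspected after the parser has consumed the input.
class DoctypeCaseReader extends FilterReader {
    // Assumed limit; the DOCTYPE is expected well within this prefix.
    private static final int MAX_CAPTURED = 4096;
    private static final String KEYWORD = "<!DOCTYPE";

    private final StringBuilder head = new StringBuilder();

    DoctypeCaseReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1 && head.length() < MAX_CAPTURED) {
            head.append((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            int room = MAX_CAPTURED - head.length();
            if (room > 0) {
                head.append(buf, off, Math.min(n, room));
            }
        }
        return n;
    }

    // Returns the keyword exactly as written ("DOCTYPE", "doctype",
    // "DocType", ...), or null if none was seen in the captured prefix.
    String doctypeKeyword() {
        String captured = head.toString();
        int len = KEYWORD.length();
        for (int i = 0; i + len <= captured.length(); i++) {
            if (captured.regionMatches(true, i, KEYWORD, 0, len)) {
                return captured.substring(i + 2, i + len);
            }
        }
        return null;
    }
}
```

To use it, wrap the original Reader, hand the wrapper to the parser through an org.xml.sax.InputSource (for example new InputSource(reader)), and call doctypeKeyword() once parsing has finished.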