I’m using the XMLStreamReader interface from javax.xml to parse an XML file. The file contains large amounts of data, including single text nodes of several KB.
Validating and reading generally work very well, but I’m having trouble with text nodes that are larger than about 15k characters. The problem occurs in this function:
String foo = "";
if (xsr.getEventType() == XMLStreamConstants.CHARACTERS) {
    foo = xsr.getText();
    xsr.next(); // read next tag
}
return foo;
Here xsr is the stream reader. The text in the text node is 53’337 characters long in this particular case (the length varies), but xsr.getText() only returns the first 15’537 of them. Of course I could loop over the function and concatenate the strings, but somehow I don’t think that’s the idea…
I did not find anything in the documentation or anywhere else about this. Is it intended behavior or can someone confirm/deny it? Am I using it the wrong way somehow?
Thanks
Actually, that is the idea 🙂
The parser is permitted to break up the event stream however it wishes, as long as it’s consistent with the original document. That means it can, and often will, break up your text data into multiple events. How and when it chooses to do so is an implementation detail internal to the parser, and is essentially unpredictable.
So yes, if you receive multiple sequential CHARACTERS events, you need to append them manually. This is the price you pay for a low-level API.
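For illustration, here is a minimal sketch of what that appending loop could look like. The class and method names (TextCollector, readText, firstElementText) are made up for this example and are not part of the StAX API. It assumes the reader is already positioned on the first text event, and it treats CDATA events as text as well. A comment or entity reference in the middle of the text would end the loop early, so adjust it if your documents contain those.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class TextCollector {

    // Appends all consecutive CHARACTERS/CDATA events, starting at the
    // current event. On return the reader sits on the first event that
    // is not text (typically the END_ELEMENT of the enclosing tag).
    static String readText(XMLStreamReader xsr) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        while (xsr.getEventType() == XMLStreamConstants.CHARACTERS
                || xsr.getEventType() == XMLStreamConstants.CDATA) {
            sb.append(xsr.getText());
            xsr.next(); // may be another chunk of the same text node
        }
        return sb.toString();
    }

    // Convenience wrapper for trying it out: returns the text that
    // directly follows the root element's start tag.
    static String firstElementText(String xml) {
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (xsr.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything before the root start tag
            }
            xsr.next(); // move to the first event inside the root element
            return readText(xsr);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Build a text node well above the size where splitting was observed.
        String big = new String(new char[60000]).replace('\0', 'x');
        String text = firstElementText("<a>" + big + "</a>");
        System.out.println(text.length() == big.length());
    }
}
```

How many chunks the parser delivers does not matter to this loop. It just keeps appending until it reaches an event that is not text.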