So my problem is virtually identical to this previous StackOverflow question, but I'm re-asking it because I don't like the accepted answer.
I’ve got a file of concatenated XML documents:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
...
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
```
I’d like to parse out each one.
As far as I can tell, I can't use `scala.xml.XML`, since it assumes one document per file or string.
Is there a subclass of `Parser` I can use to parse XML documents from an input source? Then I could just write something like `many1 xmldoc`.
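To see the one-document limitation concretely, here is a minimal sketch that feeds input to the JDK's built-in SAX parser (the kind of parser `scala.xml.XML.load` builds on). The object name `SingleDocDemo` and method `tryParse` are hypothetical, for illustration only. A single document parses cleanly, while two concatenated documents make the parser throw a `SAXParseException` when it reaches the second `<?xml` declaration, because an XML declaration may only appear at the very start of a document. The exact error message depends on the parser implementation.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object SingleDocDemo {
  // Hypothetical helper: returns the parser's error message,
  // or None if the whole input parsed as one well-formed document.
  def tryParse(input: String): Option[String] =
    try {
      SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(input)), new DefaultHandler)
      None
    } catch {
      // DefaultHandler.fatalError rethrows, so malformed input ends up here
      case e: SAXParseException => Some(e.getMessage)
    }
}
```

For example, `SingleDocDemo.tryParse(doc)` returns `None`, while `SingleDocDemo.tryParse(doc + "\n" + doc)` returns `Some(...)` with a parser-specific message.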
OK, I came up with an answer I'm happier with.
Basically, I try to parse the XML using a `SAXParser`, just like `scala.xml.XML.load` does, but watch for `SAXParseException`s that indicate that the parser encountered a `<?xml` in the wrong place. Then, I grab whatever root element has been parsed already, rewind the input just enough, and restart the parse from there.
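A minimal sketch of that idea, using only the JDK's SAX API rather than `scala.xml` so it stays self-contained. Everything here is hypothetical illustration, not the author's code: the tiny tree types `XNode`/`XElem`/`XText`, the `TreeHandler` class, and `ConcatenatedXml.parseAll`. Instead of rewinding a stream, it keeps the whole input in a string: the handler records the `Locator` position when the root element closes, and after the `SAXParseException` it restarts from the next `<?xml` at or after that position. It assumes `\n` line endings; a `SAXParseException` with no completed root element is treated as a genuine error and propagates.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, Locator, SAXParseException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable

// Hypothetical minimal tree; a real version might build scala.xml nodes instead.
sealed trait XNode
final case class XElem(name: String, attrs: Map[String, String], children: List[XNode]) extends XNode
final case class XText(text: String) extends XNode

// An element that has been opened but not yet closed.
final class Frame(val name: String, val attrs: Map[String, String]) {
  val children: mutable.ListBuffer[XNode] = mutable.ListBuffer.empty[XNode]
}

// Builds a tree from SAX events and remembers where the root element ended.
class TreeHandler extends DefaultHandler {
  private var locator: Locator = null
  private var stack: List[Frame] = Nil
  var root: Option[XElem] = None
  var rootEnd: Option[(Int, Int)] = None // 1-based (line, column) reported at the root's end tag

  override def setDocumentLocator(l: Locator): Unit = locator = l

  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit = {
    val attrs = (0 until atts.getLength).map(i => atts.getQName(i) -> atts.getValue(i)).toMap
    stack = new Frame(qName, attrs) :: stack
  }

  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    stack.headOption.foreach(_.children += XText(new String(ch, start, length)))

  override def endElement(uri: String, localName: String, qName: String): Unit = {
    val frame = stack.head
    stack = stack.tail
    val elem = XElem(frame.name, frame.attrs, frame.children.toList)
    stack match {
      case parent :: _ => parent.children += elem
      case Nil => // the root element just closed
        root = Some(elem)
        if (locator != null) rootEnd = Some((locator.getLineNumber, locator.getColumnNumber))
    }
  }
}

object ConcatenatedXml {
  private val factory = SAXParserFactory.newInstance()

  // Converts a 1-based (line, column) into a 0-based string offset, assuming "\n" line endings.
  private def offsetOf(s: String, line: Int, column: Int): Int = {
    var lineStart = 0
    for (_ <- 1 until line) lineStart = s.indexOf('\n', lineStart) + 1
    math.min(s.length, math.max(0, lineStart + column - 1))
  }

  def parseAll(input: String): List[XElem] = {
    val docs = mutable.ListBuffer.empty[XElem]
    var rest = input
    var done = false
    while (!done) {
      val handler = new TreeHandler
      try {
        factory.newSAXParser().parse(new InputSource(new StringReader(rest)), handler)
        handler.root.foreach(docs += _) // the last document parsed cleanly
        done = true
      } catch {
        // A complete root was already built, so the error is presumably the next "<?xml".
        case e: SAXParseException if handler.root.isDefined =>
          docs += handler.root.get
          val approx = handler.rootEnd.map { case (l, c) => offsetOf(rest, l, c) }.getOrElse(0)
          // Start at least at index 1 so the loop always makes progress.
          val next = rest.indexOf("<?xml", math.max(1, approx))
          if (next < 0) throw e // the error was not a new document after all
          rest = rest.substring(next)
      }
    }
    docs.toList
  }
}
```

Searching forward from the recorded position, rather than trusting it exactly, keeps the sketch tolerant of small differences in how parsers report `Locator` columns. Note that it re-parses from a substring, so it is quadratic in the worst case; a streaming version would need a rewindable reader instead.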