I am receiving an XML file as input, whose size can vary from a few KB to much larger. The file arrives over a network. I only need to extract a small number of nodes, so most of the document is of no use to me. Memory use is not a concern; I just need speed.
Considering all this, I concluded:
- Not DOM (the document may be huge, I have no CRUD requirement, and the source is a network stream).
- Not SAX, as I only need a small subset of the data.
- StAX could be the way to go, but I am not sure it is the fastest option.
- JAXB came up as another option, but what sort of parser does it use? I read that it uses Xerces by default (which type is that, push or pull?), although I can configure it to use StAX or Woodstox, according to this link.
I have been reading a lot but am still confused by so many options. Any help would be appreciated.
Thanks!
Edit: I want to add one more question: what is wrong with using JAXB here?
The fastest solution is by far a StAX parser, especially since you only need a specific subset of the XML file. With StAX you can easily skip whatever isn’t necessary, whereas with a SAX parser you would receive every event anyway.
But it’s also a little more complicated than using SAX or DOM. I recently had to write a StAX parser for the following XML:
Here’s what the final parser code looks like:
The code itself is in Portuguese, but it should be straightforward to follow; here’s the repo on GitHub.
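To make the "pull only what you need" point concrete, here is a minimal sketch using the JDK's built-in `javax.xml.stream` API. It is not the parser from the repo: the class name `StaxSubsetExample`, the `extract` helper, and the sample `<order>` document are all made up for illustration, and it assumes the elements you want contain plain text only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not taken from the original answer's repo.
class StaxSubsetExample {

    // Returns the text of every element whose local name is in `wanted`,
    // in document order. All other events are stepped over and never stored.
    static List<String> extract(InputStream in, Set<String> wanted) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input comes from a network, so do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<String> values = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // We decide when to advance; uninteresting events are simply skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && wanted.contains(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag and
                    // only works for text-only elements (assumed here).
                    values.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document.
        String xml = "<order><id>42</id><items><item>pen</item></items>"
                + "<customer>Ana</customer></order>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in, Set.of("id", "customer"))); // prints [42, Ana]
    }
}
```

In real use you would pass the network `InputStream` straight to `extract`, so parsing starts before the whole document has arrived. If you know the nodes you need appear early, you can also `return` from the loop as soon as you have them and stop reading the rest of the stream.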