I’m looking for the best method to parse various XML documents using a Java application. I’m currently doing this with SAX and a custom content handler and it works great – zippy and stable.
I’ve decided to explore the option of having the same program, which currently receives a single XML document format, receive two additional XML document formats with various XML element changes. I was hoping to just swap out the ContentHandler for an appropriate one based on the first ‘startElement’ in the document… but, uh-duh, the ContentHandler is set and then the document is parsed!
... constructor ...
{
    SAXParserFactory spf = SAXParserFactory.newInstance();
    try {
        SAXParser sp = spf.newSAXParser();
        parser = sp.getXMLReader();
        parser.setErrorHandler(new MyErrorHandler());
    } catch (Exception e) {}
    ...

... parse StringBuffer ...
try {
    parser.setContentHandler(pP);
    parser.parse(new InputSource(new StringReader(xml.toString())));
    return true;
} catch (IOException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
}
...
So, it doesn’t appear that I can do this in the way I initially thought I could.
That being said, am I looking at this entirely wrong? What is the best method to parse multiple, discrete XML documents with the same XML handling code? I tried to ask in a more general post earlier… but I think I was being too vague. For speed and efficiency purposes I never really looked at DOM, because these XML documents are fairly large and the system receives about 1200 every few minutes. It’s just a one-way send of information.
To make this question too long and add to my confusion, following is a mockup of the various XML documents that I would like a single SAX, StAX, or ?? parser to deal with cleanly.
products.xml:
<products>
  <product>
    <id>1</id>
    <name>Foo</name>
  </product>
  <product>
    <id>2</id>
    <name>bar</name>
  </product>
</products>
stores.xml:
<stores>
  <store>
    <id>1</id>
    <name>S1A</name>
    <location>CA</location>
  </store>
  <store>
    <id>2</id>
    <name>A1S</name>
    <location>NY</location>
  </store>
</stores>
managers.xml:
<managers>
  <manager>
    <id>1</id>
    <name>Fen</name>
    <store>1</store>
  </manager>
  <manager>
    <id>2</id>
    <name>Diz</name>
    <store>2</store>
  </manager>
</managers>
As I understand it, the problem is that you don’t know what format the document is prior to parsing. You could use a delegate pattern. I’m assuming you’re not validating against a DTD/XSD/etcetera and that it is OK for the DefaultHandler to have state.
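Here is a rough sketch of what I mean by a delegate, under those assumptions. The names DelegatingHandler, NameCollector and DelegateDemo are made up for illustration and are not from your code. NameCollector stands in for your real per-format handlers. The outer handler is the only one the parser ever sees. It looks at the first startElement, picks the matching handler, and forwards every event to it from then on. I'm matching on qName because a SAXParserFactory is not namespace aware by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical router: forwards SAX events to a handler chosen by root element. */
class DelegatingHandler extends DefaultHandler {
    private final Map<String, DefaultHandler> handlersByRoot = new HashMap<>();
    private DefaultHandler delegate; // null until the root element has been seen

    void register(String rootName, DefaultHandler handler) {
        handlersByRoot.put(rootName, handler);
    }

    @Override
    public void startDocument() {
        delegate = null; // reset so one instance can be reused for the next document
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (delegate == null) {
            // First element of the document: choose the handler for this format.
            delegate = handlersByRoot.get(qName);
            if (delegate == null) {
                throw new SAXException("No handler registered for root element " + qName);
            }
            // The real startDocument event fired before the format was known; replay it.
            delegate.startDocument();
        }
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (delegate != null) {
            delegate.characters(ch, start, length);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        if (delegate != null) {
            delegate.endDocument();
        }
    }
}

/** Stand-in for a real per-format handler: collects the text of every name element. */
class NameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a name element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("name".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString());
            text = null;
        }
    }
}

class DelegateDemo {
    /** Parses an XML string; checked exceptions are rethrown unchecked to keep the sketch short. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

You would register one handler per root element ("products", "stores", "managers") once, then pass the same DelegatingHandler to setContentHandler for every incoming document. It is still pure SAX streaming, so the per-document cost over what you have now should be little more than one map lookup.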