I’m trying to parse some of my first XML documents with SAX by simply implementing org.xml.sax.ContentHandler, and I’m not sure I understand the flow of events. For a given XML document:
<?xml version="1.0"?>
<list>
<item>
<name>One</name>
<description>The number 1, expressed in letters.</description>
</item>
<item>
<name>Two</name>
<description>The number 2, expressed in letters.</description>
</item>
</list>
What would be the expected order of events in the parser? Am I right in assuming the following:
startDocument()
startElement() -> "list"
startElement() -> "item"
startElement() -> "name"
characters() (>=1 times) -> "One"
endElement() -> "name"
startElement() -> "description"
characters() (>=1 times) -> "The number 1, expressed in letters."
endElement() -> "description"
endElement() -> "item"
startElement() -> "item"
startElement() -> "name"
characters() (>=1 times) -> "Two"
endElement() -> "name"
startElement() -> "description"
characters() (>=1 times) -> "The number 2, expressed in letters."
endElement() -> "description"
endElement() -> "item"
endElement() -> "list"
endDocument()
Is that pretty much the gist of it?
Also, what’s the easiest way to parse? At present, on each call to startElement I save the name of the current element in a private field, so that I know which element I’m in when characters is called. Is there an easier/better way of doing it?
Yes, you’ve got the gist of it.
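One detail the list leaves out: the line breaks between elements are character data too, so a non-validating parser normally reports them through characters() as well. If you want to check the order yourself, here is a rough sketch that records the events as they arrive. The class name EventTrace and its trace helper are made up for illustration, and it extends DefaultHandler (which implements ContentHandler with empty methods) rather than implementing the interface directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events in the order the parser delivers them.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so qName carries the element name.
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace between elements also arrives here; skip it to keep the trace short.
        String text = new String(ch, start, length);
        if (!text.trim().isEmpty()) {
            events.add("chars");
        }
    }

    // Parses the given XML text and returns the recorded events.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.events;
    }
}
```

Calling EventTrace.trace(...) on your document should give the start/end sequence you listed, with one or more "chars" entries inside each name and description element. How many entries you get per element depends on the parser.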
SAX is a very low-level interface, so don’t expect it to be easy. In most SAX applications you will probably want to maintain a stack: startElement pushes the element name onto the stack, and endElement pops it off. If you’re not handling mixed content, then characters() should probably just append to a StringBuffer (or a StringBuilder, its unsynchronized equivalent) associated with the element on the top of the stack, and you should process the accumulated character content when the endElement event occurs. That’s because the parser is free to split character content across multiple characters() calls in any way it wants.
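To make that concrete, here is a minimal sketch of the stack-plus-buffer approach for the document in the question. The class name ItemHandler, its fields, and the parse helper are invented for the example, it uses StringBuilder in place of StringBuffer, and it assumes there is no mixed content (it trims the text of each element).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: turns each <item> into a map of child element name -> text.
class ItemHandler extends DefaultHandler {
    // Names of the currently open elements; the innermost one is on top.
    private final Deque<String> path = new ArrayDeque<>();
    // One text buffer per open element, kept in step with the name stack.
    private final Deque<StringBuilder> text = new ArrayDeque<>();
    private Map<String, String> current = new LinkedHashMap<>();
    final List<Map<String, String>> items = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(qName);
        text.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so only append here.
        text.peek().append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        String value = text.pop().toString().trim();
        String parent = path.peek(); // null once the root element has been closed
        if ("item".equals(parent)) {
            // A child of <item> such as <name> or <description>: its text is complete now.
            current.put(qName, value);
        } else if ("item".equals(qName)) {
            items.add(current);
            current = new LinkedHashMap<>();
        }
    }

    // Parses the given XML text and returns one map per <item>.
    static List<Map<String, String>> parse(String xml) {
        ItemHandler handler = new ItemHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.items;
    }
}
```

Because the text is only read in endElement, it doesn’t matter how many characters() calls the parser makes per element. The name stack also tells you the parent of the element being closed, which a single "current element" variable can’t.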