I am using SAX to parse some large XML files. The files have a complex structure, something like the following:
<library>
<books>
<book>
<title></title>
<img>
<name></name>
<url></url>
</img>
...
...
</book>
...
...
</books>
<categories>
<category id="abcd">
<locations>
<location>...</location>
</locations>
<url>...</url>
</category>
...
...
</categories>
<name>...</name>
<url>...</url>
</library>
The files are over 50 MB each, and many tags are repeated in different contexts, e.g. url appears under /library/books/book/img, but also under /library and under /library/categories/category, and so on.
My SAX parser uses a subclass of DefaultHandler in which I override the startElement and endElement methods (among others). The problem is that these methods are huge in terms of lines of code because of the business logic tied to these XML files. I am using a lot of blocks like this:
if ("url".equalsIgnoreCase(qName)) {
// peek at stack and if book is on top
// ...
// else if category is on top
// ...
} else if (....) {
}
I was wondering whether there is a cleaner, more elegant way to do this kind of XML parsing.
Thank you all
I'm not sure whether you're asking 1) whether there is something you can do other than comparing the tag name against a bunch of strings, or 2) whether there is an alternative to a long if-then-else chain.
For 1, I haven't found anything. Someone else may tackle that one.
The answer to 2 depends on your domain. If the point is to hydrate a bunch of objects from an XML file, one option is a factory method.
The factory method holds the one long if-then-else statement, and all it does is pass the XML off to the appropriate class. Each of your classes then has a method like constructYourselfFromXmlString. This improves the design because only the objects themselves know about the private data in the XML that hydrates them.
The reason this is hard is that, if you think about it, exporting an object to XML and importing it back in really violates encapsulation. There is nothing to be done about that; it just is. This approach at least keeps things a little more encapsulated.
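As a rough illustration of moving the branching out of startElement and endElement, here is a minimal sketch that keeps the current element path on a stack and looks up an action for that path in a map. It is a related table-lookup idea rather than exactly the factory described above. The names PathDispatchSketch, PathHandler and on are made up for this example, the element paths come from the sample XML in the question, and it assumes the values of interest are leaf elements containing only text.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathDispatchSketch {

    /** Hypothetical handler: tracks the element path and routes leaf text by that path. */
    static class PathHandler extends DefaultHandler {
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        private final Map<String, Consumer<String>> actions = new HashMap<>();

        /** Registers an action for one full element path, e.g. "/library/url". */
        void on(String elementPath, Consumer<String> action) {
            actions.put(elementPath, action);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.addLast(qName);   // the default parser is not namespace-aware, so qName is the tag name
            text.setLength(0);     // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String current = "/" + String.join("/", path);
            Consumer<String> action = actions.get(current);
            if (action != null) {
                action.accept(text.toString().trim());
            }
            path.removeLast();
            text.setLength(0);
        }
    }

    /** Parses the XML and returns a description of every url found, by context. */
    static List<String> parse(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        PathHandler handler = new PathHandler();
        // The same tag name gets a different action depending on where it appears.
        handler.on("/library/url", v -> seen.add("library url: " + v));
        handler.on("/library/books/book/img/url", v -> seen.add("image url: " + v));
        handler.on("/library/categories/category/url", v -> seen.add("category url: " + v));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><books><book><img><url>a</url></img></book></books>"
                + "<categories><category id=\"x\"><url>b</url></category></categories>"
                + "<url>c</url></library>";
        parse(xml).forEach(System.out::println);
    }
}
```

Each lambda could just as well hand the value to the object being built (the book, the category, and so on), which keeps the per-class knowledge out of the handler. The cost is building one path string per closing tag, which may or may not matter for 50 MB files.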
HTH