A colleague of mine needs to develop an Eclipse plugin that parses multiple XML files to check them against programming rules imposed by a client (for example, no xsl:for-each, or no namespaces that are declared but never used). About 1,000 files have to be parsed regularly, each about 300-400 lines long.
We were wondering which approach would be faster. I'm thinking JDOM, and he's thinking RegEx.
Can anyone help us decide which is best?
Thanks
If all the checks are simple ones, like "no xsl:for-each" or the namespace rule, a StAX parser would be best: you just stream each document through it, pick up the start-element events, and do your checking on those. For this, the parser needs relatively little memory.
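To give an idea of the streaming approach, here is a minimal sketch of the "no xsl:for-each" check using the StAX API that ships with the JDK (javax.xml.stream). The class name ForEachCheck and the method countForEach are made up for illustration, and I'm assuming the files are XSLT stylesheets that use the standard XSLT namespace URI.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class ForEachCheck {
    // Assumption: the rule targets elements in the standard XSLT namespace.
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Streams the document and counts xsl:for-each start elements.
    static int countForEach(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // Only start-element events matter for this rule.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && XSL_NS.equals(reader.getNamespaceURI())
                        && "for-each".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Usage: java ForEachCheck file1.xsl file2.xsl ...
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                System.out.println(path + ": " + countForEach(in) + " xsl:for-each element(s)");
            }
        }
    }
}
```

Matching on namespace URI plus local name, rather than on the literal text "xsl:for-each", means the check still works when a file binds the XSLT namespace to a different prefix, which is something a RegEx would easily miss.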
If you need to do referential checking, DOM may be better, as you can easily walk the tree (perhaps via XPath).
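As an example of what a referential check could look like, here is a rough sketch using the JDK's DOM and XPath APIs. The rule itself (every xsl:call-template must name a template defined in the same file) is one I invented for illustration and is not from your client's list, and the names CallTemplateCheck and danglingCalls are hypothetical too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class CallTemplateCheck {
    // XPath predicate fragment restricting a match to the standard XSLT namespace.
    private static final String IN_XSL_NS =
            "namespace-uri()='http://www.w3.org/1999/XSL/Transform'";

    // Returns the names used by xsl:call-template that no xsl:template in the file defines.
    static List<String> danglingCalls(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-uri() to see the namespace
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // First pass: collect the names of all named templates.
        NodeList defs = (NodeList) xpath.evaluate(
                "//*[local-name()='template' and " + IN_XSL_NS + "]/@name",
                doc, XPathConstants.NODESET);
        Set<String> defined = new HashSet<>();
        for (int i = 0; i < defs.getLength(); i++) {
            defined.add(defs.item(i).getNodeValue());
        }

        // Second pass: report every call whose target name was not collected above.
        NodeList calls = (NodeList) xpath.evaluate(
                "//*[local-name()='call-template' and " + IN_XSL_NS + "]/@name",
                doc, XPathConstants.NODESET);
        List<String> dangling = new ArrayList<>();
        for (int i = 0; i < calls.getLength(); i++) {
            String name = calls.item(i).getNodeValue();
            if (!defined.contains(name)) {
                dangling.add(name);
            }
        }
        return dangling;
    }
}
```

The whole tree is held in memory here, which is what makes the two passes trivial. At 300-400 lines per file that cost should be small, but it is the trade-off compared with streaming.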