The application I work on uses XML for save/restore purposes. Here’s an example snippet:
<?xml version='1.0' standalone='yes'?>
<itemSet>
  <item handle='2' attribute1='30' attribute2='blah'></item>
  <item handle='5' attribute1='27' attribute2='blahblah'></item>
</itemSet>
I want to be able to efficiently pre-process the XML which I read in from the configuration file. In particular, I want to extract the handle values from the example configuration above.
Ideally, I need a function/method that takes an opaque XML string and returns all of the handle values in a list. For the above example, a list containing 2 and 5 would be returned.
I know there’s a regular expression out there that will help, but is it the most efficient way of doing this? String manipulation can be costly, and there may be thousands of XML strings I would need to process in a configuration file.
You are looking for a stream-oriented XML parser that reads each node in your XML one at a time rather than loading the whole thing into memory.
One of the best known is SAX, the Simple API for XML.
Here’s a good article describing why to use SAX, along with the specifics of using SAX in C++.
You can think of SAX as an XML parser that loads only the bare minimum into memory, so it works well on very large XML documents. By comparison, the regex and DOM approaches typically require loading the entire document into memory.
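To make the callback idea concrete, here is a minimal sketch of a SAX-style handle extractor. It uses Python's standard-library xml.sax module purely because that keeps the example self-contained; a C++ SAX parser follows the same pattern of registering a handler that is called for each start tag. The names HandleCollector and extract_handles are made up for illustration, and the sketch assumes that handle values are always integers and only appear on item elements.

```python
import xml.sax


class HandleCollector(xml.sax.ContentHandler):
    """Collects the 'handle' attribute of every <item> element."""

    def __init__(self):
        super().__init__()
        self.handles = []

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input;
        # no tree of the whole document is ever built.
        if name == "item" and "handle" in attrs:
            # Assumption: handles are integers, as in the example snippet.
            self.handles.append(int(attrs["handle"]))


def extract_handles(xml_text):
    """Return the handle values found in xml_text, in document order."""
    collector = HandleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), collector)
    return collector.handles
```

Passing the example snippet from the question to extract_handles gives a list containing 2 and 5. Only the handler's list of handles is kept between callbacks, so the approach stays cheap even when the function is called on thousands of XML strings.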