I have a huge (100k+ lines, 5MB+) XML which acts as a database for my C++ Application. The structure of the XML is quite straight forward, for example, it has chunks of:
<foo>
<bar prop="true"/>
<baz>blah</baz>
</foo>
The nesting of tags is several levels deep and there are many items with multiple properties. What is a good way to find and replace chunks of this kind of a file? For example, assume that the above section is repeated a few dozen times and in each chunk the value of the tag <baz> is different. I’d like to make edits such as:
- Setting all the values contained in tag
<baz>to a given value. - Remove chunks containing certain values
- Etc.
So far, I’ve learnt of the following methods for accomplishing this:
-
Find/Replace: A no-brainer, trivial solution and also my last fall-back. This approach, IMHO is the most time consuming, error prone and painful method. The absolute last resort.
-
RegExes: Use regular expressions to match blocks of interest and edit them using replacement expressions. Kinda like this blog entry: http://blogs.msdn.com/b/vseditor/archive/2004/08/12/213770.aspx. But I feel this would be error prone and there could be a bunch of missed items if the regex is not exactly right the first time around.
-
Parser & Save: Whip up a quick program to parse the XML using Xerces or XML DOM Interfaces (or some other XML library), read the XML in, manipulate it as desired and save back to disk. Again, this approach is a slow process, but once its up and running, easy to make modifications and more flexible then RegExes.
Are there any better ways to deal with this?
(EDIT: Thanks for all the redo it to use a DB suggestions, I know its a huge mess but by “better ways to deal with this” I meant the “find/replace” part. )
If you don’t want to put the entire document in memory, I would read it using a SAX parser. As you read it, you append the transformed document to a second (or a temp) file. I think it could be pretty fast, and use only a little memory footprint.