What is the best way to split a large XML document into smaller sections that are still valid XML? For my purposes, I need to split it into roughly thirds or fourths, but for the sake of examples, splitting it into n components would be good.
I would like to use a language that I am familiar with (Java, C#, Ruby, PHP, or C/C++), although examples in any language or pseudocode are more than welcome.
You can always extract the top-level elements; whether that is the granularity you want is up to you. In C#, you’d use the XmlDocument class. For example, if your XML file looked something like this:
then you’d use code like this to extract all of the Pieces:
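The same idea can be sketched in Java with the standard DOM API, since Java is one of the languages the question mentions. This is only an illustration under assumptions: the root element name Document, the child element name Piece, and the class and method names are hypothetical, not taken from any real file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TopLevelExtractor {

    // Returns the element children of the root element, in document order.
    public static List<Element> topLevelElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Element> result = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild();
                 n != null; n = n.getNextSibling()) {
                // Skip whitespace text nodes, comments, and so on.
                if (n instanceof Element) {
                    result.add((Element) n);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input, assumed for illustration only.
        String xml = "<Document>"
                + "<Piece>one</Piece><Piece>two</Piece><Piece>three</Piece>"
                + "</Document>";
        for (Element piece : topLevelElements(xml)) {
            System.out.println(piece.getTagName() + ": " + piece.getTextContent());
        }
    }
}
```

Walking the root's children directly, rather than searching by tag name, picks up only the top-level elements and ignores any nested elements that happen to share the same name.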
Once you’ve got the nodes, you can work with them directly in your code, or you can copy the full text of each node into its own XML document and treat it as an independent piece of XML (saving it back to disk, and so on). Each new document needs a single root element to remain well-formed.
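To get n roughly equal pieces, one option is to give each piece its own copy of the original root element and then distribute the top-level children across the pieces. The sketch below does this in Java with the standard DOM and Transformer APIs. The class name XmlSplitter and the method split are hypothetical, and the code assumes the whole document fits in memory and that text or comments sitting directly under the root can be dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlSplitter {

    // Splits the root's element children into n contiguous groups and
    // returns each group as the text of a standalone, well-formed document.
    public static List<String> split(String xml, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));
            Element root = source.getDocumentElement();

            // Collect only element children; whitespace and comments are dropped.
            List<Element> children = new ArrayList<>();
            for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) {
                    children.add((Element) c);
                }
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> parts = new ArrayList<>();
            int total = children.size();
            for (int i = 0; i < n; i++) {
                // Slice boundaries that spread any remainder across the parts.
                int from = (int) ((long) i * total / n);
                int to = (int) ((long) (i + 1) * total / n);

                Document part = builder.newDocument();
                // Shallow import: copies the root's name and attributes only.
                Element partRoot = (Element) part.importNode(root, false);
                part.appendChild(partRoot);
                for (Element child : children.subList(from, to)) {
                    partRoot.appendChild(part.importNode(child, true)); // deep copy
                }

                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(part), new StreamResult(out));
                parts.add(out.toString());
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not split XML", e);
        }
    }
}
```

Calling XmlSplitter.split(xml, 3) returns three strings, each of which can be written to its own file. If the root has fewer than n children, some parts contain only the empty root element. Because each part is wrapped in the same root, every part stays well-formed on its own; whether it is also valid against a schema depends on that schema's rules, for example a minimum number of children.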