I’d like to merge a few XML files.
The destination XML differs slightly from the source files: it contains an additional root element.
For example:
The destination XML:
<?xml version="1.0" encoding="utf-8"?>
<customer ID="A0001" name="customername">
.....
.....
</customer>
Source XML:
<?xml version="1.0" encoding="utf-8"?>
<order number="00001">
<.....>
<.....>
<.....>
</order>
Every source XML file needs to be inserted between <customer ...> and </customer>.
The source files can be very large (e.g. 2 GB).
I can write the destination XML file with the root element and read the source files using XmlTextReader, like this:
string myOrder = textReader.ReadOuterXml();
writer.WriteRaw(myOrder);
Result (where every order comes from a different XML file):
<?xml version="1.0" encoding="utf-8"?>
<customer ID="A0001" name="customername">
<order number="00001">
<.....>
<.....>
<.....>
</order>
<order number="00002">
<.....>
<.....>
<.....>
</order>
<order number="00003">
<.....>
<.....>
<.....>
</order>
</customer>
But I’m afraid that ReadOuterXml() will cause out-of-memory exceptions for the large files.
Any suggestions?
It sounds like in this particular case, assuming all the files are really using UTF-8, you can basically cheat. .NET 4 makes this particularly easy:
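The answer's original snippet did not survive here, so the following is only a reconstruction sketch based on the description around it: write the header, append every source line except the first, then append the closing tag. The class name LineBasedMerge, the method name Merge and its parameters are hypothetical; File.WriteAllText, File.ReadLines, File.AppendAllLines and File.AppendAllText are the .NET 4 calls it leans on. It assumes each source file is UTF-8 and has the XML declaration alone on its first line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not the answer's original code.
public static class LineBasedMerge
{
    public static void Merge(string outputPath, IEnumerable<string> inputFiles)
    {
        // First open: XML declaration plus the opening <customer> tag
        // (attribute values taken from the example in the question).
        File.WriteAllText(outputPath,
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" + Environment.NewLine +
            "<customer ID=\"A0001\" name=\"customername\">" + Environment.NewLine);

        // Lazy sequence: File.ReadLines streams one line at a time, and
        // Skip(1) drops each file's declaration line. Nothing is read yet.
        IEnumerable<string> lines =
            inputFiles.SelectMany(file => File.ReadLines(file).Skip(1));

        // Second open: the sequence is consumed here, line by line.
        File.AppendAllLines(outputPath, lines);

        // Third open: close the root element.
        File.AppendAllText(outputPath, "</customer>" + Environment.NewLine);
    }
}
```

One caveat of this sketch: memory use is bounded by the longest line, so a source file that stores its whole document on a single line would still be read in one piece.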
This isn’t quite as efficient as it might be, as it’ll open the output file three times, but it’s written about as simply as I could make it. Note that the lines sequence here is lazy: this won’t read the source files completely into memory; it’ll read a line at a time. It does rely on each file starting with the XML declaration and being in UTF-8, though. There are far more robust streaming approaches you could use, but if you’re confident of your source format, this is very simple.
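For comparison, one of the more robust streaming approaches could look roughly like the sketch below. It is an illustration under assumptions, not something the answer spelled out: the class name StreamingMerge and the method Merge are hypothetical, and the customer attributes are copied from the question's example. It pairs XmlReader with XmlWriter.WriteNode, which copies the current element node by node, so no order is ever held in memory as a single string the way ReadOuterXml() would hold it.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper illustrating a reader-to-writer streaming copy.
public static class StreamingMerge
{
    public static void Merge(string outputPath, IEnumerable<string> sourcePaths)
    {
        using (XmlWriter writer = XmlWriter.Create(outputPath))
        {
            writer.WriteStartDocument();

            // Root element from the question's example.
            writer.WriteStartElement("customer");
            writer.WriteAttributeString("ID", "A0001");
            writer.WriteAttributeString("name", "customername");

            foreach (string path in sourcePaths)
            {
                using (XmlReader reader = XmlReader.Create(path))
                {
                    // Skip the declaration and any leading whitespace or
                    // comments, stopping on the source's root element.
                    reader.MoveToContent();

                    // Copy that element and its whole subtree to the output,
                    // reading and writing one node at a time.
                    writer.WriteNode(reader, false);
                }
            }

            writer.WriteEndElement();   // </customer>
            writer.WriteEndDocument();
        }
    }
}
```

Because the reader parses each source file, this version does not depend on line layout and it honours whatever encoding each file declares, at the cost of more work per byte than the line-based shortcut.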
EDIT: Sample usage:
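The sample itself is missing here, so what follows is only a guess at what a complete console program around the line-based approach might look like. The command-line arguments (a source directory and an output path), the "*.xml" search pattern and the ordinal sort are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class Program
{
    // Hypothetical usage: merge.exe <sourceDirectory> <outputPath>
    public static void Main(string[] args)
    {
        string sourceDirectory = args[0];
        string outputPath = args[1];

        // Collect the source files up front; sorting gives a stable order.
        // Keep the output file out of this directory (or give it another
        // extension) so a previous run's result is not merged into itself.
        string[] inputFiles = Directory.GetFiles(sourceDirectory, "*.xml");
        Array.Sort(inputFiles, StringComparer.Ordinal);

        // Header: declaration plus the opening root element.
        File.WriteAllText(outputPath,
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" + Environment.NewLine +
            "<customer ID=\"A0001\" name=\"customername\">" + Environment.NewLine);

        // Body: every line of every source file except its first line,
        // which is assumed to hold only the XML declaration.
        File.AppendAllLines(outputPath,
            inputFiles.SelectMany(file => File.ReadLines(file).Skip(1)));

        // Footer: close the root element.
        File.AppendAllText(outputPath, "</customer>" + Environment.NewLine);
    }
}
```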