We’re using DataContractSerializer to serialize our data to XML. Recently we found a bug in how the string "\r\n" is saved and read back: it comes back as just "\n". The cause appears to be writing through an XmlWriter created with Indent = true:
```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

var serializer = new DataContractSerializer(typeof(Test));
// Write the object through an indenting XmlWriter.
using (var fs = File.Open("C:/test.xml", FileMode.Create))
using (var wr = XmlWriter.Create(fs, new XmlWriterSettings() { Indent = true }))
    serializer.WriteObject(wr, new Test() { Line = "\r\n" });

// Read it back: test.Line is now "\n" instead of "\r\n".
Test test;
using (var fs = File.Open("C:/test.xml", FileMode.Open))
    test = (Test) serializer.ReadObject(fs);

public class Test { public string Line; }
```
The obvious fix is to stop indenting the XML, and indeed, removing the “XmlWriter.Create” line (so that WriteObject writes straight to the FileStream) makes the Line value round-trip correctly, whether it is "\n", "\r\n" or anything else.
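If the indentation is worth keeping, one setting that may be worth trying is XmlWriterSettings.NewLineHandling. Our understanding (an assumption, not confirmed in this thread) is that with the default NewLineHandling.Replace the writer emits "\r\n" in text as a literal line break, which a conforming XML reader then normalizes to "\n" (XML 1.0, section 2.11). NewLineHandling.Entitize should instead write the carriage return as the character reference &#xD;, which survives that normalization. A minimal sketch, using a MemoryStream instead of a file; the RoundTrip.Run helper is hypothetical:

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

public class Test { public string Line; }

public static class RoundTrip
{
    // Hypothetical helper: serializes `value` through an indenting XmlWriter
    // with the given NewLineHandling, then deserializes it and returns Line.
    public static string Run(string value, NewLineHandling handling)
    {
        var serializer = new DataContractSerializer(typeof(Test));
        using (var ms = new MemoryStream())
        {
            // Assumption: Entitize writes '\r' as "&#xD;", so the reader
            // cannot normalize "\r\n" down to "\n".
            var settings = new XmlWriterSettings { Indent = true, NewLineHandling = handling };

            // CloseOutput defaults to false, so disposing the writer leaves ms open.
            using (var wr = XmlWriter.Create(ms, settings))
                serializer.WriteObject(wr, new Test { Line = value });

            ms.Position = 0; // rewind before reading back
            var test = (Test)serializer.ReadObject(ms);
            return test.Line;
        }
    }
}
```

For example, `RoundTrip.Run("\r\n", NewLineHandling.Entitize)` should return the original two characters, while the default `NewLineHandling.Replace` reproduces the behaviour described above.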
However, the way DataContractSerializer writes it still doesn’t seem entirely safe, or perhaps even correct: for example, simply opening the resulting file in XML Notepad and saving it again destroys both "\n" and "\r\n" values completely.
What is the correct approach here? Is using XML as a format for serializing binary data a flawed concept? Are we wrong to expect that tools like XML Notepad won’t break our data? Do we need to mark each and every string field that could contain such text with some special attribute, perhaps one that forces CDATA?
You could potentially use a CDATA section, but I do agree with your summary that using XML for serialising binary data is just plain wrong. Can you communicate the data another way?
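If the data has to stay in XML, one way to treat it as binary rather than text is to carry the raw bytes in a byte[] member, which DataContractSerializer writes as base64. Base64 text contains no line-break characters, so line-ending normalization has nothing to alter. A minimal sketch, assuming you are free to change the data contract; the BinaryTest class, its Payload field and the Base64RoundTrip helper are hypothetical, and UTF-8 is an assumed encoding:

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical contract: the text travels as raw bytes instead of a string.
public class BinaryTest { public byte[] Payload; }

public static class Base64RoundTrip
{
    public static string Run(string value)
    {
        var serializer = new DataContractSerializer(typeof(BinaryTest));
        using (var ms = new MemoryStream())
        {
            // Same indenting writer as in the question; the byte[] is emitted as base64 text.
            using (var wr = XmlWriter.Create(ms, new XmlWriterSettings { Indent = true }))
                serializer.WriteObject(wr, new BinaryTest { Payload = Encoding.UTF8.GetBytes(value) });

            ms.Position = 0; // rewind before reading back
            var result = (BinaryTest)serializer.ReadObject(ms);

            // Decode the bytes back into the original string (assumes UTF-8).
            return Encoding.UTF8.GetString(result.Payload);
        }
    }
}
```

The trade-off is that the value is no longer human-readable in the XML file, which is arguably appropriate if it really is binary data.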