I'm using XML Serialization heavily in a web service (the contracts pass complex types as params). Recently I noticed that the .Net XML Serialization engine is escaping some of the five well-known reserved characters that must be escaped when included within an element (<, >, &, ' and "). My first reaction was "good old .Net, always looking out for me".
But then I started experimenting and noticed it is only escaping <, > and &, and for some reason not the apostrophe and the double quote. For example, if I return this literal string in a field within a complex type from my service:
Bad:<>&'":Data
This is what is transferred over the wire (as seen from Fiddler):
Bad:&lt;&gt;&amp;'":Data
Has anyone run into this, or does anyone understand why it happens? Is the serializer simply overlooking them, or is there a reason for this? As I understand it, ' and " are not valid within an XML element according to the spec.
According to the XML spec, for regular content and markup:
- & always needs to be escaped as &amp; because it is the escape character.
- < always needs to be escaped as &lt; since it marks the start of an element. It even has to be escaped within attributes, as a safety measure and to make parser error detection simpler to write.
- > does not need to be escaped as &gt; but often is, for symmetry with <. (The one exception is the sequence ]]> in content, where it must be escaped.)
- ' needs to be escaped as &apos; only if in an attribute delimited by '.
- " needs to be escaped as &quot; only if in an attribute delimited by ".
Inside processing instructions, comments and CDATA sections, the rules change somewhat; the details are in section 2.4, "Character Data and Markup", of the spec.
Your serializer is trying to do you a favor by keeping the file somewhat human-readable.
(Each of the above may also be escaped using their numeric equivalents.)
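The rules above can be checked with a small sketch. It uses Python's standard library as a stand-in, not the .Net serializer from the question, so treat it as an illustration of the spec rules rather than of what XmlSerializer does internally. The element name Field and the attribute name note are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# The literal string from the question.
raw = "Bad:<>&'\":Data"

# Element content: escape() replaces only &, < and >, leaving both quote
# characters alone, which matches what was seen on the wire.
content = escape(raw)
print(content)  # Bad:&lt;&gt;&amp;'":Data

# Attribute value: quoteattr() picks a delimiter and escapes only the quote
# character that would clash with it (here " becomes &quot;).
attr = quoteattr(raw)
print(attr)  # "Bad:&lt;&gt;&amp;'&quot;:Data"

# Round trip: a conforming parser accepts the unescaped quotes in element
# content and returns the original string for both the text and the attribute.
# "Field" and "note" are hypothetical names used only for this sketch.
elem = ET.fromstring(f"<Field note={attr}>{content}</Field>")
assert elem.text == raw
assert elem.get("note") == raw

# Numeric character references decode to the same characters.
numeric = ET.fromstring("<Field>&#39;&#34;</Field>")
assert numeric.text == "'\""
```

Since the parser hands back the original string either way, leaving ' and " unescaped in element content costs nothing and keeps the payload easier to read.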