I’m creating an Atom XML packet as part of a REST service request.
A problem occurs, though, when the value of one of the elements in the XML contains the registered trademark symbol (®).
The XML is being sent as a “PUT” through WebRequest. When the problem character is in the XML, the complete XML package doesn’t make it to the server. The data packet gets truncated and I see the error “Unexpected EOF in start tag” reported on the server.
At the server, I do notice that the first part of the request (before it is truncated) arrives with the problem character as “Â®”. I expected to see just “®”.
I thought that I only need to worry about these characters in XML:
Double Quote: "
Single Quote: '
Less Than: <
Greater Than: >
Ampersand: &
How can I escape or process my string so that I can send any character with no problem?
XML can trick you in this way. It’s not that certain characters are invalid; rather, a large swath of Unicode is defined as valid, and anything outside of that is verboten. The trick to getting this right without more complex logic is to use a CDATA section.
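As a rough sketch of the CDATA idea (written in Python only for illustration; the element name `title`, the sample value, and the helper `wrap_cdata` are all made up here, not taken from the question), the value can be wrapped like this. The sketch also encodes the document to UTF-8 explicitly and counts bytes rather than characters, since ® takes two bytes in UTF-8. A length taken from the string instead of the encoded bytes could plausibly explain a truncated request, but that is a guess, because the sending code isn't shown.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    A literal ']]>' would end the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical element name and value, for illustration only.
title = "Acme\u00ae Widgets & <Gadgets>"
xml_text = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<title>" + wrap_cdata(title) + "</title>"
)

# Encode explicitly and measure the body in bytes, not characters.
body = xml_text.encode("utf-8")
char_count = len(xml_text)
byte_count = len(body)  # larger than char_count: the ® sign is two bytes in UTF-8

# Round trip: the parser returns the original value with no escaping needed.
round_trip = ET.fromstring(body).text
```

In .NET the equivalent step would be to call `Encoding.UTF8.GetBytes` on the XML string and set the request's `ContentLength` from the length of the resulting byte array. Note that CDATA only removes the need to escape `<`, `>` and `&`; it does not change how the characters are encoded on the wire.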