<![CDATA[ and ]]> are not allowed inside a <![CDATA[ … ]]> block. That is understandable.
Now, I have to transmit user-entered data inside a <![CDATA[ … ]]> block, and a malicious user might enter either <![CDATA[ or ]]> or both.
The question is: what is the preferred way to handle this situation?
- Strip <![CDATA[ and ]]>?
- Replace them with spaces?
- Smack the user with an error message?
- Or is there an official way of actually transmitting them?
I think you are thinking about CDATA sections in the wrong way. CDATA stands for “character data”, and the CDATA syntax is simply a way of marking a block of data that shouldn’t be interpreted as markup. CDATA sections are useful for embedding XML documents inside another XML document. However, when you include character data (i.e. text) in a document, enclosing it in a CDATA section rather than encoding it as ordinary text (with certain characters escaped) shouldn’t change the meaning of the data.
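To illustrate that point, here is a small sketch in Python. Python is used only for illustration; the original discussion is not tied to any language. The sketch parses the same text twice with the standard library's xml.etree.ElementTree: once wrapped in a CDATA section and once with & and < escaped. The element name msg and the sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same text, once wrapped in a CDATA section and once with & and < escaped.
as_cdata = "<msg><![CDATA[if (a < b && c) { ... }]]></msg>"
as_escaped = "<msg>if (a &lt; b &amp;&amp; c) { ... }</msg>"

# After parsing, both documents expose plain character data to the application.
cdata_text = ET.fromstring(as_cdata).text
escaped_text = ET.fromstring(as_escaped).text

print(cdata_text)
print(cdata_text == escaped_text)
```

The parser hands back identical strings. Whether the data was sent as CDATA or as escaped text is invisible to the code that consumes it.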
The short version is that your application shouldn’t care whether the data is encoded as CDATA or not. If the text you are encoding isn’t heavy with XML-like syntax, you may be better off simply escaping the & and < characters, which your XML API will probably do for you anyway. For example, the InnerText property of XmlNode escapes characters as required.

If you still want to use CDATA sections (escaping a large XML fragment may considerably inflate the size of the resulting document), then you only need to escape the CDATA end marker, ]]>. You can do this by replacing every ]]> with ]]]]><![CDATA[>. This ends the current CDATA section after the ]] and starts a new one that begins with the >.
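Below is a minimal Python sketch of both options, assuming the text is inserted into a document by plain string concatenation. wrap_in_cdata is a hypothetical helper written for this example and is not part of any library. xml.sax.saxutils.escape is the standard-library function that escapes &, < and >. The sample user_input deliberately contains both <![CDATA[ and ]]>.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_in_cdata(text: str) -> str:
    """Hypothetical helper: wrap arbitrary text in CDATA, splitting at each ']]>'."""
    # Each ']]>' becomes ']]' + end of section + start of a new section + '>'.
    # A literal '<![CDATA[' inside a CDATA section is harmless and needs no change.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


user_input = "evil ]]> <![CDATA[ payload ]]>"

# Option 1: split CDATA sections around every ']]>'.
cdata_doc = "<data>" + wrap_in_cdata(user_input) + "</data>"

# Option 2: plain escaped text (escape() handles &, < and >).
escaped_doc = "<data>" + escape(user_input) + "</data>"

print(cdata_doc)
# Both documents parse back to exactly the text the user entered.
print(ET.fromstring(cdata_doc).text == user_input)
print(ET.fromstring(escaped_doc).text == user_input)
```

ElementTree joins the adjacent CDATA sections back into a single text value. No stripping, substitution, or error message is needed for either option.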