When should we replace < > & " ' in XML with entities like &lt; etc.?
My understanding is that it’s just to make sure that if the content part of the XML has > or <, the parser will not treat it as the start or end of a tag.
Also, if I have an XML document like:
<hello>mor>ning<hello>
should this be replaced with one of:
&lthello&gtmor&gtning&lthello&gt
<hello>mor&gtning<hello>
&lthello&gtmor>ning&lthello&gt
I don’t understand why replacing is needed. When exactly is it required and what exactly (tags or text) should be replaced?
<, >, &, " and ' all have special meanings in XML (such as “start of entity” or “attribute value delimiter”). In order to have those characters appear as data (instead of for their special meaning) they can be represented by entities (&lt; for < and so on). Sometimes those special meanings are context sensitive (e.g. " doesn’t mean “attribute delimiter” outside of a tag) and there are places where they can appear raw as data. Rather than worry about those exceptions, it is simplest to just always represent them as entities if you want to avoid their special meaning. Then the only gotcha is explicit CDATA sections, where the special meaning doesn’t hold (and & won’t start an entity).
It shouldn’t be represented as any of those. Entities must be terminated with a semi-colon.
How you should represent it depends on which bit of your example is data and which is markup. You haven’t said, for example, if <hello> is supposed to be data or the start tag for a hello element.
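As a rough illustration of the data versus markup distinction, here is a minimal sketch using Python's standard library (my choice for the example, not something from the question). It assumes that hello is meant to be an element and that everything between the tags is data; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumption: this string is data that should appear inside a hello element.
text = "mor>ning & <hello>"

# escape() replaces &, < and > in character data with entities.
escaped = escape(text)
print(escaped)  # mor&gt;ning &amp; &lt;hello&gt;

# The tags are markup and stay raw; only the data between them is escaped.
document = "<hello>" + escaped + "</hello>"
root = ET.fromstring(document)
print(root.tag)   # hello
print(root.text)  # mor>ning & <hello>

# quoteattr() escapes an attribute value and adds the surrounding quotes.
attr_document = "<hello greeting=" + quoteattr('say "hi" & wave') + "/>"
attr_root = ET.fromstring(attr_document)
print(attr_root.get("greeting"))  # say "hi" & wave

# Inside a CDATA section nothing is markup, so &lt; stays as the literal
# four characters rather than turning into <.
cdata_root = ET.fromstring("<hello><![CDATA[mor>ning &lt; <b>]]></hello>")
print(cdata_root.text)  # mor>ning &lt; <b>

# An entity without its terminating semi-colon is not well-formed.
try:
    ET.fromstring("<hello>mor&gtning</hello>")
    unterminated_ok = True
except ET.ParseError:
    unterminated_ok = False
print(unterminated_ok)  # False
```

If <hello> were meant to be data instead, you would pass the whole string through escape() and put the result inside whatever element actually contains it.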