My Perl program is processing an XML file. Some entries may contain & symbols, and the parser breaks with the error “Invalid name in entity”.
How can I preprocess the file and escape the & in every invalid entity?
In other words, the transformation should look like this:
<words>text1 & text2</words> --> <words>text1 &amp; text2</words>
It’s tricky, non-trivial, and usually involves tradeoffs. When I encountered a similar problem, replacing each & character that is followed by either an uppercase letter or a space (/\&[A-Z ]/ as a regexp) with &amp; (plus the same trailing character) solved most cases. That is usually good enough, since by accepting XML input that is not well-formed you are already going the extra mile.
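A minimal Perl sketch of that heuristic, under a few assumptions: the subroutine names are hypothetical, and the whole file is assumed to be read into a string first. The lookahead (?=[A-Z ]) leaves the trailing character in place, which has the same effect as replacing & plus that character with &amp; plus that character. The second subroutine is a stricter alternative that is not part of the answer above: it escapes every & that does not begin a syntactically valid entity or character reference, whatever character follows it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic from the answer: escape a bare "&" only when it is followed by
# an uppercase letter or a space. The lookahead (?=...) does not consume the
# trailing character, so only the "&" itself is replaced.
sub escape_ampersands_heuristic {    # hypothetical name
    my ($xml) = @_;
    $xml =~ s/&(?=[A-Z ])/&amp;/g;
    return $xml;
}

# Stricter alternative (an assumption, not the answer's method): escape every
# "&" that does not start a well-formed reference such as &name;, &#38; or
# &#x26;. It checks syntax only, so undefined entities like &foo; still pass.
sub escape_ampersands_strict {       # hypothetical name
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $input = '<words>text1 & text2</words>';
print escape_ampersands_heuristic($input), "\n";
# prints <words>text1 &amp; text2</words>
```

Run the substitution on the raw file contents and then hand the repaired string to whatever parser the program already uses. The heuristic leaves lowercase sequences such as &amp; or &lt; alone, but it will also miss a stray & followed by a lowercase letter or digit; the strict variant catches those at the cost of a more complex regexp.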