I want to remove all invalid characters (for example, the character with hexadecimal value 0x1A) from an XML file using sed.
What is the regex and the command line?
EDIT
Added Perl tag hoping to get more responses. I prefer a one-liner solution.
EDIT
These are the valid XML characters:
x9 | xA | xD | [x20-xD7FF] | [xE000-xFFFD] | [x10000-x10FFFF]
Assuming UTF-8 XML documents:
If you want to encode the bad bytes instead of removing them:
You can call it a few different ways: