I have an XML file that contains some odd formatting, such as:
<?xml version=3D3D"1.0" encoding=3D3D"ISO-8859-1"?>
Notice the “3D3D”s. Plus, throughout the rest of the file, the lines break at around 78 characters and include more “3D”s, along the lines of
Supercalifragilis=
=3D
ticexpialidocious=
=3D
At first I thought it all might be related to the ISO-8859-1 encoding, but running the text through PHP’s mb_convert_encoding($xml, "UTF-8", "ISO-8859-1") didn’t seem to change any of that.
Is anyone familiar with these particular odd characters and formatting? If so, can you recommend a quick way to clean it up or convert it, so I can cleanly parse the file with something like SimpleXML?
The oddities are QP (quoted-printable) encoding: =xx stands for the character with hex code xx. For example, =3D stands for the equals sign “=”. In QP, a soft line break is “=” at the end of a line. The “=3D3D” in your header is an “=” that was encoded to “=3D” and then had its own “=” encoded again, so it seems that the data was QP encoded twice.
So hopefully running the text through PHP’s quoted_printable_decode twice, once per layer of encoding, will help.
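As a rough sketch of that idea, assuming the file really was encoded exactly twice: the sample string below is made up to mirror the fragments in the question (the <word> element is hypothetical), and the only library call used is PHP's built-in quoted_printable_decode.

```php
<?php
// Made-up sample mirroring the question: XML that was QP-encoded twice.
// In practice $raw would come from something like file_get_contents().
$raw = "<?xml version=3D3D\"1.0\" encoding=3D3D\"ISO-8859-1\"?>\n"
     . "<word>Supercalifragilis=\n=3D\nticexpialidocious=\n=3D\n</word>";

// First pass turns "=3D3D" into "=3D" and removes the outer soft line breaks.
$once = quoted_printable_decode($raw);

// Second pass turns "=3D" into "=" and removes the inner soft line breaks.
$xml = quoted_printable_decode($once);

echo $xml, "\n";
```

If the result looks like ordinary XML, it can then be handed to simplexml_load_string($xml). Decoding more times than the data was encoded is not harmless, since a literal “=” followed by two hex digits in the real content would be decoded as well, so it is worth inspecting the output after each pass.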