I have a pretty simple bit of JavaScript that attempts to parse the XML I’ve extracted from the metadata in a JPEG:
var xmlDoc;
try {
    xmlDoc = $.parseXML(xmlString);
} catch(e) {
    console.log(e);
}
Here is the exception that gets thrown:
Invalid XML: <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
<photoshop:Instructions>C1DDZVs9Sr+DG5R9eSc%9w</photoshop:Instructions>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:aux="http://ns.adobe.com/exif/1.0/aux/">
<aux:SerialNumber>1</aux:SerialNumber>
<aux:Lens>AF-S DX VR Zoom-Nikkor 18-200mm f/3.5-5.6G IF-ED [II]</aux:Lens>
<aux:LensID>1</aux:LensID>
<aux:ImageNumber>6651</aux:ImageNumber>
<aux:FlashCompensation>0/1</aux:FlashCompensation>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
There doesn’t seem to be anything wrong with that XML. In fact, if I paste the same XML directly into the code as a string literal, no exception is thrown:
var xmlDoc;
try {
    xmlDoc = $.parseXML('<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"> <photoshop:Instructions>C1DDZVs9Sr+DG5R9eSc%9w</photoshop:Instructions> </rdf:Description> <rdf:Description rdf:about="" xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"> <aux:SerialNumber>1</aux:SerialNumber> <aux:Lens>AF-S DX VR Zoom-Nikkor 18-200mm f/3.5-5.6G IF-ED [II]</aux:Lens> <aux:LensID>1</aux:LensID> <aux:ImageNumber>6651</aux:ImageNumber> <aux:FlashCompensation>0/1</aux:FlashCompensation> </rdf:Description> </rdf:RDF> </x:xmpmeta>');
} catch(e) {
    console.log("error parsing xml: " + e);
}
I can only assume that there is some kind of unprintable special character in the extracted string that is causing the trouble. How can I test that assumption and fix it? Or is something else wrong?
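One way to test that assumption is to walk the string and report every character that falls outside printable ASCII. The sketch below is only an illustration: `findSuspectChars` is a made-up helper name, and the sample string with a trailing NUL is an assumption, not the actual data from the JPEG. Note that it will also flag legitimate non-ASCII text, so treat its output as a list of places to look rather than a list of errors. XML 1.0 does not allow most control characters below U+0020 (only tab, line feed and carriage return), so a single stray one is enough to make a parser reject the document.

```javascript
// Hypothetical helper: list every character outside printable ASCII,
// ignoring tab (9), line feed (10) and carriage return (13).
function findSuspectChars(str) {
    var found = [];
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        var isCommonWhitespace = code === 9 || code === 10 || code === 13;
        if ((code < 32 && !isCommonWhitespace) || code > 126) {
            // Format the UTF-16 code unit as U+XXXX for easy lookup.
            var hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
            found.push({ index: i, code: "U+" + hex });
        }
    }
    return found;
}

// Illustrative input only: a tiny document with a NUL appended.
var sample = "<a>1</a>\u0000";
console.log(findSuspectChars(sample)); // one entry: index 8, code "U+0000"
```

Running `findSuspectChars(xmlString)` just before the `$.parseXML` call should show whether anything unexpected sits at the start or end of the extracted string.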
I’ve found the problem. As I suspected, there was some nasty unprintable character lurking at the end of the string.
I was able to remove it with a dirty bit of hacking.
If it’s not obvious, it simply trims away anything at the start and end of the string that lies outside the expected angle brackets of an XML document. jQuery’s trim() function was no help with the rogue character, since it only strips whitespace.
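A minimal sketch of that kind of trim, not necessarily the exact code I used, might look like this. The name `trimToXml` is made up for illustration, and it assumes the junk sits only before the first `<` or after the last `>`; a stray character in the middle of the document would survive.

```javascript
// Hypothetical helper: keep only the text from the first '<' to the last '>'.
function trimToXml(str) {
    var start = str.indexOf("<");
    var end = str.lastIndexOf(">");
    // No angle brackets at all, or a '>' that comes before the first '<':
    // nothing that looks like an XML document, so return an empty string.
    if (start === -1 || end < start) {
        return "";
    }
    return str.substring(start, end + 1);
}

// Illustrative input only: NUL characters and a space around a tiny document.
console.log(trimToXml("\u0000<a>1</a>\u0000 ")); // "<a>1</a>"

// With jQuery loaded, the cleaned string would then be parsed as before:
// xmlDoc = $.parseXML(trimToXml(xmlString));
```

This is blunter than a whitespace trim because it does not care what the unwanted characters are, only where the document starts and ends.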
I don’t know what the character was, and I’m not particularly happy with my solution, but I’m too busy to spend more time on it. Sigh.