The file format my application uses is XML-based. I just got a customer who has a botched XML file. It contains nearly 90,000 lines, and for some reason about 20 “=” symbols are randomly interspersed in it.
For most of them I get an XmlException with a line number and character position, which lets me find the offending characters and remove them manually. I’ve just started writing a small app that automates this process, but I was wondering whether there are better ways to repair damaged XML files.
Example of botched line:
<item name="InstanceGuid" typ=e_name="gh_guid" type_code="9">ee330f9f-a1e2-451a-8c6d-723f066a6bd4</item>
↑ (this is supposed to be [type_name])
You could search for any equal sign that isn’t followed by a double quote. A regular expression (regex) would be pretty simple to write up.
Or you could open the file in an advanced text editor that supports regex find/replace, search with that same regex, and remove every match.
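As a rough sketch of that regex in Python (the name strip_stray_equals is made up for illustration): the pattern uses a negative lookahead to match any “=” that is not immediately followed by a double quote. It assumes every attribute in the file is written as name="value", with double quotes and no space after the “=”; if the file uses single quotes or spaces there, the pattern would need adjusting.

```python
import re

# Matches an "=" that is NOT immediately followed by a double quote.
STRAY_EQUALS = re.compile(r'=(?!")')

def strip_stray_equals(text: str) -> str:
    """Remove every '=' that does not directly precede a double quote."""
    return STRAY_EQUALS.sub("", text)

broken = ('<item name="InstanceGuid" typ=e_name="gh_guid" type_code="9">'
          'ee330f9f-a1e2-451a-8c6d-723f066a6bd4</item>')
print(strip_stray_equals(broken))
```

In an editor whose regex engine supports lookahead, the same pattern, =(?!"), with an empty replacement should have the same effect.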
Of course, I’d keep a copy of the original: if there are legitimate equal signs in the text inside the elements, the regex will remove those too and mess up the content.
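One way to limit that risk, sketched here in Python rather than in the original app's language, is to let the parser point at the bad lines and apply the regex only there, which is close to the manual process described in the question. The function name repair_xml and the max_fixes limit are made up for illustration. The sketch relies on the line number that xml.etree.ElementTree reports in ParseError.position, and it is still a heuristic: text on a rejected line that contains a legitimate “=” would be altered as well.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "=" that is NOT immediately followed by a double quote.
STRAY_EQUALS = re.compile(r'=(?!")')

def repair_xml(text: str, max_fixes: int = 1000) -> str:
    """Parse repeatedly; on each error, strip stray '=' from the reported line only."""
    lines = text.split("\n")
    for _ in range(max_fixes):
        candidate = "\n".join(lines)
        try:
            ET.fromstring(candidate)
            return candidate  # parses cleanly, nothing left to fix
        except ET.ParseError as err:
            line_no, _column = err.position  # line number is 1-based
            index = line_no - 1
            if not 0 <= index < len(lines):
                raise
            fixed = STRAY_EQUALS.sub("", lines[index])
            if fixed == lines[index]:
                raise  # some other kind of damage; leave it for manual repair
            lines[index] = fixed
    raise RuntimeError("gave up after too many repair attempts")
```

Lines the parser accepts are never touched, so an “=” inside element text elsewhere in the file survives. Comparing the repaired output against the original with a diff tool before handing it back is still a good idea.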