I am currently validating a client’s HTML source and I am getting a lot of validation errors for img and input tags which do not have the Omittag (the closing slash, as in <img />). I would do it manually, but this client literally has thousands of files, with a lot of instances where the <img> is not <img />.
The client has already made some of the img tags valid (for whatever reason).
Just wondering if there is a Unix command I could run to check whether an <img> lacks the Omittag and, if so, add it.
I have done simple search and replaces with the following command:
find . \! -path '*.svn*' -type f -exec sed -i -n '1h;1!H;${;g;s/<b>/<strong>/g;p}' {} \;
But never something this large. Any help would be appreciated.
See questions I asked in comment at top.
Assuming you’re using GNU sed, and that you’re trying to add the trailing / to your tags to make XML-compliant <img /> and <input />, then replace the sed expression in your command with this one, and it should do the trick:
'1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}'
Here it is on a simple test file (SO’s colorizer doing wacky things):
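A minimal sketch of such a check, assuming GNU sed. The sample markup and the helper names close_tags and close_tags_in_tree are made up for illustration; the second helper is just the find command from the question with the new expression dropped in.

```shell
# The expression from the answer, kept in one place (GNU sed assumed).
SED_EXPR='1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}'

# Hypothetical helper: filter stdin to stdout, for trying the expression out.
close_tags() {
  sed -n "$SED_EXPR"
}

# Hypothetical helper: the question's find command with the new expression.
# Edits every regular file under the given directory in place, skipping .svn paths.
close_tags_in_tree() {
  find "$1" \! -path '*.svn*' -type f -exec sed -i -n "$SED_EXPR" {} \;
}

# Made-up sample: one open img, one input split over two lines, one img already closed.
printf '%s\n' \
  '<img src="a.png">' \
  '<input type="text"' \
  '       name="q">' \
  '<img src="b.png" />' | close_tags
# Prints:
# <img src="a.png"/>
# <input type="text"
#        name="q"/>
# <img src="b.png" />
```

Two things to keep in mind: the pattern requires a space after the tag name, so a bare <img> with no attributes is left untouched, and the slash is inserted without a preceding space. Since close_tags_in_tree rewrites files in place, it is safer to try it on a copy of the tree first.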
Here’s GNU sed regex syntax and how the buffering works to do multiline search/replace.
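As a rough illustration of why the buffering matters, the sketch below runs the same substitution twice on a tag that spans two lines: once line by line, and once with the whole file collected in the hold space first. It assumes GNU sed; the names subst, per_line, whole_file and the sample input are hypothetical.

```shell
# The substitution on its own (GNU sed syntax: \| is alternation).
subst='s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g'

# Line by line: sed only ever sees one line at a time.
per_line() {
  sed "$subst"
}

# Buffered:
#   1h    on line 1, copy the pattern space into the hold space
#   1!H   on every later line, append a newline plus the line to the hold space
#   ${..} on the last line, g copies the hold space back into the pattern space,
#         the substitution runs over the whole file, and p prints the result
#   -n    suppresses the normal per-line printing
whole_file() {
  sed -n '1h;1!H;${;g;'"$subst"';p;}'
}

sample='<input type="text"
       name="q">'

printf '%s\n' "$sample" | per_line     # unchanged: no single line holds the whole tag
printf '%s\n' "$sample" | whole_file   # the second line now ends in name="q"/>
```

The buffered form works here because, in GNU sed, [^>] also matches the newlines that H inserted, so the match can run from the tag name on one line to the > on another.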
Alternately you could use something like Tidy that’s designed for sanitizing bad HTML; that’s what I’d do if I were doing anything more complicated than a couple of simple search/replaces. Tidy’s options get complicated fast, so it’s usually better to write a script in your scripting language of choice (Python, Perl) that calls libtidy and sets whatever options you need.