I need to find a certain chunk in a group of HTML files and delete it from all of them. The files are badly hacked-up HTML, so instead of parsing them with HtmlAgilityPack as I was trying before, I would like to use a simple regex.
The section of HTML will always look like this:
<CENTER>some constant text <img src=image.jpg> more constant text: variable section of text</CENTER>
All of the above can be any combination of upper and lower case. Notice that it is img src=image.jpg and not img src='image.jpg'. There can also be any number of whitespace characters between the constant characters.
Here are some examples:
<CENTER>This page has been visited <IMG SRC=http://place.com/image.gif ALT='alt text'>times since 10th July 2007 </CENTER>
or
<center>This page has been visited <IMG src='http://place.com/image.gif' Alt='Alt Text'> times since 1st October 2005</center>
What do you think would be a good way to match this pattern?
How much of that text is needed to uniquely identify the target? I would try this first:
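One possible first attempt, as a minimal sketch rather than a definitive pattern. It is written in Python for illustration, and the same constructs (\s, \b, a lazy .*?, case-insensitive and dot-matches-newline options) should carry over to other regex flavors. It assumes the constant text is "This page has been visited" before the image and "times since" after it, as in the two examples; the names PATTERN and strip_counter are made up for this sketch.

```python
import re

# Assumption: the constant text is exactly what the two examples show.
# \s+ between words tolerates any run of whitespace, including newlines.
PATTERN = re.compile(
    r"<center>\s*This\s+page\s+has\s+been\s+visited\s*"
    r"<img\b[^>]*>"                      # any img tag, quoted or unquoted attributes
    r"\s*times\s+since\b.*?</center>",   # lazy match up to the nearest closing tag
    re.IGNORECASE | re.DOTALL,
)


def strip_counter(html: str) -> str:
    """Return html with every matching <center> counter block removed."""
    return PATTERN.sub("", html)


if __name__ == "__main__":
    sample = (
        "<p>before</p>"
        "<CENTER>This page has been visited "
        "<IMG SRC=http://place.com/image.gif ALT='alt text'>"
        "times since 10th July 2007 </CENTER>"
        "<p>after</p>"
    )
    print(strip_counter(sample))
```

Anchoring on the constant text on both sides of the image keeps unrelated center blocks untouched. The [^>]* part does not care whether the src value is quoted, but it would stop early if an attribute value itself contained a ">" character.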