A web page contains lots of image elements:
<img src="myImage.gif" width="180" height="18" />
But they may not be well-formed: for example, the width or height attribute may be missing, and the tag may not be properly closed with /. The src attribute is always there.
I need a regular expression that wraps these with a hyperlink having href set to the src of the img.
<a href="myImage.gif" target="_blank"><img src="myImage.gif" width="180" height="18" /></a>
I can successfully locate the images using this regexp in this editor: http://gskinner.com/RegExr/:
<img src="([^<]*)"[^<]*>
But what is the next step?
A DOM-based method is best, but if that regex works (which is not easy to achieve for general HTML input) and matches the desired <img> elements, with the value of the src attribute captured in \1, then just replace the whole match (captured in \0) with <a href="\1" target="_blank">\0</a>.
In Java, the backreferences in the replacement string are written $0 and $1; I’m not sure what language you’re using, so adjust accordingly. In Java, though, String.replaceAll with that pattern and replacement would work.
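As a minimal sketch of that Java replacement: the class name ImageLinker and the method wrapImages are hypothetical, and the pattern is a slightly tightened variant of the one in the question, not the asker's exact regex. It captures the src value with [^"]* so the group stops at the closing quote rather than running on to a later attribute. It assumes src is the first attribute and is double-quoted.

```java
import java.util.regex.Pattern;

final class ImageLinker {
    // Assumption: src is the first attribute and is double-quoted, as in the question.
    // [^"]* ends the capture at the closing quote of src; [^<>]* skips any
    // remaining attributes up to the end of the tag, closed with "/" or not.
    private static final Pattern IMG =
            Pattern.compile("<img src=\"([^\"]*)\"[^<>]*>");

    static String wrapImages(String html) {
        // $0 is the whole matched <img ...> tag, $1 is the captured src value.
        return IMG.matcher(html)
                  .replaceAll("<a href=\"$1\" target=\"_blank\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(
            wrapImages("<img src=\"myImage.gif\" width=\"180\" height=\"18\" />"));
    }
}
```

Because the whole tag is reinserted through $0, any width, height, or missing trailing / passes through unchanged.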
It wasn’t clear from your question what to do with any other attributes that the <img> may have. The above replacement keeps them as they are. If you also want to rewrite them (i.e. you’re not just wrapping <img> in an <a> anymore), then perhaps you want to rewrite to this:
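The exact target markup depends on what you need. Purely as an illustration, the sketch below assumes you want to discard every attribute except src and emit a freshly built, properly closed <img>. The class name ImageRewriter, the method rewriteImages, and the choice of output attributes are all hypothetical. The pattern again assumes src comes first and is double-quoted.

```java
import java.util.regex.Pattern;

final class ImageRewriter {
    // Same assumption as before: src is the first attribute and is double-quoted.
    private static final Pattern IMG =
            Pattern.compile("<img src=\"([^\"]*)\"[^<>]*>");

    static String rewriteImages(String html) {
        // Only $1 (the src value) is reused; the original tag ($0) is dropped,
        // so any other attributes it carried are not kept.
        return IMG.matcher(html)
                  .replaceAll("<a href=\"$1\" target=\"_blank\"><img src=\"$1\" /></a>");
    }

    public static void main(String[] args) {
        System.out.println(
            rewriteImages("<img src=\"myImage.gif\" width=\"180\" height=\"18\">"));
    }
}
```

To emit fixed width and height values or other attributes, add them to the replacement string.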