I am looking for a regular expression in PHP that can rewrite the href attribute of anchor tags and the src attribute of img, style, script, etc. when they point to internal URLs.
An example: if I am looking at the page http://www.mysite.com and on that page there is an image:
<img src="/images/picture.gif" /> then I want to be able to change that into:
<img src="http://mysite.com/images/picture.gif" />
The same thing for anchor tags: <a href="otherpage.php" >foo</a> should be changed to
<a href="http://mysite.com/otherpage.php" >foo</a>
Also, it should work on any other element that has a src= or href= attribute, and on elements that have one or more other attributes as well (e.g. <img class="test" src="/images/picture.gif" alt="some picture" />)
I tried something like
preg_replace("/src=[\"']([\/])(.*)?[\"'] /", "src='".$domain."/$2'", $htmldata);
but this did not work well. It took the src attribute, but it captured all the attributes after the src as well. Also, it didn’t capture strings that did not start with a / (e.g. src="image.png" )
Change the greediness with the U modifier and make the leading slash optional:
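A rough sketch of that idea, with made-up sample values for $domain and $htmldata. One thing to watch: U inverts every quantifier in the pattern, so the optional slash is written \/?? here to keep it greedy. A plain \/? would turn lazy under U, leave the slash inside $2, and produce a double slash after the domain.

```php
<?php
// Sample values are assumptions for illustration only.
$domain   = 'http://mysite.com';
$htmldata = '<img class="test" src="/images/picture.gif" alt="some picture" />'
          . '<img src="image.png" />';

// U makes (.*) lazy, so it stops at the first closing quote instead of
// swallowing the attributes that follow.
// (\/??) is the optional leading slash; under U the extra ? makes it greedy.
$result = preg_replace(
    '/src=["\'](\/??)(.*)["\']/U',
    'src="' . $domain . '/$2"',
    $htmldata
);

echo $result, "\n";
// <img class="test" src="http://mysite.com/images/picture.gif" alt="some picture" /><img src="http://mysite.com/image.png" />
```

This still only handles src, and it would also rewrite URLs that are already absolute.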
…and since you don’t need src|href or the leading slash as backreferences, match them in non-capturing groups with (?:…) so they are not captured:
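One way this could look, again with made-up sample values. The \K and the negative lookahead are additions of this sketch, not something taken from the original pattern: \K keeps the attribute name and opening quote untouched (so href is not rewritten as src), and the lookahead skips URLs that already have a scheme, start with //, or are only a #fragment.

```php
<?php
// Sample values are assumptions for illustration only.
$domain   = 'http://mysite.com';
$htmldata = '<img class="test" src="/images/picture.gif" alt="some picture" />'
          . '<a href="otherpage.php" >foo</a>'
          . '<a href="http://example.org/x">bar</a>';

$pattern = '/(?:src|href)=["\']'          // attribute name and quote, not captured
         . '\K'                           // keep everything matched so far as-is
         . '(?!#|\/\/|[a-z][a-z0-9+.\-]*:)' // skip fragments and absolute URLs
         . '(?:\/??)'                     // optional leading slash (greedy under U)
         . '(.*)'                         // the path, now $1 (lazy under U)
         . '(?=["\'])/iU';                // stop just before the closing quote

$result = preg_replace($pattern, $domain . '/$1', $htmldata);

echo $result, "\n";
// <img class="test" src="http://mysite.com/images/picture.gif" alt="some picture" /><a href="http://mysite.com/otherpage.php" >foo</a><a href="http://example.org/x">bar</a>
```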
Then, the image name becomes $1 instead of $2.
I put the U modifier on the whole pattern, rather than a lazy ? on individual quantifiers, because the pattern already contains ? in ?: and, when I don’t need finer-grained control, my eyes read it more clearly that way.
Although, as others have pointed out, doing this by regex likely isn’t the Best One True Answer… 🙂
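For comparison, a minimal sketch of a non-regex route using PHP’s DOM extension. The function name absolutizeUrls and the test for what counts as "internal" are assumptions of this sketch, and it needs ext-dom to be enabled. Note that saveHTML() returns a full document, so a fragment comes back wrapped in html/body tags.

```php
<?php
// Hypothetical helper: name and "internal URL" rule are illustrative assumptions.
function absolutizeUrls(string $html, string $domain): string
{
    $doc = new DOMDocument();
    // @ silences the warnings libxml raises on imperfect real-world markup.
    @$doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    // Every element that carries a src or an href attribute.
    foreach ($xpath->query('//*[@src or @href]') as $node) {
        foreach (['src', 'href'] as $attr) {
            if (!$node->hasAttribute($attr)) {
                continue;
            }
            $url = $node->getAttribute($attr);
            // Leave absolute, protocol-relative, and fragment-only URLs alone.
            if (preg_match('~^(?:[a-z][a-z0-9+.\-]*:|//|#)~i', $url)) {
                continue;
            }
            $node->setAttribute($attr, $domain . '/' . ltrim($url, '/'));
        }
    }
    return (string) $doc->saveHTML();
}

$out = absolutizeUrls(
    '<a href="otherpage.php">foo</a><img src="/images/picture.gif">',
    'http://mysite.com'
);
echo $out;
```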