Here is my string:
$str="<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html"
I would like to remove the links LINK1 and LINK2 using php to get:
"<p>Some <a href="#">link</a> with and and</p> more html"
Here is what I think is close to what I need:
$find = array("<a(.*)LINK1(.*)</a>", "<a(.*)LINK2(.*)</a>");
$replace = array("", "");
$result=preg_replace("$find","$replace",$str);
This isn’t working. I have searched for days and tried many other options but never managed to get this to work as expected. Also, I don’t really mind if the text LINK1 and LINK2 still appears, as long as the a tags are removed.
You are very close to a working solution. The problem you are facing is that regular expressions by default try to match as much as possible. The pattern `<a(.*)LINK1(.*)</a>` will in fact match from the first `<a` to the last `</a>` if they have `LINK1` in between. What you want is to match only the nearest `<a>` tag.

There are a few ways to do this, but I usually go for making the matching ungreedy, so that each quantifier matches as little as possible. Two ways of doing this are to append a `?` after the quantifier or to use the ungreedy modifier `U`. I prefer the first one.

Using `?`: `<a(.*?)LINK1(.*?)</a>`

Using modifier: `/<a(.*)LINK1(.*)<\/a>/U`

Both should work equally well here. The entire source code then uses these ungreedy patterns (with `?`) in place of the original ones. Keep in mind that an ungreedy quantifier only ends the match as early as possible; the match still begins at the leftmost `<a` from which a match can be found, so a link that precedes the one you target can be swallowed too.

And yeah, as noted in other comments, you shouldn’t rely on regular expressions for manipulating HTML code (because it is really easy to construct valid HTML code that will go through the expression unnoticed). However, I believe it is perfectly ok if you trust the HTML code that you parse, or if the result of this matching is not crucial for other important functions.
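To make the ungreedy approach concrete, here is a minimal sketch of how the complete code might look. It assumes, going by the expected output in the question, that LINK2 and LINK3 are the links to remove. PCRE patterns in PHP need delimiters, and `~` is chosen here so that `</a>` needs no escaping. The variable names `$lazy` and `$tight` are only illustrative, and the second pattern set using `[^>]*` is a tightened variant added for comparison, not part of the ungreedy technique itself.

```php
<?php
// Sample string from the question, single-quoted so the inner double
// quotes need no escaping.
$str = '<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';

// Ungreedy patterns: (.*?) matches as little as possible.
// Assumption: LINK2 and LINK3 are the link texts to remove.
$lazy = array('~<a(.*?)LINK2(.*?)</a>~', '~<a(.*?)LINK3(.*?)</a>~');

// Pass the arrays themselves, not "$find" in quotes (that would
// interpolate to the string "Array").
$lazyResult = preg_replace($lazy, '', $str);

// Tightened variant (an assumption, not the ungreedy technique):
// [^>]* cannot run past the end of the opening tag, so the match
// cannot start in an earlier link.
$tight = array('~<a\b[^>]*>LINK2</a>~', '~<a\b[^>]*>LINK3</a>~');
$tightResult = preg_replace($tight, '', $str);

echo $lazyResult, "\n";
echo $tightResult, "\n";
```

On this particular string the ungreedy patterns also remove the first link, because the LINK2 match starts at the leftmost `<a`. The `[^>]*` variant keeps the first link and removes only the other two, leaving doubled spaces where the links used to be.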