I am using preg_replace to add a link to keywords if they are found within a long HTML string. I don’t want to add a link if the keyword is found within h1 tags or strong tags.
The regex below nearly works and basically says (I think): if the keyword is not immediately wrapped by either an h1 tag or a strong tag, then replace it with the keyword that was matched, as a bolded link to Google.
$result = preg_replace('%(?!<h1>)(?!<strong>)\b(bobs widgets)\b(?!<\/strong>)(?!<\/h1>)%i','<a href="http://www.google.com"><strong>$1</strong></a>', $result, -1);
(The reason I don’t want to match inside strong tags is that I am looping through a lot of keywords, so I don’t want to link an already linked keyword on subsequent passes.)
The above works fine and won’t match:
<h1>bobs widgets</h1>
It will however match the keyword in the following text, because the h1 tag isn’t immediately either side of the keyword:
<h1>Here are bobs widgets for sale</h1>
I need to make the spaces either side optional and have tried adding \s* but that doesn’t get me anywhere. I’d be very grateful for a push in the right direction here.
Regular expressions are the wrong tool for this job. This has been discussed many times on Stack Overflow (such as the most famous thread on the site).
What you need is an HTML parser, such as the Simple HTML DOM Parser. Do yourself a favour and use something like this from the start. Imagine what’s going to happen when you run into an <h1> where someone has added an attribute, or perhaps someone has improperly closed the tags, so you have a mixed-up order of a </strong> and a </h1>. Getting things like that to work with a regular expression is not worth the trouble, and sometimes isn’t even possible.
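To make the parser approach concrete, here is a minimal sketch. It uses PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM Parser, purely so that it runs without extra dependencies. The function name linkKeyword and the wrapper id kw-root are made up for illustration, and the sketch assumes the input is a UTF-8 HTML fragment. The idea is to select only the text nodes that have no h1, strong, or a ancestor, and to run the (now very simple) regex on those alone.

```php
<?php
// Hypothetical helper: wraps each occurrence of $keyword in a bolded link,
// skipping any text that sits inside <h1>, <strong>, or <a>.
function linkKeyword(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat input as UTF-8.
    // The wrapper div (id is an arbitrary choice) lets us read the fragment back.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="kw-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="kw-root"]')->item(0);

    // Text nodes with no <h1>, <strong>, or <a> ancestor, at any depth.
    $nodes = $xpath->query(
        './/text()[not(ancestor::h1 or ancestor::strong or ancestor::a)]',
        $root
    );

    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/i';

    // Copy to an array so replacing nodes cannot disturb the iteration.
    foreach (iterator_to_array($nodes) as $text) {
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes hold the keyword exactly as it appeared.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $strong = $doc->createElement('strong');
                $strong->appendChild($doc->createTextNode($part));
                $a->appendChild($strong);
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = '<h1>Here are bobs widgets for sale</h1><p>Buy Bobs Widgets today.</p>';
$result = linkKeyword($result, 'bobs widgets', 'http://www.google.com');
```

With this input, the keyword in the paragraph gets linked and the one inside the h1 is left alone, however far the tag is from the keyword. Because text inside strong and a is skipped, running the function again for the same or another keyword should not re-link anything that was linked on an earlier pass.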