I have stumbled upon a really odd bug with PHP’s preg_replace function and some regex patterns. What I’m trying to do is replace custom tags delimited by brackets and convert them to HTML. The regex has to account for custom “fill” tags that will stay with the outputted HTML so that it can be replaced on-the-fly when the page loads (replacing with a site-name for instance).
Each regex pattern works by itself, but for some reason, some of them make the function exit early when one of the other patterns is checked first. When I stumbled upon this, I used preg_match and a foreach loop to check each pattern before moving on, returning the result as soon as one matched, so in theory the string should be fresh for each pattern.
This didn’t work either.
Check Code:
function replaceLTags($originalString){
    $patterns = array(
        '#^\[l\]([^\s]+)\[/l\]$#i' => '<a href="$1">$1</a>',
        '#^\[l=([^\s]+)]([^\[]+)\[/l\]$#i'=> '<a href="$1">$2</a>',
        '#^\[l=([^\s]+) title=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" title="$2">$3</a>',
        '#^\[l=([^\s]+) rel=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" rel="$2">$3</a>',
        '#^\[l=([^\s]+) onClick=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" onClick="$2">$3</a>',
        '#^\[l=([^\s]+) style=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" style="$2">$3</a>',
        '#^\[l=([^\s]+) onClick=([^\[]+) style=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" onClick="$2" style="$3">$4</a>',
        '#^\[l=([^\s]+) class=([^\[]+) style=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" class="$2" style="$3">$4</a>',
        '#^\[l=([^\s]+) class=([^\[]+) rel=([^\[]+) target=([^\[]+)]([^\[]+)\[/l\]$#i' => '<a href="$1" class="$2" rel="$3" target="$4">$5</a>'
    );
    foreach ($patterns as $pattern => $replace){
        if (preg_match($pattern, $originalString)){
            return preg_replace($pattern, $replace, $originalString);
        }
    }
    // No pattern matched: return the input unchanged instead of null.
    return $originalString;
}
$string = '[l=[site_url]/site-category/ class=hello rel=nofollow target=_blank]Hello there[/l]';
echo $alteredString = $format->replaceLTags($string);
The above “String” would come out as:
<a href="[site_url">/site-category/ class=hello rel=nofollow target=_blank]Hello there</a>
When it should come out as:
<a href="[site_url]/site-category/" class="hello" rel="nofollow" target="_blank">Hello there</a>
But if I move that pattern further up in the list so it is checked sooner, it formats correctly.
I’m stumped, because it seems like the string is being overwritten somehow every time it’s checked even though that makes no sense.
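The behaviour can be reproduced in isolation, which suggests nothing is being overwritten: the second, more general pattern already matches the sample string, so the loop returns before the five-attribute pattern is ever tried. A minimal sketch using only the string and pattern quoted above (the variable names are illustrative):

```php
<?php
$string = '[l=[site_url]/site-category/ class=hello rel=nofollow target=_blank]Hello there[/l]';

// The second pattern in the list, which only expects [l=URL]text[/l].
$general = '#^\[l=([^\s]+)]([^\[]+)\[/l\]$#i';

$matched = preg_match($general, $string, $groups);

// $groups[1] is "[site_url": [^\s]+ first grabs "[site_url]/site-category/",
// then backtracks until a "]" can follow it.
// $groups[2] is everything after that "]" up to "[/l]", attributes included,
// because [^\[]+ excludes only "[" and happily consumes "]" and spaces.
var_dump($matched, $groups[1], $groups[2]);
```

With first-match-wins logic like this, the order of the patterns decides the result, which is consistent with the output changing when the specific pattern is moved up.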
Seems to me you’re doing a lot more work than you need to. Instead of using a separate regex/replacement for each possible list of attributes, why not use preg_replace_callback to process the attributes in a separate step? For example, see the complete demo (updated; see comments).
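A minimal sketch of that two-step idea, written as an illustration rather than as the exact code from this answer. The function name replaceLTags comes from the question; quoteAttributes, the regexes, and the group layout are assumptions. It handles only the [l=URL ...]text[/l] form with single-word attribute values, and it does no HTML escaping.

```php
<?php
// Hypothetical helper: turn " name=value" pairs into ' name="value"'.
// Single-word values only in this first sketch.
function quoteAttributes(string $attributes): string
{
    return preg_replace('#\s+(\w+)=(\S+)#', ' $1="$2"', $attributes);
}

function replaceLTags(string $originalString): string
{
    return preg_replace_callback(
        // Group 1: URL, which may contain the [site_url] fill tag.
        // Group 2: raw attribute text up to the closing "]".
        // Group 3: link text.
        '#\[l=((?>[^\s\[\]]+|\[site_url\])+)([^\]]*)\](.*?)\[/l\]#i',
        function (array $m): string {
            return '<a href="' . $m[1] . '"' . quoteAttributes($m[2]) . '>' . $m[3] . '</a>';
        },
        $originalString
    );
}

$string = '[l=[site_url]/site-category/ class=hello rel=nofollow target=_blank]Hello there[/l]';
echo replaceLTags($string), "\n";
```

Because any attribute list goes through the same inner replacement, adding a new attribute or a new combination needs no new pattern, and pattern order stops mattering.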
Here’s an updated version of the code (demo), based on new information that was provided in the comments.
In the first regex I changed [site_url] to \[\w+\] so it can match any custom fill tag.
Here’s a breakdown of the second regex:
The trickiest part is matching multi-word attribute values. (?>\s+[^\s=]+)* will always consume the next attribute name if there is one, but the lookahead forces it to backtrack. Normally it would only back off one character at a time, but the atomic group effectively forces it to backtrack by whole words or not at all.
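To make the backtracking behaviour concrete, here is a hedged sketch built around the two fragments quoted above, \[\w+\] and (?>\s+[^\s=]+)*. The full regexes, the negative lookahead (?!\s*=), and the names quoteAttributes and replaceLinkTags are assumptions for illustration, not the answer's original code. The $atomic flag exists only to compare the atomic group with an ordinary non-capturing group.

```php
<?php
// Hypothetical helper: quote attribute values that may span several words.
function quoteAttributes(string $attributes, bool $atomic = true): string
{
    // Extra words of a value: atomic (?>...) by default, plain (?:...) for comparison.
    $words = ($atomic ? '(?>' : '(?:') . '\s+[^\s=]+)*';
    // name = first word plus optional extra words; the lookahead rejects a match
    // whose last consumed word is followed by "=" (that word is the next name).
    $regex = '#(\s+[^\s=]+)\s*=\s*([^\s=]+' . $words . ')(?!\s*=)#';
    return preg_replace($regex, '$1="$2"', $attributes);
}

// Hypothetical name, chosen to avoid implying this is the original function.
function replaceLinkTags(string $input): string
{
    return preg_replace_callback(
        // \[\w+\] accepts any fill tag such as [site_url] inside the URL.
        '#\[l=((?>[^\s\[\]]+|\[\w+\])+)([^\]]*)\](.*?)\[/l\]#i',
        function (array $m): string {
            return '<a href="' . $m[1] . '"' . quoteAttributes($m[2]) . '>' . $m[3] . '</a>';
        },
        $input
    );
}

$attrs = ' class=hello world rel=no follow target=_blank';
$strict = quoteAttributes($attrs);        // atomic: gives words back whole
$loose  = quoteAttributes($attrs, false); // non-atomic: can stop mid-word

echo $strict, "\n", $loose, "\n";
echo replaceLinkTags('[l=[site_url]/about/ title=About this site rel=nofollow]About[/l]'), "\n";
```

In the non-atomic variant the engine can give back a single character of "rel", at which point the lookahead no longer sees "=" and the value ends in the middle of the next attribute name. The atomic group removes that option, so the only way to satisfy the lookahead is to drop the whole word.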