I’m not a regex expert, but after a couple of hours I’ve built this regex:
#\[url=(?!.*?<div onclick="unveil_spoiler.*?\[/url\])([^_\W]+?://.*?)\](.+?)\[/url\]#i
which is the case-insensitive form of:
\[url=(?!.*?<div onclick="unveil_spoiler.*?\[/url\])([^_\W]+?://.*?)\](.+?)\[/url\]
It matches [url=xxxx://yyyy]zzzz[/url] patterns, except when a <div onclick="unveil_spoiler string appears between [url= and [/url].
Now I’m trying to add a similar check so that it doesn’t return a match if there is a \[url.*?\] between the \[url= and \[/url\]. I’ve tried many ways, but I can’t seem to find one that works 100%.
First I tried adding another negative lookahead very similar to the one already in my regex. That works partially, but the lookahead seems to scan through to the end of the line (to the last \[/url\]) for each match. I wanted it to stop at the first \[/url\], as the capturing group does.
Here’s a string for debugging:
[url=http://www.match.com]Match[/url][url=http://www.nomatch.com<div onclick="unveil_spoiler"]No match[/url][url=http://www.match.com]Match[/url][url=http://www.nomatch.com]<div onclick="unveil_spoiler" No match[/url]
[url=http://www.nomatch.com]No <div onclick="unveil_spoiler"match[/url][url=http://www.match.com]Match[/url][url=http://www.nomatch.com]No <div onclick="unveil_spoiler" match[/url][url=http://www.match.com]Match[/url]
[url=http://www.match.com]Match[/url][url=http://www.match.com][b]Match[/b][/url][url=http://www.match.com]Match[/url][url=http://www.match.com]Match[/url]
[url=http://www.thisshouldntmatch.com[url=http://www.match.com]Match[/url]This shouldn't match[/url]
[url=http://www.thisshouldntmatch.com[url=http://www.thisshouldntmatch.com[url=http://www.match.com]Match[/url]]This shouldn't match[/url]This shouldn't match[/url]
[url=http://www.thisshouldntmatch.com[url=http://www.match.com]Match[/url]This shouldn't match[/url][url=http://www.match.com]Match[/url]
[url=http://www.thisshouldntmatch.com]This shouldn't match[url=http://www.match.com]Match[/url][url=http://www.match.com]Match[/url][/url]
[url=http://www.match.com]Match[/url][url=http://www.match.com]Match[/url][url=http://www.match.com]Match[/url][url=http://www.match.com]Match[/url]
With the regex posted at the beginning of the post, it matches the 2 matches in the first line perfectly. Now I want it not to return a match when there’s a \[url.*?\] inside the match. I’ve tried this regex:
\[url=(?!.*?\[url.*?\].*?\[/url.*?\])(?!.*?<div onclick="unveil_spoiler.*?\[/url\])([^_\W]+?://.*?)\](.+?)\[/url\]
And this:
\[url=(?!.*?(?:<div onclick="unveil_spoiler|\[url.*?\]).*?\[/url\])([^_\W]+?://.*?)\](.+?)\[/url\]
Both stop returning matches when there’s a \[url.*?\] inside the match, but they also stop matching the first match of the first line (in the example string), which they should match (and which the first regex does). That is, they only match the last valid match of each line.
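The effect can be reproduced on a string with just two tags. This is a minimal sketch using Python's re module, which is an assumption (the #...#i delimiters above suggest PCRE, but the lookahead behaves the same way here), and the sample string is made up for illustration. The .*? inside the lookahead is lazy, but the engine still lets it expand as far as needed to find a [url...] anywhere later on the line, so the second tag is enough to reject the first.

```python
import re

# The second attempt from the post, compiled case-insensitively.
pattern = re.compile(
    r'\[url=(?!.*?(?:<div onclick="unveil_spoiler|\[url.*?\]).*?\[/url\])'
    r'([^_\W]+?://.*?)\](.+?)\[/url\]',
    re.IGNORECASE,
)

# Hypothetical sample: two valid, non-nested tags on the same line.
sample = "[url=http://a.example]A[/url][url=http://b.example]B[/url]"

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(sample)]
print(matches)  # only the last tag is reported
```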
I think the problem is that the lookahead doesn’t stop at the first \[/url\]. Is there any way to make it lazy or otherwise fix it?
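One possible direction is to replace the .*? inside the lookahead with a token that refuses to step over a closing tag, so the lookahead can only inspect text up to the first [/url]. The sketch below assumes Python's re module; the names NOT_CLOSE and find_links are hypothetical, and the forbidden inner pattern is simplified from \[url.*?\] to \[url. It is not an exhaustive solution.

```python
import re

# Any single character that does not begin a closing [/url] tag.
NOT_CLOSE = r'(?:(?!\[/url\]).)'

pattern = re.compile(
    r'\[url='
    # Reject when a spoiler div or another [url shows up before the first [/url].
    + r'(?!' + NOT_CLOSE + r'*?(?:<div onclick="unveil_spoiler|\[url))'
    # Same capturing groups as the original regex: URL, then link text.
    + r'([^_\W]+?://.*?)\](.+?)\[/url\]',
    re.IGNORECASE,
)

def find_links(text):
    """Return (url, text) pairs for every tag the pattern accepts."""
    return pattern.findall(text)

# Line taken from the debug string: only the inner tag should survive.
nested = ("[url=http://www.thisshouldntmatch.com[url=http://www.match.com]"
          "Match[/url]This shouldn't match[/url]")
print(find_links(nested))
```

The lookahead is now bounded, but the capturing groups are not: with an empty link text, for example, (.+?) can still run on to a later [/url], so malformed input may need an extra check.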
Any help is appreciated.
Does this work?
\[url=[^\[<]*?\](?:(?!(\[url)|<).)*?\[\/url\]
http://regexr.com?30mna
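A quick way to try this pattern is to run it over a few lines from the debug string. The sketch below assumes Python's re module and adds the case-insensitive flag to mirror the question's i modifier; matched_tags is a hypothetical helper name.

```python
import re

# The pattern from the reply; IGNORECASE is an assumption carried over
# from the question's "i" modifier.
pattern = re.compile(
    r'\[url=[^\[<]*?\](?:(?!(\[url)|<).)*?\[\/url\]',
    re.IGNORECASE,
)

def matched_tags(line):
    """Return the full text of every tag the pattern accepts in line."""
    return [m.group(0) for m in pattern.finditer(line)]

# Lines (or parts of lines) taken from the debug string in the question.
nested = ("[url=http://www.thisshouldntmatch.com[url=http://www.match.com]"
          "Match[/url]This shouldn't match[/url]")
spoiler = ('[url=http://www.nomatch.com]No <div onclick="unveil_spoiler"'
           'match[/url][url=http://www.match.com]Match[/url]')
bold = "[url=http://www.match.com][b]Match[/b][/url]"

for line in (nested, spoiler, bold):
    print(matched_tags(line))
```

Two things to keep in mind: this pattern rejects any < inside the tag, not only the spoiler div, and it has no capturing groups for the URL and the link text, so those would need to be added back if the replacement uses them.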