I need to match everything between '[~' and '~]' tags.
I tried a lot of regex patterns but couldn't find the correct one:
#\[~(.*)~]# - this returns everything between the first occurrence of [~ and the last occurrence of ~].
#\[~([^~]*)~]# - this works fine if there is no ~ symbol inside the tags.
I understand that (.*) captures everything and ([^~]*) captures everything until it finds a ~ character, but I can't make it capture everything until it finds the ~] pair (any byte except the ~] pair is possible inside the tags, including a single ~ character). In other words, I don't know how to negate a pair of characters.
Here is a possible example:
Simple [example~]: [~here I can face both, '~' and "]" characters~] or another
example [~~~~~~[ABC]~~~~~~].
After running preg_match_all() with the regex, I expect a resulting array like this:
array(2) {
  [0]=>
  string(44) "here I can face both, '~' and "]" characters"
  [1]=>
  string(15) "~~~~~[ABC]~~~~~"
}
Note: Input string may contain binary data (00-FF).
Just to mention (for certain people here), I’ve already checked out all related Q/A + hundreds of Google search results.
* is greedy, so it takes as much as it can. You can make it non-greedy (add a ? after it, giving .*?), which should solve your issue. The following website has a good description and explains it in more detail: Repetition with Star and Plus.
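As a rough sketch of that suggestion applied to the sample input from the question (the variable names are illustrative, not from the original post), the lazy pattern can be compared with an explicit "anything except the ~] pair" alternative built from a negative lookahead. The s modifier is an added assumption here so that . also matches newline bytes, which may matter for binary input:

```php
<?php
// Sample input from the question.
$subject = "Simple [example~]: [~here I can face both, '~' and \"]\" characters~] or another\n"
         . "example [~~~~~~[ABC]~~~~~~].";

// Lazy quantifier: .*? extends one byte at a time and stops at the first ~] it reaches.
// The s modifier (an assumption for binary data) lets . match newline bytes as well.
preg_match_all('#\[~(.*?)~\]#s', $subject, $lazy);

// Explicit negation of the pair: either a byte that is not ~,
// or a ~ that is not immediately followed by ].
preg_match_all('#\[~((?:[^~]|~(?!\]))*)~\]#', $subject, $negated);

var_dump($lazy[1]);
var_dump($lazy[1] === $negated[1]); // both patterns capture the same two strings
```

Both patterns should skip "[example~]" because it never opens with [~, and both should capture the two tagged strings from the example.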
preg_match deals with binary strings pretty well: the . matches any character, which reads as a single byte if you're in the standard mode (non-UTF-8), as you are. Note that . does not match a newline byte unless you add the s modifier. Simplified example for explanation:
Greedy, a* against the subject aab: it matches first an empty string, then a, then aa, and then aab does not match, so the last match aa is taken and returned. As you can see, the engine first had three valid matches internally: empty string, a and aa. The last one wins in greedy mode.
Non-greedy, a*? against the subject aab: the engine is at the first position and needs zero or more a, non-greedy. Zero a is enough, so it matches an empty string and returns. The first one wins in non-greedy mode.
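A minimal sketch of the two cases, assuming the subject aab and the patterns a* and a*? (the variable names are illustrative):

```php
<?php
$subject = 'aab';

// Greedy: takes as many "a" as possible before returning.
preg_match('#a*#', $subject, $greedy);

// Non-greedy: is satisfied with zero "a" at the first position.
preg_match('#a*?#', $subject, $lazy);

var_dump($greedy[0]); // string(2) "aa"
var_dump($lazy[0]);   // string(0) ""
```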
For UTF-8 strings, use the u modifier (PCRE_UTF8): #.*#u, where . matches any UTF-8 character (which can be one or more bytes).
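To illustrate the difference between byte mode and UTF-8 mode, here is a small sketch using a two-byte UTF-8 character as an assumed test subject:

```php
<?php
// "é" encoded as UTF-8 occupies two bytes: 0xC3 0xA9.
$subject = "\xC3\xA9";

// Standard (byte) mode: . consumes exactly one byte.
preg_match('#.#', $subject, $byteMode);

// UTF-8 mode: . consumes one whole character, here two bytes.
preg_match('#.#u', $subject, $utf8Mode);

var_dump(strlen($byteMode[0])); // int(1)
var_dump(strlen($utf8Mode[0])); // int(2)
```

With the u modifier the subject must be valid UTF-8, otherwise preg_match() returns false, so for arbitrary binary data (00-FF) the standard byte mode is the safer choice.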