I’m trying to write the correct regular expression in PHP to parse a string (written by some user) to build a request. It can be as complex as :
name = 'benjo' and (surname = 'benny' or surname = 'bennie') or age = 4
Later I’ll parse the string to build mySQL queries. For now, I’m just trying to find the correct regular expression to parse this string into an array that could look like :
$result = array(
0 => name = 'benjo',
1 => and
2 => array(
0 => surname = 'benny',
1 => or,
2 => surname = 'bennie',
),
3 => age = 4
);
I’ve thought about using recursive functions, and my regular expression for now is :
"#\((([^()]+|(?R))*)\)|(ou[^()])|(et[^()])#",
which of course does not work.
I’ll be glad if someone could help, I’m getting kinda stuck here ! 🙂
Tks,
Romain
LET’S CHANGE THE CHALLENGE ! 🙂
OK, now let’s make it a bit more simple. What would it take with a regular expression and adding the constraint that we stay on “level one” !! No nested parenthesis, just one level, but still as many AND/ORs… Would that change anything in favor or REGEXPs ? (I really would like to avoid writing my mini parser although that sounds really interesting…
Theoretical regular expression is not powerful enough to do parentheses matching. Theoretical regular expression can only take care of left recursion/right recursion rules. Middle recursion rules is cannot be expressed with regular expression (e.g.
<exp> -> "(" <exp> ")").Regex in programming languages, however, implements features which allow regex to exceed the power of regular grammar. For example, backreference in regex allows one to write a regex which matches a non context-free languages. However, even with backreference, it’s still not possible to balance parentheses with regex.
As PCRE library supports recursive regex via subroutine call feature, it is technically possible to parse such an expression with regex. However, unless you can write the regex yourself, which means that you understand what you are doing and can modify the regex to suit your needs, you should just write your own parser. Otherwise, you will end up with an unmaintainable mess.
Demo on regex101
The regex above should be use with
preg_match_all, and placed between delimiter with x flag (free spacing mode):/.../x.For each match:
m_maincapturing group has content, put the content through another round of matching.m_keyandm_valuecapturing group.Explanation
The
(?(DEFINE)...)block allows you to define named capturing groups for use in subroutine calls separately from the main pattern.The rest of the pattern is based on this technique to match all
<token>s in<token> ( <logic> <token> )*with global matching operation.The last part of the regex, while can be written as
(?&token), is expanded to match the field name and value in the simple expressions.