Here on SO people sometimes say something like “you cannot parse X with regular expressions, because X is not a regular language”. From my understanding, however, modern regular expression engines can match more than just regular languages in Chomsky’s sense. My questions:
Given a regular expression engine that supports
- backreferences
- lookaround assertions of unlimited width
- recursion, like (?R)
what kind of languages can it parse? Can it parse any context-free language, and if not, what would be the counterexample?
(To be precise, by “parse” I mean “build a single regular expression that would accept all strings generated by the grammar X and reject all other strings”).
Addendum: I’m particularly interested to see an example of a context-free language that modern regex engines (Perl, .NET, the Python regex module) would be unable to parse.
I recently wrote a rather long article on this topic: The true power of regular expressions.
To summarize:
- […] (a^n b^n).
- […] (ww and a^n b^n c^n).
Some examples:
Matching the context-free language {a^n b^n, n>0}:
Matching the context-sensitive language {a^n b^n c^n, n>0}:
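As a minimal sketch of both cases, here is one way such patterns could be written. It assumes the third-party Python `regex` module, since the standard `re` module has no recursion. The helper names `is_an_bn` and `is_an_bn_cn` are made up for illustration, and the patterns are one possible formulation rather than the only one.

```python
import regex  # third-party (pip install regex); stdlib `re` has no recursion

# {a^n b^n, n>0}: group 1 is an "a", optionally group 1 again, then a "b".
# (?1) is a recursive call into group 1.
AN_BN = regex.compile(r"(a(?1)?b)")

# {a^n b^n c^n, n>0}:
#   - the lookahead checks that the a's and b's balance and that a "c" follows;
#   - the main pattern then skips the a's and checks that b's and c's balance.
AN_BN_CN = regex.compile(r"(?=(a(?1)?b)c)a+(b(?2)?c)")


def is_an_bn(s: str) -> bool:
    """True if s is a^n b^n for some n > 0 (hypothetical helper)."""
    return AN_BN.fullmatch(s) is not None


def is_an_bn_cn(s: str) -> bool:
    """True if s is a^n b^n c^n for some n > 0 (hypothetical helper)."""
    return AN_BN_CN.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["ab", "aaabbb", "aab", "abc", "aabbcc", "aabbc"]:
        print(s, is_an_bn(s), is_an_bn_cn(s))
```

The second pattern relies on combining recursion with a lookahead: neither half alone counts three things, but the lookahead ties the a's to the b's while the consuming part ties the b's to the c's.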