Below is a PHP regex intended to match (multiline) strings inside PHP or JavaScript source code (from this post), but I suspect it’s got issues.
What is the literal Python (or else Perl) equivalent of this?
~'(\\.|[^'])*'|"(\\.|[^"])*"~s
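For reference, here is one way the pattern above might be written literally in Python. This is a sketch, and the name PHP_STRING_RE is made up for illustration. The ~ delimiters are dropped, the s modifier becomes re.DOTALL, and two adjacent raw strings keep the backslashes unchanged.

```python
import re

# Literal translation of ~'(\\.|[^'])*'|"(\\.|[^"])*"~s :
#  - the ~ delimiters are PHP syntax and are simply dropped,
#  - the s modifier becomes re.DOTALL,
#  - two adjacent raw strings keep every backslash exactly as the regex engine
#    should see it, and avoid quoting trouble with ' and ".
PHP_STRING_RE = re.compile(r"'(\\.|[^'])*'" r'|"(\\.|[^"])*"', re.DOTALL)

source = "x = 'a' + \"b\";"
# The pattern has capturing groups, so re.findall would return the group
# contents; finditer + group(0) yields the whole quoted strings instead.
print([m.group(0) for m in PHP_STRING_RE.finditer(source)])  # ["'a'", '"b"']
```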
- The s modifier means dot matches all characters, including newline; in Python that's re.compile(..., re.DOTALL).
- I totally don't get the intent of the leading \\. ? Does that reduce to . ? Are the double backslashes needed to escape it twice in PHP?
- Allowing a match of either \\. or [^'] (any non-quote character) in every position seems total overkill to me, and may explain why this person's regex blows up. Doesn't [^'] already match everything that . does with the s modifier? Surely it should match newlines?
- For constructing two versions of the regex, with single and double quotes, in Python I can use this two-step approach.
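The notes above raise some questions that can be checked quickly with Python's re module. A negated class such as [^'] matches a newline even without DOTALL. A bare dot does not. In the regex itself, \\. means "a literal backslash followed by any one character", which is not the same as a plain dot. So in this pattern the s modifier only matters for a backslash that sits directly before a line break.

```python
import re

# A negated class matches a newline even without re.DOTALL ...
assert re.fullmatch(r"[^']", "\n")
# ... while a bare dot does not, unless DOTALL is set
assert re.fullmatch(r".", "\n") is None
assert re.fullmatch(r".", "\n", re.DOTALL)

# In the regex, \\. is a literal backslash followed by any one character,
# i.e. an escape sequence such as \' inside the quoted string
assert re.fullmatch(r"\\.", "\\'")          # backslash + quote: matches
assert re.fullmatch(r"\\.", "x") is None    # not "any character" on its own

# DOTALL only changes whether that escaped character may be a real newline
assert re.fullmatch(r"\\.", "\\\n", re.DOTALL)
assert re.fullmatch(r"\\.", "\\\n") is None
```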
NB a simpler version of this regex can also be found in this list of PHP regex examples, under Programming: String.
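The question does not show the two-step construction it refers to. Below is a hypothetical sketch of such an approach. Step 1 fills a template with one quote character. Step 2 joins the single- and double-quote versions into one alternation. The names TEMPLATE, quoted_string_pattern and STRING_RE are illustrative only. This sketch also uses (?:...) instead of (...), so that findall returns whole matches.

```python
import re

# Hypothetical template; {q} stands for the quote character.
# (?:...) is used instead of (...) so findall returns whole matches.
TEMPLATE = r"{q}(?:\\.|[^{q}])*{q}"

def quoted_string_pattern(quote: str) -> str:
    """Step 1: the regex for strings delimited by `quote`."""
    # re.escape is harmless for ' and " and guards against other delimiters
    return TEMPLATE.format(q=re.escape(quote))

# Step 2: combine both versions, compiled with DOTALL as in the PHP s modifier
STRING_RE = re.compile(
    "|".join(quoted_string_pattern(q) for q in ("'", '"')),
    re.DOTALL,
)

print(STRING_RE.findall("a = 'x'; b = \"y\";"))
```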
The regex is mostly okay, except it doesn't handle escaped quotes (i.e., \" and \'). That's easy enough to fix:
That's a "generic" regex; in Python you would usually write it in the form of a raw string:
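Here is a Python sketch of a fix along these lines. It rests on an assumption: that the fix excludes the backslash from each negated class, so that a backslash can only be consumed together with the next character by the \\. branch, and that it adds + to the classes. The name FIXED_STRING_RE is made up for illustration. The last lines show the weakness being fixed. In the original single-quote branch, [^'] can also swallow a backslash, so when the engine backtracks it can accept an escaped quote as the closing delimiter.

```python
import re

# Assumed fix: the backslash is excluded from each negated class, so only the
# \\. branch can consume it (together with the escaped character), and the
# classes get a + quantifier.
FIXED_STRING_RE = re.compile(
    r"'(?:\\.|[^'\\]+)*'" r'|"(?:\\.|[^"\\]+)*"',
    re.DOTALL,
)

source = r'''x = 'it\'s'; y = "say \"hi\"";'''
print(FIXED_STRING_RE.findall(source))  # both literals, escapes included

# The question's single-quote branch lets [^'] match the backslash, so on
# backtracking it treats the escaped quote in 'abc\' as the terminator.
ORIGINAL_SINGLE = re.compile(r"'(\\.|[^'])*'", re.DOTALL)
unterminated = r"'abc\'"  # the only closing quote is escaped
print(ORIGINAL_SINGLE.fullmatch(unterminated) is not None)  # True
print(FIXED_STRING_RE.fullmatch(unterminated) is not None)  # False
```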
In PHP you have to escape the backslashes to get them past PHP’s string processing:
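The same doubling applies to an ordinary (non-raw) Python string literal, because the literal's own backslash processing runs before the regex engine sees the pattern. The sketch below uses the single-quote branch of the assumed fixed pattern. It also shows what happens if you forget to double: "\\." reaches the engine as \., which matches only a literal dot.

```python
import re

# Raw string: the regex engine receives exactly what is typed
RAW = r"'(?:\\.|[^'\\]+)*'"
# Ordinary string: each backslash meant for the engine is written twice
COOKED = "'(?:\\\\.|[^'\\\\]+)*'"
assert RAW == COOKED

# Forgetting to double: "\\." arrives as \. and matches only a literal dot,
# not "backslash + any character"
assert re.fullmatch("\\.", ".")
assert re.fullmatch("\\.", "\\x") is None
```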
Most of the currently-popular languages have either a string type that requires less escaping, support for regex literals, or both. Here’s how your regex would look as a C# verbatim string:
But, formatting considerations aside, the regex itself should work in any Perl-derived flavor (and many other flavors as well).
p.s.: Notice how I added the + quantifier to your character classes. Your intuition about matching one character at a time is correct; adding the + makes a huge difference in performance. But don't let that fool you; when you're dealing with regexes, intuition seems to be wrong more often than not. :/
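One rough way to see the effect of + is to time two variants that differ only in that quantifier. Both variants are assumptions made for this comparison: they are the single-quote branch of a pattern that already excludes the backslash. The input is an invented long literal. Both variants find the same match. The + version goes through the group loop once per run of ordinary characters instead of once per character. Absolute timings depend on the machine and Python version, so none are claimed here.

```python
import re
import timeit

# Identical except for the + on the negated class
PER_CHAR = re.compile(r"'(?:\\.|[^'\\])*'", re.DOTALL)
WITH_PLUS = re.compile(r"'(?:\\.|[^'\\]+)*'", re.DOTALL)

# Illustrative input: one long single-quoted literal with some \' escapes
text = "x = '" + ("abc def \\' " * 2000) + "';"

# Both patterns find the same literal ...
assert PER_CHAR.search(text).group(0) == WITH_PLUS.search(text).group(0)

# ... but they do different amounts of work per character
for name, rx in (("per-char", PER_CHAR), ("with +", WITH_PLUS)):
    seconds = timeit.timeit(lambda: rx.search(text), number=50)
    print(f"{name}: {seconds:.3f}s")
```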