What would this mean in an expression?
(?m:.*?)
or this
(?m:\s*)
I mean, it appears to be something to do with whitespace but I’m unsure.
ADDITIONAL DETAILS:
The full expression I’m looking at is:
\A((?m:\s*)((\/\*(?m:.*?)\*\/)|(\#\#\# (?m:.*?)\#\#\#)|(\/\/ .* \n?)+|(\# .* \n?)+))+
(?...) is a way of applying modifiers to the regular expression inside the parentheses. (?:...) allows you to treat the part between the parentheses as a group, without affecting the set of strings captured by the matching engine. But you can add option letters between the ? and the :, in which case the part of the regular expression between the parentheses behaves as if you had included those option letters when creating the regular expression. That is, /(?m:...)/ behaves the same as /.../m.
The m, in turn, enables “multiline” mode.
CORRECTED:
Here’s where I got confused in the original answer, because this option has different meanings in different environments.
This question is tagged Ruby, in which “multiline mode” causes the dot character (.) to match newlines, whereas normally that’s the one character it doesn’t match.
So your first regular expression, (?m:.*?), will match any number (including zero) of any characters (including newlines). Basically, it will match anything at all, including nothing.
In the second regular expression, (?m:\s*), the modifier has no effect at all because there are no dots in the contained expression to modify.
Back to the first expression. As Ωmega says, the ? after the * means that it is a non-greedy match. The difference shows up when something follows that section in the pattern. Without the ?, the longest possible match wins. With the ?, you get the shortest one instead. In your full expression, that is what makes \/\*(?m:.*?)\*\/ stop at the first */ rather than the last one.
Finally, about the above-mentioned /m confusion (though if you want to avoid becoming confused yourself, this might be a good place to stop reading):
In Perl 5 (which is the source of most regular expression extensions beyond the basic syntax), the behavior triggered by /m in Ruby is instead triggered by the /s option (which has no such meaning in Ruby: an s on a Ruby regex literal is an encoding flag and doesn’t change how the dot behaves). In Perl, /m, despite still being called “multiline mode”, has a completely different effect: it causes the ^ and $ anchors to match at newlines within the string as well as at the beginning and end of the whole string respectively. But in Ruby, that behavior is the default, and there’s not even an option to change it; to anchor to the whole string you use \A and \z instead, as your full expression does with \A.
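To make the points above concrete, here is a small Ruby sketch you can paste into irb. The method name regex_demo and the sample strings are made up for illustration; only the regex behavior itself is the point, and it assumes Ruby 2.4 or later for match?.

```ruby
# Illustrative sketch only: regex_demo and the sample strings are made-up names.
def regex_demo
  text   = "foo\nbar"
  blank  = " \n\t code"
  source = "/* one */ x = 1 /* two */"

  {
    # The dot and newlines: /m (or an inline (?m:...)) lets . match "\n".
    dot_plain:    text[/.*/],          # "foo"
    dot_slash_m:  text[/.*/m],         # "foo\nbar"
    dot_inline:   text[/(?m:.*)/],     # "foo\nbar"

    # \s already matches newlines, so (?m:...) changes nothing here.
    space_plain:  blank[/\s*/],        # " \n\t "
    space_inline: blank[/(?m:\s*)/],   # " \n\t "

    # Greedy runs to the last */, non-greedy stops at the first one.
    greedy: source[%r{/\*(?m:.*)\*/}],   # the whole sample string
    lazy:   source[%r{/\*(?m:.*?)\*/}],  # "/* one */"

    # ^ and $ match at line boundaries by default; \A and \z do not.
    line_anchors:   text.match?(/^bar$/),     # true
    string_anchors: text.match?(/\Abar\z/),   # false
  }
end

regex_demo.each { |name, value| puts "#{name}: #{value.inspect}" }
```

The greedy/lazy pair uses the same comment delimiters as the full expression in the question, so you can see why the non-greedy form is the one you want when a file contains more than one block comment.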