I was trying to understand how .{n} and ?<option>: work in Regexp on Ruby 1.9.3, but I couldn’t understand how the code below produces this output:
irb(main):001:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.{2}(?m:.)\Z/
=> ["fin\n", "fin\r\n", "find"]
irb(main):002:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.{1}(?m:.)\Z/
=> ["fin\n", "fi\n\n"]
irb(main):003:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.{1}(?m:.)\Z/
=> []
irb(main):010:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.(?m:.)\Z/
=> ["fin\n", "fi\n\n"]
irb(main):011:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.(m:.)\Z/
=> []
irb(main):012:0> %W{fin\n fi\n\n \n\n fin\r\n find}.grep /f.(?m:.)\z/
=> []
Can anyone help me understand how the above code generates this output in IRB?
Thanks,
Following the last paragraph of @Kevin's answer, I tried the following and got the expected output:
irb(main):014:0> %W{fin fi\n\n \n\n fin\r\n find}.grep /f.(?m:.)\z/
=> ["fin"]
irb(main):015:0> %W{fin fi\n\n \n\n fin\r find}.grep /f.(?m:.)\z/
=> ["fin"]
irb(main):016:0> %W{fin fi\n \n\n fin\r\n find}.grep /f.(?m:.)\z/
=> ["fin", "fi\n"]
irb(main):017:0> %W{fin fi\n \n\n fr\n find}.grep /f.(?m:.)\z/
=> ["fin", "fi\n", "fr\n"]
irb(main):018:0>
Thank you very much, @Kevin. You helped me understand the whole concept!
{n} means “repeat the previous atom n times”. In regular expressions, an atom is a self-contained unit. So a single character is an atom. So is a dot. A group is an atom as well (that contains other atoms), as is a character class. So .{n} means “match n characters” (because . means “match any character”).

Note that {n} is not like a backreference, in that it doesn’t have to match the same text on each repetition. .{5} behaves exactly like ..... (five dots).

This construct is also more powerful. It can take two numbers, and it matches a repetition count for that whole range. So .{3,5} means “match 3 to 5 characters”. And .{3,} means “match 3 or more characters”. ? can be replaced with {0,1}, * with {0,}, and + with {1,} if you so desire.

?<option>: isn’t actually a thing. It’s (?<option>:<pattern>), and this turns on all the flags listed in <option> for the duration of <pattern>. It’s like a group, except it doesn’t actually create a back reference. So the expression (?m:.) means “match one character as if the flag m was turned on”. Given the behavior of m as “match \n” as nhahtdh said in the comments, the expression .(?m:.). means “match any character besides newline, followed by any character, followed by any character besides newline”.

This construct has two benefits. First, it allows you to only have a flag apply to part of a pattern, which can be occasionally useful. And second, if you wrap your entire pattern in this construct, then you have control over the flags that apply to your regular expression regardless of where the expression is used. This is useful when you’re providing the regex as a user and don’t have control over the source of the program.
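To make the quantifier and inline-option behavior concrete, here is a minimal sketch, assuming Ruby 1.9 or later. The constant names and sample strings are illustrative and do not come from the question.

```ruby
# .{5} and five literal dots should accept and reject the same strings.
FIVE_DOTS = /\A.....\z/
FIVE_REPS = /\A.{5}\z/
SAMPLES = ["abcde", "abcd", "abcdef", "ab\nde"]
SAME_RESULTS = SAMPLES.all? { |s| (s =~ FIVE_DOTS).nil? == (s =~ FIVE_REPS).nil? }

# {3,5} accepts any repetition count in that range.
RANGE_HITS = ["ab", "abc", "abcde", "abcdef"].grep(/\A.{3,5}\z/)

# A plain dot does not match "\n"; a dot inside (?m:...) does.
DOT_PLAIN = ("a\nb" =~ /a.b/)
DOT_WITH_M = ("a\nb" =~ /a(?m:.)b/)

# (?m:...) groups without capturing, so there is nothing to backreference.
CAPTURES = "a\nb".match(/a(?m:.)b/).captures

p SAME_RESULTS, RANGE_HITS, DOT_PLAIN, DOT_WITH_M, CAPTURES
```

The m flag applies only inside the group here, so any dot outside it would still refuse to match a newline.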
Let’s take a look at the examples you gave:
Your pattern /f.{2}(?m:.)\Z/ means “match f, followed by 2 of any character (but newline), followed by any character, and anchor to the end of the string or just before a newline”.

So in each of the 3 matches, fin matches the f.{2}. (?m:.) matches \n in the first, \r in the second, and d in the third. And \Z matches the end of the string in the first, just before a newline in the second, and the end of the string in the third. fi\n\n doesn’t match because the first \n here can’t be matched by the . from .{2} without the m flag.

In the second example, /f.{1}(?m:.)\Z/, fi matches f.{1} in both cases. (?m:.) matches n and \n, and \Z matches before the newline in both cases. fin\r\n doesn’t match because \Z will only match before the final newline in the string, not before a CRLF pair. And find doesn’t match because there’s nothing to match the d.

For the third example, I think you have a copy & paste error. This is identical to the previous pattern and matches as that does.

The fourth example, /f.(?m:.)\Z/, is also identical to the previous pattern. . and .{1} are the same thing. In fact, {1} can always be stripped from any regular expression without changing anything.

In the fifth example, /f.(m:.)\Z/, you dropped the ? from the pattern, changing the meaning of (m:.). This no longer changes options. Now it’s just a capturing group that matches the pattern m:., which of course doesn’t occur in your input.

In the last example, /f.(?m:.)\z/, you changed \Z to \z. The difference between those two is that \Z may match before a trailing newline, but \z must only match the end of the string. Without being able to match before the trailing newline, none of your inputs here match. But, for example, if you had fin (without the newline) or fi\n (without the second newline) it would work.
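The contrast between \Z, \z, and the pattern with the missing ? can be checked side by side. This is a rough sketch, assuming Ruby 1.9 or later. The inputs are the question's strings written out as an explicit array rather than with %W, and the constant names are made up for illustration.

```ruby
# The five strings that %W{fin\n fi\n\n \n\n fin\r\n find} produces.
INPUTS = ["fin\n", "fi\n\n", "\n\n", "fin\r\n", "find"]

# \Z matches at the end of the string or just before one trailing "\n".
UPPER_Z = INPUTS.grep(/f.(?m:.)\Z/)

# \z matches only at the very end of the string.
LOWER_Z = INPUTS.grep(/f.(?m:.)\z/)

# Without the "?", (m:.) is a capturing group that needs a literal "m:".
NO_QUESTION_MARK = INPUTS.grep(/f.(m:.)\Z/)

# Removing the trailing newline (or the second one) lets \z succeed.
TRIMMED = ["fin", "fi\n"].grep(/f.(?m:.)\z/)

p UPPER_Z, LOWER_Z, NO_QUESTION_MARK, TRIMMED
```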