From this article,
/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.
Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.
I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.
About the regex part,
^1?$ part is for counting 1 as not prime
^(11+?)\1+$ is for matching not prime numbers starting from 4.
What I do not understand is why is the ? in the regex needed at all.
According to me /^1$|^(11+)\1+$/ should be just fine and actually
perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;'
gives me the same set of prime numbers.
Is there any flaw in my understanding of the regular expression? Why are the ?s needed?
Isn’t ? supposed to match zero or one occurrence of the expression preceding it?
The first
?is for matching the empty string (i.e. 0) as non-prime. If you don’t care whether the regexp matches 0, then it’s not necessary.The second
?is only for efficiency.+is normally “greedy”, which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The+?makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)In this particular regexp, the
(11+?)means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used(11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the?it would waste time testing a lot of “potential” divisors that can’t possibly work. It would still match non-prime numbers, just more slowly. (Also,$1would be the largest divisor instead of the smallest one.)