This is a follow-up to "Perl regular expression to match an IP address". I wanted to show how to solve the problem correctly, but ran into unexpected behaviour.
use 5.010;
use strictures;
use Data::Munge qw(list2re);
use Regexp::IPv6 qw($IPv6_re);
use Regexp::Common qw(net);

our $port_re = list2re 0..65535;

sub ip_port_from_netloc {
    my ($sentence) = @_;
    return $sentence =~ /
        (                       # capture either
            (?<= \[ )
            $IPv6_re            # IPv6 address without brackets
            (?= \] )
        |                       # or
            $RE{net}{IPv4}      # IPv4 address
        )
        :                       # colon sep. host from port
        ($port_re)              # capture port
    /msx;
}

my ($ip, $port);
($ip, $port) = ip_port_from_netloc 'The netloc is 216.108.225.236:60099';
say $ip;
($ip, $port) = ip_port_from_netloc 'The netloc is [fe80::226:5eff:fe1e:dfbe]:60099';
say $ip;
The second match fails. use re 'debugcolor' reveals that :($port_re) already matches :5 within the IPv6 address. This surprises me because I did not switch off greediness with a ?. I expected it to gobble up everything up to the ], only then match against the separating colon and what follows after.
Why does this happen, and what’s the remedy?
Greed would only come into play if one of your atoms has a choice in how much it can match (i.e. if you used *, +, ? or {n,m}). This is not a greediness issue.
The problem is that the regex will only match an IPv6 address if it's immediately followed by both "]" and ":". That can't possibly happen: the lookahead (?= \] ) consumes nothing, so the : after the group has to match at the very position where the lookahead has just seen ].
You could use two different matches, or you could use something like the following:
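A minimal sketch of a single-pattern fix, not necessarily the code originally posted. The idea is to consume the brackets instead of looking around them, and to use a branch reset group (?| ... ) (available since Perl 5.10) so that either kind of address lands in the first capture. To keep it runnable with core Perl only, $ipv6_re, $ipv4_re and $port_re below are deliberately loose stand-ins that I made up for illustration; in real code you would interpolate $IPv6_re, $RE{net}{IPv4} and the list2re port pattern from the question instead.

```perl
use 5.010;
use strict;
use warnings;

# ASSUMPTION: simplified stand-ins for the question's $IPv6_re,
# $RE{net}{IPv4} and $port_re. They accept far more than valid
# addresses and ports; they only serve to show the pattern structure.
my $ipv6_re = qr/[0-9A-Fa-f:]+/;
my $ipv4_re = qr/\d{1,3}(?:\.\d{1,3}){3}/;
my $port_re = qr/\d{1,5}/;

sub ip_port_from_netloc {
    my ($sentence) = @_;
    return $sentence =~ /
        (?|                     # branch reset: each alternative uses group 1
            \[ ($ipv6_re) \]    # IPv6 address, brackets consumed but not captured
        |                       # or
            ($ipv4_re)          # IPv4 address
        )
        :                       # colon sep. host from port
        ($port_re)              # port, captured as group 2
    /msx;
}

my ($ip, $port);
($ip, $port) = ip_port_from_netloc 'The netloc is 216.108.225.236:60099';
say "$ip $port";    # prints "216.108.225.236 60099"
($ip, $port) = ip_port_from_netloc 'The netloc is [fe80::226:5eff:fe1e:dfbe]:60099';
say "$ip $port";    # prints "fe80::226:5eff:fe1e:dfbe 60099"
```

Because the ] is now consumed by the pattern, the : that follows is matched at the next position rather than at the position of the bracket, which is what the lookahead version could never satisfy.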
Maybe this is a bit cleaner?