For a long time, I assumed that parameters to Perl subs are passed by value. Now I have hit something that I don't understand:
use strict;
use warnings;
use Data::Dumper;
sub p {
    print STDERR "Before match: " . Data::Dumper->Dump([[@_]]) . "\n";
    "1" =~ /1/;
    print STDERR "After match: " . Data::Dumper->Dump([[@_]]) . "\n";
}
my $line = "jojo.tsv.bz2";
if ($line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    p($1, $2 || 'none');
    p([$1, $2 || 'none']);
}
On the first invocation of p(), the values in @_ become undef once the regexp match inside the sub has executed. On the second invocation, everything is OK (values passed inside an array ref are not affected).
This was tested with Perl versions 5.8.8 (CentOS 5.6) and 5.12.3 (Fedora 14).
The question is: how can a regexp match destroy the contents of @_ when they were built from $1, $2, etc.? (Other values, if you add them, are not affected.)
The perlsub man page says: "The array @_ is a local array, but its elements are aliases for the actual scalar parameters."
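A small sketch of what aliasing through @_ means in practice, with a made-up sub name `bump` (not from the question):

```perl
use strict;
use warnings;

# Hypothetical helper: modifies its first argument through the alias in @_.
sub bump {
    $_[0]++;        # writes through to the caller's variable
}

my $n = 1;
bump($n);
print "$n\n";       # prints 2, because $_[0] was an alias for $n
```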
So when you pass $1 to a subroutine, inside that subroutine $_[0] is an alias for $1, not a copy of $1. Therefore it gets modified by the regexp match in your p.
In general, the start of every Perl subroutine should look something like this:
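A minimal sketch of the first pattern, copying all of @_ into a lexical array. The sub name `p_copy` and the variable `@got` are made up for illustration; the regexp is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical sub: takes a private copy of its arguments before doing anything else.
sub p_copy {
    my @args = @_;      # copies the values; @args does not alias $1 or $2
    "1" =~ /1/;         # resets the capture variables, but @args is unaffected
    return @args;
}

my @got;
if ("jojo.tsv.bz2" =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    @got = p_copy($1, $2 || 'none');
}
print "@got\n";         # prints "tsv bz2"
```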
…or this:
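A sketch of the second pattern, unpacking @_ into named lexicals. The names `p_named`, `$ext` and `$compression` are assumptions chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical sub: list assignment copies each argument into its own lexical.
sub p_named {
    my ($ext, $compression) = @_;
    "1" =~ /1/;                     # a later match cannot touch the copies
    return "$ext/$compression";
}

my $result = 'no match';
if ("jojo.tsv.bz2" =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    $result = p_named($1, $2 || 'none');
}
print "$result\n";                  # prints "tsv/bz2"
```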
…or this:
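A sketch of the third pattern, pulling arguments off @_ one at a time with shift. Again the sub and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sub: shift removes each argument from @_ and the assignment copies it.
sub p_shift {
    my $ext         = shift;
    my $compression = shift;
    "1" =~ /1/;                     # the copies are independent of $1 and $2
    return "$ext $compression";
}

my $result = 'no match';
if ("jojo.tsv.bz2" =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    $result = p_shift($1, $2 || 'none');
}
print "$result\n";                  # prints "tsv bz2"
```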
And any capturing regexp should be used like this:
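One way to do this, sketched with made-up variable names, is to run the match in list context so that the captures are copied into lexicals in the same statement that tests for success:

```perl
use strict;
use warnings;

my $line    = "jojo.tsv.bz2";
my $summary = 'no match';

# In list context a successful match returns the captured groups; on failure it
# returns an empty list, so the assignment is false and the block is skipped.
if (my ($ext, $compression) = $line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    $compression = 'none' unless defined $compression;   # optional group may not match
    $summary = "$ext $compression";
}
print "$summary\n";     # prints "tsv bz2"
```

After this, $ext and $compression are ordinary lexicals holding copies, so passing them to a sub that runs its own regexp is safe.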
Without such discipline, you are likely to be driven mad by subtle bugs.