This is driving me nuts!
- I read a txt file into a string called $filestring.

    sysopen(handle, $filepath, O_RDONLY) or die "WHAT?";
    local $/ = undef;
    my $filestring = <handle>;

- I made a pattern variable called $regex which is generated dynamically, but takes on the format:

    (a)|(b)|(c)

- I search the text for patterns separated by a space

    while($filestring =~ m/($regex)\s($regex)/g){
        print "Match: $1 $2\n";
        #...more stuff
    }

Most of the matches are valid, but for some reason I get a match like the following every once in a while:
Match: and
whereas a normal match should have two outputs like the following:
Match: , and
Does anyone know what might be causing this?
EDIT: it appears that the NULL character is being matched in the pattern.
Each of the alternatives in your regexp is a separate capture group. The whole regexp looks like:
I’ve notated it with the capture group number for each piece of the regexp.
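Reconstructing the expansion from the (a)|(b)|(c) pattern in the question, the numbering can be sketched roughly as follows. This is only an illustration: the literal alternatives a, b and c and the sample input are assumptions standing in for the real generated pattern, and the /x flag is used just so that comments can sit inside the regexp.

```perl
use strict;
use warnings;

# What m/($regex)\s($regex)/ becomes once (a)|(b)|(c) is interpolated.
# Groups are numbered by the position of their opening parenthesis.
my $expanded = qr/
    (           # group 1: the whole first alternation
        (a)     # group 2
      | (b)     # group 3
      | (c)     # group 4
    )
    \s
    (           # group 5: the whole second alternation
        (a)     # group 6
      | (b)     # group 7
      | (c)     # group 8
    )
/x;

# Assumed sample input.
my $filestring = 'b a';

# In list context a successful match returns every capture group in order,
# with undef for groups that did not take part in the match.
my @groups = $filestring =~ $expanded;

for my $n (1 .. @groups) {
    my $value = $groups[$n - 1];
    printf "group %d: %s\n", $n, defined $value ? "'$value'" : 'undef';
}
```

Note that the second word ends up in group 5, not group 2.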
So if $filestring is "b a", $1 will be "b" and $2 will be empty (undef, which prints as the empty string) because nothing matched (a). To avoid this, you should use non-capturing groups for the alternatives:
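A minimal sketch of that change, assuming you control the code that generates $regex. The alternatives a, b and c and the sample text are placeholders, not the real data.

```perl
use strict;
use warnings;

# Assumed stand-in for the generated pattern: each alternative is wrapped
# in (?:...) so it groups without capturing.
my $regex = '(?:a)|(?:b)|(?:c)';

# Assumed sample input.
my $filestring = 'b a c b';

my @pairs;
# Only the two outer parentheses capture now, so $1 is the first word
# and $2 is the second word.
while ($filestring =~ m/($regex)\s($regex)/g) {
    push @pairs, "$1 $2";
    print "Match: $1 $2\n";
}
```

With this sample input the loop prints "Match: b a" and then "Match: c b". If the alternatives are plain strings, the inner grouping can probably be dropped altogether, so the generated pattern is just a|b|c.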