I’m a beginner in Perl. Could someone help me extract the data correctly with the script below?
#####################################################################
#! /usr/bin/perl
$text = "Name: Anne Lorrence Name: Burkart Name: Claire Name: Dan" ;
$match = 0 ;
while ($text =~ /Name: \b(\S+)\s+(\S+)\b/g || /Name: \b(\S+)\b/g) {
++ $match ;
print "Match number $match is $1 $2\n" ;
}
######################################################################
I wanted my output to be something like this:
Match number 1 is Anne Lorrence
Match number 2 is Burkart
Match number 3 is Claire
Match number 4 is Dan
but in fact, my script gives me this:
Match number 1 is Anne Lorrence
Match number 2 is Burkart Name
May I know what is going wrong?
One way to do this is to match with the pattern /Name: (.+?)(?= Name:|$)/g. It uses a non-greedy capture and a zero-width positive lookahead to delimit the fields.
The |$) part is an alternation. An easier example to understand would be (ABC|DEF), which means “match either ‘ABC’ or ‘DEF’”. The $ is simply the symbol for the end of the string (or just before a trailing newline).

The zero-width positive lookahead is explained in the perlre docs, but I’ll try to summarize here. It’s part of a class of patterns called “Look-Around Assertions”, and the name is quite accurate. Imagine the regex engine “looking around” at the current point in the string. The one employed here “looks ahead” in the string for a positive match. It’s called zero-width because it doesn’t consume any of the string in the pattern matching process.
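To make “zero-width” concrete, here is a small sketch that compares where a match ends with and without a lookahead. The helper end_of_match and the sample string "foo bar" are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the offset where the first match ends,
# or undef if there is no match. pos() is only set because of the /g flag.
sub end_of_match {
    my ($string, $regex) = @_;
    return $string =~ /$regex/g ? pos($string) : undef;
}

# The lookahead checks that " bar" follows but does not consume it,
# so the match ends right after "foo".
my $with_lookahead = end_of_match("foo bar", qr/foo(?= bar)/);

# Without the lookahead, " bar" is part of the match and is consumed.
my $without_lookahead = end_of_match("foo bar", qr/foo bar/);

print "with lookahead:    match ends at offset $with_lookahead\n";
print "without lookahead: match ends at offset $without_lookahead\n";
```

Because the text inside the lookahead is left unconsumed, the next /g match can start on it, which is what lets each “Name:” begin a fresh match.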
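So, the pattern /Name: (.+?)(?= Name:|$)/ says: match the literal text “Name: ”, then capture as few characters as possible, stopping at the first point that is followed by either “ Name:” or the end of the string.

There are probably better ways of solving your task, but this is short and clear and gives you an insight into some lesser used parts of the regex language. Look-Arounds are extremely useful and well worth learning about.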
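Put together, a minimal version of the loop using this pattern could look like the sketch below. It keeps the original $text and $match variables; the strict/warnings lines and the @names array are additions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text  = "Name: Anne Lorrence Name: Burkart Name: Claire Name: Dan";
my $match = 0;
my @names;    # collected only so the result can be inspected afterwards

# (.+?) captures as little as possible; the lookahead (?= Name:|$) stops
# the capture just before the next " Name:" or at the end of the string,
# without consuming it, so the next iteration still finds "Name: ".
while ($text =~ /Name: (.+?)(?= Name:|$)/g) {
    ++$match;
    push @names, $1;
    print "Match number $match is $1\n";
}
```

Since the whole name is captured in $1, there is no need for a second capture group or for a separate pattern to handle single-word names.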