I am new to Perl and have a question about the syntax. I received this code for parsing a file containing specific information. I was wondering what the if (/DID/) part of the subroutine get_number is doing? Is this leveraging regular expressions? I’m not quite sure because regular-expression matches look like $_ =~ /some expression/. Finally, is the while loop in the get_number subroutine necessary?
#!/usr/bin/env perl
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;
# store the name of all the OCR file names in an array
my @file_list=qw{
blah.txt
};
# set the scalar index to zero
my $file_index=0;
# open the file titled 'outputfile.txt' and write to it
# (or indicate that the file can't be opened)
open(OUT_FILE, '>', 'outputfile.txt')
or die "Can't open output file\n";
while($file_index < 1){
# open the OCR file and store it in the filehandle IN_FILE
open(IN_FILE, '<', "$file_list[$file_index]")
or die "Can't read source file!\n";
print "Processing file $file_list[$file_index]\n";
while(<IN_FILE>){
my $citing_pat=get_number();
get_country($citing_pat);
}
$file_index=$file_index+1;
}
close IN_FILE;
close OUT_FILE;
The definition of get_number is below.
sub get_number {
while(<IN_FILE>){
if(/DID/){
my @fields=split / /;
chomp($fields[3]);
if($fields[3] !~ /\D/){
return $fields[3];
}
}
}
}
In that case,
if (/DID/)by default searches the$_variable, so it is correct. However, it is a rather loose regex, IMO.The while loop in the sub may be necessary, it depends on what your input looks like. You should be aware that the two while loops will cause some lines to get completely skipped.
The while loop in the main program will take one line, and do nothing with it. Basically, this means that the first line in the file, and every line directly following a matching line (e.g. a line that contains “DID” and the 4th field is a number), will also be discarded.
In order to answer that question properly, we’d need to see the input file.
There are a number of issues with this code, and if it works as intended, it’s probably due to a healthy amount of luck.
Below is a cleaned up version of the code. I kept the modules in, since I do not know if they are used elsewhere. I also kept the output file, since it might be used somewhere you have not shown. This code will not attempt to use undefined values for
get_country, and will simply do nothing if it does not find a suitable number.