I have a script that reads two CSV files and compares them to find out whether an ID that appears in one also appears in the other. The error I am receiving is as follows:
Out of memory during "large" request for 67112960 bytes, total sbrk() is 348203008 bytes
And now for the code:
use strict;
use File::Basename;
my $DAT = $ARGV[0];
my $OPT = $ARGV[1];
my $beg_doc = $ARGV[2];
my $end_doc = $ARGV[3];
my $doc_counter = 0;
my $page_counter = 0;
my %opt_beg_docs;
my %beg_docs;
my ($fname, $dir, $suffix) = fileparse($DAT, qr/\.[^.]*/);
my $outfile = $dir . $fname . "._IMGLOG";
open(OPT, "<$OPT");
while(<OPT>){
    my @OPT_Line = split(/,/, $_);
    $beg_docs{@OPT_Line[0]} = "Y" if(@OPT_Line[3] eq "Y");
    $opt_beg_docs{@OPT_Line[0]} = "Y";
}
close(OPT);
open(OUT, ">$outfile");
while((my $key, my $value) = each %opt_beg_docs){
    print OUT "$key\n";
}
close(OUT);
open(DAT, "<$DAT");
readline(DAT); #skips header line
while(<DAT>){
    $_ =~ s/\xFE//g;
    my @DAT_Line = split(/\x14/, $_);
    #gets the prefix and the range of the beg and end docs
    (my $pre = @DAT_Line[$beg_doc]) =~ s/[0-9]//g;
    (my $beg = @DAT_Line[$beg_doc]) =~ s/\D//g;
    (my $end = @DAT_Line[$end_doc]) =~ s/\D//g;
    #print OUT "BEGDOC: $beg ENDDOC: $end\n";
    foreach($beg .. $end){
        my $doc_id = $pre . $_;
        if($opt_beg_docs{$doc_id} ne "Y"){
            if($beg_docs{$doc_id} ne "Y"){
                print OUT "$doc_id,DOCUMENT NOT FOUND IN OPT FILE\n";
                $doc_counter++;
            } else {
                print OUT "$doc_id,PAGE NOT FOUND IN OPT FILE\n";
                $page_counter++;
            }
        }
    }
}
close(DAT);
close(OUT);
print "Found $page_counter missing pages and $doc_counter missing document(s)";
Basically, I collect all the IDs from the file I am checking against into a hash. Then I loop over the other file and generate its IDs, because they are presented as a range. Finally, I take each generated ID and check for it in the hash of IDs.
I also forgot to note that I am using Windows.
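In isolation, the range-expansion and lookup step described here might look roughly like the sketch below. It is only an illustration under stated assumptions, not the script itself: the sub name missing_ids and the sample IDs are made up, and it tests hash membership with exists instead of comparing the stored value to "Y".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): expand a numeric
# range into IDs with the given prefix and return those absent from the hash.
sub missing_ids {
    my ($prefix, $beg, $end, $known) = @_;
    my @missing;
    for my $n ($beg .. $end) {
        my $doc_id = $prefix . $n;
        # exists checks for the key without caring what value is stored
        push @missing, $doc_id unless exists $known->{$doc_id};
    }
    return @missing;
}

# Made-up sample data standing in for the IDs read from the OPT file
my %known = map { $_ => 1 } qw(ABC1 ABC2 ABC4);

my @missing = missing_ids('ABC', 1, 5, \%known);
print "$_\n" for @missing;   # prints ABC3 and ABC5, one per line
```

Keeping the check in a small sub like this makes it easy to try on a handful of IDs before running it against the full files.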
I'm not sure if it's the cause of your error, but inside the loop where you're reading DAT, you probably want to replace the one-element slice @DAT_Line[$beg_doc] with the scalar element $DAT_Line[$beg_doc], and do the same for the other two lines there.
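A minimal sketch of that kind of change, using made-up sample data: the field values and the column index below are assumptions for illustration only. In Perl, a single array element is written with the $ sigil, while @DAT_Line[...] is an array slice. With warnings enabled, Perl reports the slice form as "Scalar value @DAT_Line[$beg_doc] better written as $DAT_Line[$beg_doc]".

```perl
use strict;
use warnings;

my @DAT_Line = ('ABC0001', 'ABC0003');   # made-up sample fields
my $beg_doc  = 0;                        # hypothetical column index

# Scalar element access: $DAT_Line[...] rather than the slice @DAT_Line[...]
(my $pre = $DAT_Line[$beg_doc]) =~ s/[0-9]//g;   # strip digits, keep the prefix
(my $beg = $DAT_Line[$beg_doc]) =~ s/\D//g;      # strip non-digits, keep the number

print "$pre $beg\n";   # prints "ABC 0001"
```

Adding use warnings; next to use strict; in the original script should surface every place where the slice form is used.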