I have a script that reads two csv files and compares them to find


Asked: May 29, 2026


I have a script that reads two csv files and compares them to find out if an ID that appears in one also appears in the other. The error I am receiving is as follows:

Out of memory during "large" request for 67112960 bytes, total sbrk() is 348203008 bytes

And now for the code:

use strict;
use File::Basename;

my $DAT     = $ARGV[0];
my $OPT     = $ARGV[1];

my $beg_doc = $ARGV[2];
my $end_doc = $ARGV[3];

my $doc_counter  = 0;
my $page_counter = 0;
my %opt_beg_docs;
my %beg_docs;

my ($fname, $dir, $suffix) = fileparse($DAT, qr/\.[^.]*/);
my $outfile = $dir . $fname . "._IMGLOG";

open(OPT, "<$OPT");
    while(<OPT>){
        my @OPT_Line = split(/,/, $_);
        $beg_docs{@OPT_Line[0]} = "Y" if(@OPT_Line[3] eq "Y");
        $opt_beg_docs{@OPT_Line[0]} = "Y";
    }
close(OPT);
open(OUT, ">$outfile");
while((my $key, my $value) = each %opt_beg_docs){

    print OUT "$key\n";
}
close(OUT);

open(DAT, "<$DAT");

    readline(DAT); #skips header line
    while(<DAT>){

        $_ =~ s/\xFE//g;

        my @DAT_Line = split(/\x14/, $_);

        #gets the prefix and the range of the beg and end docs
        (my $pre = @DAT_Line[$beg_doc]) =~ s/[0-9]//g;
        (my $beg = @DAT_Line[$beg_doc]) =~ s/\D//g;
        (my $end = @DAT_Line[$end_doc]) =~ s/\D//g;

        #print OUT "BEGDOC: $beg ENDDOC: $end\n";

        foreach($beg .. $end){
            my $doc_id = $pre . $_;

            if($opt_beg_docs{$doc_id} ne "Y"){
                if($beg_docs{$doc_id} ne "Y"){
                    print OUT "$doc_id,DOCUMENT NOT FOUND IN OPT FILE\n";
                    $doc_counter++;
                } else {
                    print OUT "$doc_id,PAGE NOT FOUND IN OPT FILE\n";
                    $page_counter++;
                }
            }
        }
    }
close(DAT);
close(OUT);

print "Found $page_counter missing pages and $doc_counter missing document(s)";

Basically, I load all the IDs from the file I am checking against into a hash. Then I loop over the other file and generate its IDs, because they are presented as ranges. Finally, I take each generated ID and look it up in the hash of IDs.

I also forgot to mention that I am using Windows.
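
As an illustration only, the approach described above can be sketched as follows. The sample IDs, the in-memory arrays standing in for the two files, and the names %seen and @missing are hypothetical. The sketch also assumes that every ID is a non-digit prefix followed by a zero-padded number of fixed width, which may not hold for the real data.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the OPT and DAT files (assumption, not real data).
my @opt_ids  = qw(ABC0001 ABC0002 ABC0004);
my @dat_rows = (['ABC0001', 'ABC0002'], ['ABC0003', 'ABC0005']);

# Step 1: put every ID from the OPT data into a hash for fast lookup.
my %seen = map { $_ => 1 } @opt_ids;

# Step 2: expand each begin/end range and record IDs missing from the hash.
my @missing;
for my $row (@dat_rows) {
    my ($first, $last) = @$row;
    (my $pre = $first) =~ s/[0-9]//g;   # non-digit prefix, e.g. "ABC"
    (my $beg = $first) =~ s/\D//g;      # digits of the first ID
    (my $end = $last)  =~ s/\D//g;      # digits of the last ID
    my $width = length $beg;            # assumed fixed zero-padded width

    # Count numerically and rebuild each ID with the original padding.
    for (my $n = $beg + 0; $n <= $end; $n++) {
        my $doc_id = sprintf '%s%0*d', $pre, $width, $n;
        push @missing, $doc_id unless exists $seen{$doc_id};
    }
}

print "$_\n" for @missing;
```

Here exists tests only whether the key is present, so the stored value does not matter, and the explicit counter is simply one way to walk the range.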


1 Answer

Editorial Team · Answer 1 · 2026-05-29T10:20:52+00:00


I’m not sure if it’s the cause of your error, but inside your loop where you’re reading DAT, you probably want to replace this:

        (my $pre = @DAT_Line[$beg_doc]) =~ s/[0-9]//g;

with this:

        (my $pre = $DAT_Line[$beg_doc]) =~ s/[0-9]//g;

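Do the same for the other two lines there. With the @ sigil, @DAT_Line[$beg_doc] is a one-element array slice; with the $ sigil, $DAT_Line[$beg_doc] is the single element you actually want.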
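
For reference, here is a minimal sketch of all three lines with the change applied. The sample record and the column indexes are made up for illustration; in the script they come from split(/\x14/, $_) and from $ARGV[2] and $ARGV[3].

```perl
use strict;
use warnings;

# Hypothetical sample record and column indexes (assumptions for illustration).
my @DAT_Line = ('ABC0001', 'ABC0003');
my ($beg_doc, $end_doc) = (0, 1);

# $DAT_Line[...] reads one element; @DAT_Line[...] would be a one-element slice.
(my $pre = $DAT_Line[$beg_doc]) =~ s/[0-9]//g;   # prefix with digits removed
(my $beg = $DAT_Line[$beg_doc]) =~ s/\D//g;      # digits of the first ID
(my $end = $DAT_Line[$end_doc]) =~ s/\D//g;      # digits of the last ID

print "$pre $beg $end\n";   # prints "ABC 0001 0003"
```

Adding use warnings, as this sketch does, should also make Perl flag the slice form with a "Scalar value ... better written as" warning.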
