For the Perl code below, I need to increase its efficiency since it’s taking hours

Asked: June 11, 2026

For the Perl code below, I need to increase its efficiency since it’s taking hours to process the input files (which contain millions of lines of data). Any ideas on how I can speed things up?

Given two files, I want to compare the data and print each line of input2.txt together with whether or not it matches a line of input1.txt. Please note that the two columns need to be compared interchangeably, so "A B" matches "B A".

For example,

input1.txt
A B
C D

input2.txt
B A
C D
E F
G H

Please note:
Lines 1 and 2 of input2.txt match lines of input1.txt (interchangeably); lines 3 and 4 don’t match.

Output:
B A match
C D match
E F don't match
G H don't match

Perl code:

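#!/usr/bin/perl
use strict;
use warnings;

open my $in1, '<', 'input1.txt' or die "Cannot open input1.txt: $!\n";
open my $in2, '<', 'input2.txt' or die "Cannot open input2.txt: $!\n";
chomp(my @array = <$in2>);    # all of input2.txt is held in memory

while (<$in1>)
{
    my @values = split;
    # skip lines that are not exactly two all-digit fields
    next if grep /\D/, @values or @values != 2;

    # matches "x y" or "y x"
    my $re = qr/\A$values[0]\s+$values[1]\z|\A$values[1]\s+$values[0]\z/;

    # every line of input1.txt rescans all of input2.txt, so the total work is
    # (lines in input1.txt) x (lines in input2.txt) regex matches
    foreach my $temp (@array)
    {
        print "$temp\n" if $temp =~ $re;
    }
}
close $in1;
close $in2;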

Any ideas on how to increase the efficiency of this code are highly appreciated. Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-11T07:00:24+00:00

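If you have enough memory, load input1.txt into a hash. Each line of input2.txt then costs one hash lookup instead of a pass over every line of the other file, so the run time grows with the sum of the two file sizes rather than with their product. If no symbol occurs more than once in input1.txt (i.e. if A B is in the file, A X is not), the following should work: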

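#!/usr/bin/perl
use warnings;
use strict;

my %hash;

# Record each symbol's partner in both directions: "A B" stores A => B and B => A.
open my $F1, '<', 'input1.txt' or die $!;
while (<$F1>) {
    my @values = split;    # default split on whitespace; drops the trailing newline
    @hash{@values} = reverse @values;
}
close $F1;

# One hash lookup per line of input2.txt instead of a pass over the whole file.
open my $F2, '<', 'input2.txt' or die $!;
while (<$F2>) {
    my @values = split;
    my $value = $hash{$values[0]};
    if (defined $value and $value eq $values[1]) {
        print "Matches: $_";
    } else {
        print "Does not match: $_";
    }
}
close $F2;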
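Why the "no repeated symbols" assumption matters: every line of input1.txt writes both of its symbols into the same flat hash, so a later line that reuses a symbol overwrites the earlier partner. A small sketch, using two hypothetical lines (A B followed by A X) held in memory instead of read from input1.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# Same loading step as the script above, applied to two sample lines sharing the symbol A
for my $line ('A B', 'A X') {
    my @values = split ' ', $line;
    @hash{@values} = reverse @values;
}

# $hash{A} is now 'X': the second line replaced the partner stored by the first
my $value = $hash{A};
if (defined $value and $value eq 'B') {
    print "A B matches\n";
} else {
    print "A B does not match\n";    # this branch runs: a false negative
}
```

Looking up B A would still succeed here, because $hash{B} is still A, so the result depends on which symbol of the pair is looked up first. An order-independent key avoids that.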

Update:

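If symbols can repeat, I would use a hash of hashes instead. Sort the two symbols of each line; the first one becomes the key in the outer hash and the second one the key in the inner hash. Because both files are sorted the same way, "B A" and "A B" end up in the same entry: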

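#!/usr/bin/perl
use warnings;
use strict;

my %hash;

# Store each pair with its symbols in sorted order, so "B A" and "A B" both
# become $hash{A}{B}. Only the existence of the key matters, not its value.
open my $IN1, '<', 'input1.txt' or die $!;
while (<$IN1>) {
    my @values = sort split;
    undef $hash{$values[0]}{$values[1]};
}
close $IN1;

# Sort each line of input2.txt the same way and look the pair up.
open my $IN2, '<', 'input2.txt' or die $!;
while (<$IN2>) {
    chomp;
    my @values = sort split;
    # Check the outer key first: exists on $hash{X}{Y} alone would autovivify
    # an empty inner hash for every unknown X, growing %hash on every miss.
    if (exists $hash{$values[0]} and exists $hash{$values[0]}{$values[1]}) {
        print "$_ matches\n";
    } else {
        print "$_ doesn't match\n";
    }
}
close $IN2;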
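A possible variant, not part of the answer above: key a single flat hash on the sorted pair joined with Perl's subscript separator $; ("\034" by default, assumed here never to occur in the data). This avoids allocating a separate inner hash for every distinct first symbol, which may save memory when the files hold millions of distinct symbols; whether it is also faster would need measuring on the real data. The names pair_key and compare_pairs are hypothetical, and the sketch runs on the question's sample data through in-memory filehandles:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Order-independent key: the two symbols in sorted order, joined with $;
sub pair_key {
    my ($x, $y) = @_;
    return $x le $y ? join($;, $x, $y) : join($;, $y, $x);
}

# Load every pair from $known into a flat hash, then label each line of $query.
sub compare_pairs {
    my ($known, $query, $out) = @_;
    my %seen;
    while (<$known>) {
        my @f = split;
        next unless @f == 2;              # ignore blank or malformed lines
        $seen{ pair_key(@f) } = undef;    # only the key's existence matters
    }
    while (<$query>) {
        chomp;
        my @f = split;
        my $hit = @f == 2 && exists $seen{ pair_key(@f) };
        print {$out} "$_ ", ($hit ? 'match' : "don't match"), "\n";
    }
}

# Sample data from the question, read through in-memory filehandles
my $in1 = "A B\nC D\n";
my $in2 = "B A\nC D\nE F\nG H\n";
open my $fh1, '<', \$in1 or die $!;
open my $fh2, '<', \$in2 or die $!;
my $result = '';
open my $out, '>', \$result or die $!;
compare_pairs($fh1, $fh2, $out);
close $out;
print $result;
```

For the real files, open input1.txt and input2.txt as in the script above and call compare_pairs($IN1, $IN2, \*STDOUT).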
