I have two (very large) text files. What is the fastest way – in

Question

0

Asked: May 11, 20262026-05-11T17:28:00+00:00 2026-05-11T17:28:00+00:00

I have two (very large) text files. What is the fastest way – in

0

I have two (very large) text files. What is the fastest way – in terms of run time – to create a third file containing all lines of file1 that do not appear in file2?

So if file1 contains:

Sally  
Joe  
Tom  
Suzie

And file2 contains:

Sally  
Suzie  
Harry  
Tom

Then the output file should contain:

Joe

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-11T17:28:00+00:00

Create a hashmap containing each line from file 2. Then for each line in file 1, if it is not in the hashmap then output it. This will be O(N), which is the best efficiency class you can achieve given that you have to read the input.

Perl implementation:

#!/usr/bin/env perl
use warnings;
use strict;
use Carp ();

my $file1 = 'file1.txt';
my $file2 = 'file2.txt';

my %map;
{
    open my $in, '<',$file2 or Carp::croak("Cant open $file2");
    while (<$in>) {
      $map{$_} = 1;
    }
    close($in) or Carp::carp("error closing $file2");
}
{
   open my $in,'<', $file1 or Carp::croak("Cant open $file1");
   while (<$in>) {
    if (!$map{$_}) {
      print $_;
    }
   }
   close $in or Carp::carp("error closing $file1");
}

If file 2 is so large that the hashmap doesn’t fit in memory, then we have a different problem at hand. A possible solution is then to use the above solution on chunks of file 2 (small enough to fit into memory), outputing the results to temporary files. Provided there are sufficient matches between file 1 and file 2, then total output should be of reasonable size. To calculate the final results, we perform an intersection of the lines in temporary files, i.e. for a line to be in the final results it must occur in every temporary file.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have two (very large) text files. What is the fastest way – in

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply