I have 2 files, a small one and a big one. The small file is a subset of the big one.
For instance:
Small file:
solar:1000
alexey:2000
Big File:
andrey:1001
solar:1000
alexander:1003
alexey:2000
I want to delete from Big.txt every line that is also present in Small.txt.
So I wrote the Perl script shown below:
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
open(BIG, "<$big") || die("Couldn't read from the file: $big\n");
my @contents = <BIG>;
close (BIG);
open(SMALL, "<$small") || die ("Couldn't read from the file: $small\n");
while(<SMALL>)
{
    chomp $_;
    @contents = grep !/^\Q$_/, @contents;
}
close(SMALL);
open(OUTPUT, ">>$output") || die ("Couldn't open the file: $output\n");
print OUTPUT @contents;
close(OUTPUT);
However, this Perl script does not delete the lines in Big.txt which are common to Small.txt.
In this script, I first open the big file and read its entire contents into the array @contents. Then I iterate over each entry in the small file and check for its presence in the big file. I filter the matching line out of the big file's contents and save the result back into the array.
I am not sure why this script does not work. Thanks.
Your script does NOT work because grep uses $_ itself: for the duration of the grep, $_ is set to each element of @contents in turn, taking over the value your while loop stored there. The $_ in your regex is therefore NOT the line read from the small file; the two are named the same but refer to different values. Each line of the big file ends up being matched against itself, which always succeeds, so the negated test throws every line away.

Use a named variable instead (as a rule, NEVER use $_ for any code longer than 1 line, precisely to avoid this type of bug).

However, as Oleg pointed out, a more efficient solution is to read the small file's lines into a hash and then process the big file ONCE, checking the hash contents. I also improved the style a bit (feel free to study and use it in the future): lexical filehandle variables, the 3-arg form of open, and I/O error printing via $!.
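Both fixes can be sketched roughly as follows. This is a minimal illustration, not the code originally posted with this answer: the sub names filter_with_grep, filter_with_hash and read_lines are hypothetical, and anchoring the regex to the whole line (rather than matching a prefix, as the question's script does) is an assumption about what "the same line" should mean.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fix 1: keep the grep approach, but hold the small-file line in a named
# lexical ($needle) so grep's own $_ (the current element) cannot hide it.
sub filter_with_grep {
    my ($small_lines, $big_lines) = @_;
    my @contents = @$big_lines;
    for my $line (@$small_lines) {
        chomp(my $needle = $line);    # copy first, then strip the newline
        # Assumption: match the whole line, not just a prefix.
        @contents = grep { !/^\Q$needle\E$/ } @contents;
    }
    return @contents;
}

# Fix 2: load the small file's lines into a hash, then make a single pass
# over the big file's lines, keeping only those not found in the hash.
sub filter_with_hash {
    my ($small_lines, $big_lines) = @_;
    my %seen;
    for my $line (@$small_lines) {
        chomp(my $key = $line);
        $seen{$key} = 1;
    }
    return grep {
        my $key = $_;                 # copy so the caller's data is untouched
        chomp $key;
        !$seen{$key};
    } @$big_lines;
}

# Read a whole file using a lexical filehandle, 3-arg open, and $! in errors.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Couldn't read from the file $path: $!\n";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# Run as a script only when all three file names are given.
if (@ARGV == 3) {
    my ($small, $big, $output) = @ARGV;
    my @small_lines = read_lines($small);
    my @big_lines   = read_lines($big);
    # '>>' appends, as in the question's script; use '>' to overwrite instead.
    open(my $out, '>>', $output) or die "Couldn't open the file $output: $!\n";
    print {$out} filter_with_hash(\@small_lines, \@big_lines);
    close($out) or die "Couldn't close the file $output: $!\n";
}
```

Assuming the script is saved under a name such as filter.pl (also hypothetical), it would be invoked as: perl filter.pl Small.txt Big.txt Output.txt. The hash version does one lookup per line of the big file, whereas the grep version rescans the whole array once for every line of the small file.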