I’m trying to read multiple files that share the same format, and I want to compute some statistics on them using a regex.
I.e., I want to count identical items that appear within the [].
NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
C_013481 NC_013479 ([T(trnP ,rrnS trnF trnV rrnL nad1 trnI ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013554 NC_013254 ([T(trnR ,trnN ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])
The problem is that I’m not getting the right values. Below is my code:
use strict;
use warnings;
my %data;
@FILES = glob("../mitos-crex/*.out");
foreach my $file (@FILES) {
    local $/ = undef;
    open my $fh, '<', $file;
    $data{$file} = <$fh>;
}
my @t;
my $c = 0;
foreach my $line (keys %data) {
    foreach my $l ($data{$line}) {
        print $l."\n";
        ($t[$c]) = $l =~ m/(\[.*\])/;
        $c++;
    }
}
#the problem is here the counter is not giving the right value
print $c;
my %counts;
$counts{$_}++ for @t;
thanks in advance
First of all, always use strict and use warnings. This measure is vital for all programming, as it will quickly reveal simple problems that you may otherwise overlook or waste time debugging. It is especially important, and a simple courtesy, if you are asking others for help with your program.
You seem to have confused slurping an entire file into a single string with reading it into an array of lines. The way you have written it, each element $data{$file} is a single scalar value containing all of the file’s data. You then try to iterate over it with foreach my $l ($data{$line}) { ... }, which executes just once per file and so finds only the first [...] string in each file.
Ordinarily I would say that you shouldn’t read in all of your file data in this way, as the problem is likely to have a better streamed solution, but I don’t know what else you want to use the captured data for, so my solution follows your own design.
I think you need to slurp the data into an array of lines instead of a scalar, and then iterate over that in your loops. You must leave $/ defined so that the file is read in lines, and build an anonymous array with [ <$fh> ]. Then you can iterate over the lines with foreach my $line (@{ $data{$file} }) { ... }
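Putting that together, here is a minimal sketch of how it could look. It keeps the glob path and the regex from the question; the helper names bracket_group and count_groups are my own, introduced only for illustration, and I am assuming that each line holds at most one [...] group you care about.

```perl
use strict;
use warnings;

# Return the [...] group found on a line, or undef if the line has none.
sub bracket_group {
    my ($line) = @_;
    my ($group) = $line =~ /(\[.*\])/;
    return $group;
}

# Count how often each [...] group occurs across a list of lines.
sub count_groups {
    my (@lines) = @_;
    my %counts;
    for my $line (@lines) {
        my $group = bracket_group($line);
        $counts{$group}++ if defined $group;
    }
    return \%counts;
}

# Same path as in the question; adjust it to your own layout.
my @files = glob("../mitos-crex/*.out");

my %data;
for my $file (@files) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    # $/ keeps its default value, so <$fh> in list context
    # returns one element per line of the file.
    $data{$file} = [ <$fh> ];
    close $fh;
}

# Flatten the per-file line arrays and count the groups over all files.
my $counts = count_groups( map { @{ $data{$_} } } sort keys %data );
printf "%5d  %s\n", $counts->{$_}, $_ for sort keys %$counts;
```

Lines without a [...] group are skipped rather than counted as undef, which is one likely reason the counter in the question did not match what you expected.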