I have a file with 3 columns ->
A1 0 9
A1 4 14
A1 16 24
A1 25 54
A1 64 84
A1 74 84
A2 15 20
A2 19 50
For each line, I want to check whether its range (the values in columns 2 and 3) is already covered by, or overlaps, the range of the previous line when the column 1 values are equal, and if so merge the two into a single range.
The desired output is ->
A1 0 14
A1 16 54
A1 64 84
A2 15 50
I have tried ->
@ARGV or die "No input file specified";
open $first, '<',$ARGV[0] or die "Unable to open input file: $!";
#open $second,'<', $ARGV[1] or die "Unable to open input file: $!";
$k=0;
while (<$first>)
{
if($k==0)
{
@cols = split /\s+/;
$p0=$cols[0];
$p1=$cols[1];
$p2=$cols[2];
$p3=$cols[2]+1;
}
else{
@new = split /\s+/;
if ($new[0] eq $p0){
if ($new[1]>$p3)
{
print join("\t", @new),"\n";
$p0=$new[0];
$p1=$new[1];
$p2=$new[2];
$p3=$new[2]+1;
}
elsif ($new[2]>=$p2)
{
print $p0,"\t",$p1,"\t",$new[2],"\n";
$p2=$new[2];
$p3=$new[2]+1;
}
else
{
$p5=1;
}
}
else
{
print join("\t", @new),"\n";
$p0=$new[0];
$p1=$new[1];
$p2=$new[2];
$p3=$new[2]+1;
}}
$k=1;
}
and output I am getting is ->
A1 0 14
A1 16 24
A1 16 54
A1 64 84
A1 64 84
A2 15 20
A2 22 50
I am not able to understand why I am getting this wrong output. Also, if there is any way to erase (or overwrite) the last printed line, this would be very easy.
First of all, it would be much simpler to help you if you had added use strict and use warnings, and declared all your variables close to their first use with my.

The reason your code fails is that you are printing data under too many conditions. For example, you output A1 16 24 as soon as you find that it cannot be joined with the previous range A1 4 14, without waiting for it to be extended by the subsequent A1 25 54 (at which point you correctly extend the range and print it again). A1 64 84 is output twice for the same reason: first because it cannot be merged with A1 25 54, and again because it has been "extended" with A1 74 84. Finally, A2 15 20 is output straight away because it has a new first column, even though it is then merged with the next line and output again.

You need to output a range only when you have found that it cannot be extended again. That happens when the next line has a different value in the first column, when the next line's range starts more than one past the end of the current range, or when the end of the file is reached.
This code prints output only in those cases and appears to do what you need.
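As a minimal sketch of the approach described above (not necessarily the exact code originally posted), the following keeps one pending range and emits it only once it can no longer be extended. It assumes the input is sorted by the first column and then by start value, and that a range starting exactly one past the previous end counts as mergeable, as the desired output suggests. The name merge_ranges and the variables $name, $start and $end are illustrative choices, and the sample data is hard-coded here for brevity.

```perl
use strict;
use warnings;

# Merge overlapping or adjacent ranges that share the same first column.
# Assumption: lines are already sorted by name, then by start value.
sub merge_ranges {
    my @lines = @_;
    my @merged;
    my ($name, $start, $end);    # the range currently being built

    for my $line (@lines) {
        my ($n, $s, $e) = split ' ', $line;
        next unless defined $e;    # skip blank or malformed lines

        if (defined $name and $n eq $name and $s <= $end + 1) {
            # Same name and the new range touches the pending one: extend it.
            $end = $e if $e > $end;
        }
        else {
            # The pending range cannot be extended any further: emit it.
            push @merged, join("\t", $name, $start, $end) if defined $name;
            ($name, $start, $end) = ($n, $s, $e);
        }
    }

    # End of input: emit the last pending range, if there is one.
    push @merged, join("\t", $name, $start, $end) if defined $name;
    return @merged;
}

# Sample data from the question.
my @input = (
    'A1 0 9',   'A1 4 14',  'A1 16 24', 'A1 25 54',
    'A1 64 84', 'A1 74 84', 'A2 15 20', 'A2 19 50',
);
print "$_\n" for merge_ranges(@input);
```

To process a file instead of the hard-coded list, the lines read from an opened file handle can be passed to merge_ranges in the same way. Because nothing is printed until a range is final, there is no need to erase or overwrite a previously printed line.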
OUTPUT