To the Perl masters of the world!
I have a file like this to parse, and I want to transform it as shown further down.
Starting from the first column, the fields are: ID, exon information, start position, end position and direction. The ID increases by 1 each time the second column is a number.
1 9239 712 8571 +
1 start_codon 712 714 +
1 stop_codon 8569 8571 +
2 3882 24137 24264 +
2 start_codon 24137 24139 +
3 3882 24322 24391 +
4 3882 24490 26064 +
4 stop_codon 26062 26064 +
5 4972 26704 26740 +
5 start_codon 26704 26706 +
6 4972 26814 27170 +
7 4972 27257 27978 +
7 stop_codon 27976 27978 +
8 10048 40161 41114 -
8 start_codon 41112 41114 -
8 stop_codon 40161 40163 -
9 272 43167 43629 -
9 stop_codon 43167 43169 -
10 272 43755 44059 -
10 start_codon 44057 44059 -
I want to turn it into this:
1 9239 *712* *8571* +
1 start_codon 712 714 +
1 stop_codon 8569 8571 +
*X 9239 712 8571 +*
2 3882 *24137* 24264 +
2 start_codon 24137 24139 +
3 3882 24322 24391 +
4 3882 24490 *26064* +
4 stop_codon 26062 26064 +
*X 3882 24137 26064 +*
5 4972 *26704* 26740 +
5 start_codon 26704 26706 +
6 4972 26814 27170 +
7 4972 27257 *27978* +
7 stop_codon 27976 27978 +
*X 4972 26704 27978 +*
8 10048 *40161* *41114* -
8 start_codon 41112 41114 -
8 stop_codon 40161 40163 -
*X 10048 40161 41114 -*
9 272 *43167* 43629 -
9 stop_codon 43167 43169 -
10 272 43755 *44059* -
10 start_codon 44057 44059 -
*X 272 43167 44059 -*
Each line beginning with X has to be added, but with my skills I cannot… 🙁
The thing is, for every exon number in the second column (ignoring the "start_codon" and "stop_codon" lines), I have to get the minimum and maximum exon positions, which are marked between asterisks * above.
This is my basic code to parse the data, but I guess I have to re-code it from scratch.
(I do not have any idea how to insert the 'X' line.)
(Sorry, I deleted the code as it is not good enough and may cause confusion…)
Perl masters of the world, could you please help me?
Thank you!
As TLP asked, I put my code back. It's embarrassing code, though.
    use strict;
    use warnings;

    if (@ARGV != 1) {
        print "Invalid arguments\n";
        print "Usage: perl min_max.pl [exon_output_file]\n";
        exit(1);
    }

    my $FILENAME = $ARGV[0];
    my $exonid   = 0;
    my $exon     = "";
    my $startpos = 0;
    my $endpos   = 0;
    my $strand   = "";
    my $min_pos  = 0;
    my $max_pos  = 0;

    # Three-argument open with a lexical file handle; stop if the file is unreadable.
    open(my $fh, '<', $FILENAME) or die "Cannot open $FILENAME: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line ne "") {
            # Five tab-separated columns: ID, exon, start, end, strand.
            if ($line =~ /^(.+)\t(.+)\t(.+)\t(.+)\t(.+)/) {
                $exonid   = $1;
                $exon     = $2;
                $startpos = $3;
                $endpos   = $4;
                $strand   = $5;
            }
            if ($exon =~ /\d+/) {
                # Exon line: column 2 is a number.
                print $exonid,"\t",$exon,"\t",$startpos,"\t",$endpos,"\t",$strand,"\n";
            } else {
                # start_codon / stop_codon line.
                print $exonid,"\t",$exon,"\t",$startpos,"\t",$endpos,"\t",$strand,"\n";
            }
        }
    }
    close($fh);
    exit;
How can I find the largest and the smallest value?
Basically, you go through the lines, skip the ones you don't want (i.e. those with no number in column 2), track the min/max across the lines of the same set, and when the column 2 number changes, you print the result and start over. With this solution, you also have to print the last set manually at the end.
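A minimal sketch of that approach, not necessarily the exact listing this answer was written around. It assumes whitespace-separated input columns and writes the X lines tab-separated; the helper `summary_line` and the `$set` hash are names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: format the summary line for one finished set.
sub summary_line {
    my ($set) = @_;
    return join("\t", 'X', @{$set}{qw(exon min max strand)}) . "\n";
}

my $set;    # the set being collected; undef until the first exon line

while (my $line = <DATA>) {
    my (undef, $exon, $start, $end, $strand) = split ' ', $line;

    # Only lines with a number in column 2 affect the min/max.
    if (defined $strand && $exon =~ /^\d+$/) {
        if (!$set || $set->{exon} != $exon) {
            # Column 2 changed: print the previous set, then start a new one.
            print summary_line($set) if $set;
            $set = { exon => $exon, min => $start, max => $end, strand => $strand };
        }
        else {
            $set->{min} = $start if $start < $set->{min};
            $set->{max} = $end   if $end   > $set->{max};
        }
    }
    print $line;    # every input line is echoed unchanged
}

# The last set is never followed by a change, so print it manually.
print summary_line($set) if $set;

__DATA__
1 9239 712 8571 +
1 start_codon 712 714 +
1 stop_codon 8569 8571 +
2 3882 24137 24264 +
2 start_codon 24137 24139 +
3 3882 24322 24391 +
4 3882 24490 26064 +
4 stop_codon 26062 26064 +
5 4972 26704 26740 +
5 start_codon 26704 26706 +
6 4972 26814 27170 +
7 4972 27257 27978 +
7 stop_codon 27976 27978 +
8 10048 40161 41114 -
8 start_codon 41112 41114 -
8 stop_codon 40161 40163 -
9 272 43167 43629 -
9 stop_codon 43167 43169 -
10 272 43755 44059 -
10 start_codon 44057 44059 -
```

Because the X line is printed only when a new number shows up in column 2, any start_codon or stop_codon lines that trail the last exon line of a set are echoed before that set's summary. Two adjacent sets that happen to share the same column 2 number would be merged into one.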
This code uses the internal DATA file handle for demonstration data. Simply change <DATA> to <> to use it on a target input file, like so:
perl script.pl inputfile
Output: